[Sigwac] Re: How to retrieve and parse the results of google

Bill Fletcher fletcher at kwicfinder.com
Thu Mar 25 15:35:10 CET 2010


I agree with Adam that both Bing (formerly LiveSearch) and Yahoo! are 
easier to use and more generous than Google.  I have some commented 
bare-bones PHP code to query and parse results from Bing at:
http://webascorpus.org/wacwiki/doku.php?id=user_posts:bill_fletcher:livesearchbarebones
Fetching the actual pages from Bing's cache is usually faster than
getting them directly from the original servers, and the cache converts
PDFs to HTML.  PHP also has functions to strip the HTML tags from the
webpages (although they can also swallow text and leave behind
JavaScript code that is not commented out).
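
To illustrate the tag-stripping caveat, here is a minimal sketch in
Python rather than PHP, so it does not show the PHP functions
themselves.  The names naive_strip, TextExtractor and html_to_text are
made up for this example.  It contrasts a naive "delete anything that
looks like a tag" approach, which leaves script code stranded in the
text, with a parser-based approach that skips the contents of script
and style elements.

```python
import re
from html.parser import HTMLParser


def naive_strip(html):
    """Delete anything that looks like a tag; script bodies remain."""
    return re.sub(r"<[^>]+>", "", html)


class TextExtractor(HTMLParser):
    """Collect text, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(parser.parts).split())


page = ("<html><head><script>var x = 1;</script></head>"
        "<body><p>Hello <b>world</b></p></body></html>")
print(naive_strip(page))
print(html_to_text(page))
```

On this sample page the naive version keeps the script source in its
output, while the parser-based version returns only the body text.
Real pages are messier (malformed markup, inline event handlers), so
treat this as a starting point rather than a complete cleaner.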

Bing API 2.0 offers lots of samples and various alternate query
formats.  There are also several query parameters you can tweak to get
larger result sets than the usual maximum of 1000 hits.

Bill





