[Sigwac] Re: How to retrieve and parse the results of google
Bill Fletcher
fletcher at kwicfinder.com
Thu Mar 25 15:35:10 CET 2010
I agree with Adam that both Bing (formerly Live Search) and Yahoo! are
easier to use and more generous than Google. I have some commented
bare-bones PHP code to query and parse results from Bing at:
http://webascorpus.org/wacwiki/doku.php?id=user_posts:bill_fletcher:livesearchbarebones
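For readers who don't use PHP, here is a rough Python sketch of the "parse" half: it turns a JSON search response into (title, URL, cache URL) records. The field names (SearchResponse, Web, Results, Title, Url, CacheUrl) are my assumption about the Bing API 2.0 JSON layout, not something confirmed here, so check them against the API documentation. Fetching the response is left out.

```python
import json
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    title: str
    url: str
    cache_url: Optional[str]  # link to the engine's cached copy, if present


def parse_results(payload: str) -> List[Hit]:
    """Extract hits from a JSON response.

    ASSUMPTION: results live under SearchResponse -> Web -> Results,
    with Title / Url / CacheUrl keys per hit.
    """
    data = json.loads(payload)
    web = data.get("SearchResponse", {}).get("Web", {})
    hits = []
    for r in web.get("Results", []):
        # Missing keys are tolerated: not every hit has a cached copy.
        hits.append(Hit(r.get("Title", ""), r.get("Url", ""), r.get("CacheUrl")))
    return hits
```

Using `.get()` throughout means a response without results (or with a slightly different shape) yields an empty list rather than an exception.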
Fetching the actual pages from Bing's cache is usually faster than
getting them directly, and PDFs there are already converted to HTML.
PHP also has functions to strip the HTML tags from webpages, although
they can also swallow text and leave behind JavaScript code that is
not commented out.
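A tag stripper that just deletes `<...>` leaves the contents of `<script>` and `<style>` elements in the text. Below is a minimal Python sketch (not the PHP functions mentioned above) that uses the standard library's `html.parser` and drops those element bodies entirely.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping everything inside <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. become plain chars
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces so text from adjacent block elements
        # doesn't run together, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, `html_to_text("<p>Hello <b>world</b></p><script>var x = 1;</script><p>Bye</p>")` gives `"Hello world Bye"`. Joining with spaces is a trade-off: a word split by inline markup (`wor<b>ld</b>`) comes out as two tokens.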
Bing API 2.0 offers lots of samples and various alternative query
formats. There are also several query parameters you can tweak to get
larger result sets than the usual maximum of 1000 hits.
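I don't name the specific parameters here, so the following is only a generic sketch of one way to exceed a per-query ceiling. Page through each query variant up to the cap, then merge the URL lists from all variants and drop duplicates. The page size and the 1000-hit `MAX_OFFSET` are assumptions, and the query variants themselves (whatever parameter you tweak) are up to you.

```python
from typing import Iterable, List

MAX_OFFSET = 1000  # ASSUMED per-query ceiling on reachable hits


def page_offsets(total: int, page_size: int, cap: int = MAX_OFFSET) -> List[int]:
    """Start offsets to request for one query, never going past the cap."""
    limit = min(total, cap)
    return list(range(0, limit, page_size))


def merge_unique(result_sets: Iterable[Iterable[str]]) -> List[str]:
    """Union URL lists from several query variants, keeping first-seen order."""
    seen = set()
    merged = []
    for urls in result_sets:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

With a page size of 50, a query reporting 5000 hits still yields only 20 requests (offsets 0 to 950). Additional hits have to come from the other variants, which is where `merge_unique` helps.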
Bill