[Sigwac] Today's bit of interesting trivia ...

Niels Ott niels at drni.de
Tue Sep 9 22:25:24 CEST 2008


Hi,

Well, nobody knows what Google are doing, but it seems the n-gram in 
question is a book title. I suspect that in lists the determiner "the" 
is often moved to the end, as seen in the n-gram (as in record stores: 
"Ramones, the").

So who knows whether Google treats lists or tables in a special way, or 
whether, as Serge suggests, duplicates of some kind are involved?

Furthermore, the book was released in March 2006, so two years ago it 
was probably available from many, many (online) shops and wholesale 
dealers (lists?). Perhaps it isn't that popular anymore.

Did the Web change or did Google change? Or both? We'll never know. :-)

Best,

   Niels


Serge Sharoff wrote:
> My guess is that the n-gram frequency comes from the total count before duplicate detection, e.g. from library lists such as:
> http://www.mountsihighschool.com/library/AR4.0-4.4.htm
> while the query output filters out most of the nearly identical lists.
> Still, I didn't find the n-gram database terribly useful for my tasks.
> Serge
> 
> -----Original Message-----
> From: sigwac-bounces at sslmit.unibo.it on behalf of Stefan Evert
> Sent: Tue 09/09/2008 21:03
> To: SIGWAC Mailing List
> Subject: [Sigwac] Today's bit of interesting trivia ...
>  
> Here's an entry from Google's 5-gram database -- you may remember how  
> enthusiastic people were, on and off the Corpora mailing list, about its  
> release two years ago:
> 
> 	Healing Time of Hickeys The	3915
> 
> Now, if I type that into Google today:
> 
> 	"Healing Time of Hickeys The"
> 
> I get approximately 30 hits (at least in Germany; perhaps that's the  
> name of a terribly subversive group that the Chinese government  
> doesn't like at all, so Google removed all references from its servers  
> in order to get better business opportunities over in Beijing).
> 
> Talk about reliability and stability of Web counts ...
> 
> :o)
> Stefan
> _______________________________________________
> Sigwac mailing list
> Sigwac at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/sigwac


-- 
Niels Ott - Computational Linguist (B.A.) - http://www.drni.de/niels/
           - My PGP key is available from your favorite key server.

"As breathing is my life, To stop I dare not dare."  (John Lennon)

