[Sigwac] Re: bootcats / large crawls

Marco Baroni baroni at sslmit.unibo.it
Mon Aug 28 12:57:23 CEST 2006


> technical approach to many tasks - I'm just drawn to *making*
> front-ends!

Well, as soon as I started teaching I discovered that front-ends are
extremely important, if you don't want to spend all your lecture time
explaining what "ls" does to a crowd of angry students! ;-)

> You could try experimenting with Nutch: http://lucene.apache.org/nutch.
> It's a full-blown web search engine, of which the crawler is (obviously)
> one of its components. I haven't tried it personally - yet!

I looked into Nutch some time ago and, at least then, its crawler was
much more primitive than Heritrix... still, a very interesting project!

Regards,

Marco


