[Sigwac] Re: bootcats / large crawls
Marco Baroni
baroni at sslmit.unibo.it
Mon Aug 28 12:57:23 CEST 2006
> technical approach to many tasks - I'm just drawn to *making*
> front-ends!
Well, as soon as I started teaching I discovered that front-ends are
extremely important if you don't want to spend all your lecture time
explaining to a crowd of angry students what "ls" does! ;-)
> You could try experimenting with Nutch: http://lucene.apache.org/nutch.
> It's a full-blown web search engine, of which the crawler is (obviously)
> one component. I haven't tried it personally - yet!
I looked into Nutch some time ago and, at least then, its crawler was
much more primitive than Heritrix... still, a very interesting project!
Regards,
Marco