[Sigwac] Re: [WaCky] A little survey on boilerplate removal methods

Marco Baroni baroni at sslmit.unibo.it
Mon Sep 25 21:16:00 CEST 2006


> I crossposted this reply to the Sigwac list, and suggest continuing the
> discussion there.

Yes, sorry, I did not even realize we were still wacky-ing...

> Especially from my position as a student it is a somewhat odd situation
> with all of this: Lecturers teach a lot of really sophisticated things,
> that are definitely scientific and important etc.
> But on the low level, the oh-so boring pre-processing fails for many
> real-life applications. Hm. Well. Maybe I'm just too interested in
> practical stuff.

I fully agree with your stance. It's not even just a matter of "real life 
applications": even to do, say, good parsing for a very abstract 
theoretical linguistics study, you first need to do good encoding 
detection, tokenization, and all the other "boring" stuff... One of the 
goals of Cleaneval is exactly to encourage people to take such matters 
seriously...

Regards,

Marco


