[Sigwac] Re: [WaCky] A little survey on boilerplate removal methods
Marco Baroni
baroni at sslmit.unibo.it
Mon Sep 25 21:16:00 CEST 2006
> I crossposted this reply to the Sigwac list, and suggest continuing the
> discussion there.
Yes, sorry, I did not even realize we were still wacky-ing...
> Especially from my position as a student it is a somewhat odd situation
> with all of this: lecturers teach a lot of really sophisticated things
> that are definitely scientific and important, etc.
> But on the low level, the oh-so boring pre-processing fails for many
> real life applications. Hm. Well. Maybe I'm too much interested in
> practical stuff.
I fully agree with your stance. It's not even just a matter of "real life
applications": even to do, say, good parsing for a very abstract
theoretical linguistics study, you first need to do good encoding
detection, tokenization, and all the other "boring" stuff... One of the
goals of Cleaneval is exactly to encourage people to take such matters
seriously...
Regards,
Marco