[Sigwac] Resend: CLEANEVAL Web-as-Corpus exercise

Adam Kilgarriff adam at lexmasterclass.com
Tue Apr 3 18:56:51 CEST 2007


**Apologies for faulty links in last version**

CLEANEVAL is a shared task and competitive evaluation for cleaning arbitrary
web pages, with the goal of preparing web data for use as a corpus, for
linguistic and language technology research and development.  You are
invited to participate, and to encourage others to do so too.

Website: http://cleaneval.sigwac.org.uk 

Development dataset now available. 

*  Prizes! A prize of £250.00 (GBP) will be awarded to the best
   student entrant for each task (Chinese and English).
*  Timetable:
  *  March 2007: Development datasets released (English and Chinese)
  *  June 2007: Exercise: evaluation dataset released; participants
     return cleaned pages two weeks later
  *  End of June 2007: Papers describing systems to be submitted
  *  Sept 15-16 2007: Workshop, part of WAC3, Louvain-la-Neuve, Belgium
     http://cental.fltr.ucl.ac.be/wac3/

*  Co-ordinators:
  *  Marco Baroni, Trento University, Italy 
  *  Tony Hartley, Leeds University, UK 
  *  Adam Kilgarriff, Lexical Computing Ltd., Leeds and Sussex Univs, UK 
  *  Serge Sharoff, Leeds University, UK 

CLEANEVAL is an activity of ACL-SIGWAC, the Association for Computational
Linguistics (ACL) Special Interest Group on Web as Corpus.

 


