[Sigwac] CLEANEVAL Web-as-Corpus exercise

Eric Atwell eric at comp.leeds.ac.uk
Tue Apr 3 18:41:20 CEST 2007


Adam,

the links you cited don't work - they are file: links to the C: drive
on your PC, not web URLs...

Could you try again?

eric

On Tue, 3 Apr 2007, Adam Kilgarriff wrote:

> CLEANEVAL is a shared task and competitive evaluation for cleaning arbitrary
> web pages, with the goal of preparing web data for use as a corpus, for
> linguistic and language technology research and development.  You are
> invited to participate, and to encourage others to do so too.
>
> Development
> <file:///C:\Documents%20and%20Settings\Adam\My%20Documents\Academic\CLEANEVAL\devset.html>
> dataset now available.
>
> *	Prizes! A prize of £250.00 (GBP) will be awarded for the best
> student entrant for each task (Chinese and English).
> *	Fuller description
> http://cleaneval.sigwac.org.uk/cleaneval-overview.html
> <file:///C:\Documents%20and%20Settings\Adam\My%20Documents\Academic\CLEANEVAL\cleaneval-overview.html> .
> *	Timetable:
>
> *	March 2007: Development datasets released (English and Chinese)
> *	June 2007: Exercise: Evaluation dataset released and, two weeks
> later, participants to return cleaned pages
> *	end June 2007: Papers describing systems to be submitted
> *	Sept 15-16 2007: Workshop, part of WAC3, Louvain-la-Neuve, Belgium
> http://cental.fltr.ucl.ac.be/wac3/
>
> *	Annotation guidelines
> http://cleaneval.sigwac.org.uk/annotation_guidelines.html
> <file:///C:\Documents%20and%20Settings\Adam\My%20Documents\Academic\CLEANEVAL\annotation_guidelines.html> .
> *	Co-ordinators
>
> *	Marco Baroni <http://sslmit.unibo.it/~baroni/> , Trento University,
> Italy
> *	Tony Hartley <http://www.leeds.ac.uk/cts/staff/tony_hartley.htm> ,
> Leeds University, UK
> *	Adam Kilgarriff <http://www.kilgarriff.co.uk> , Lexical Computing
> Ltd., Leeds and Sussex Universities, UK
> *	Serge Sharoff <http://www.comp.leeds.ac.uk/ssharoff/> , Leeds
> University, UK
>
>
>
> CLEANEVAL is an activity of ACL-SIGWAC <http://sigwac.org.uk> , the
> Association for Computational Linguistics (ACL) <http://www.aclweb.org>
> Special Interest Group on Web as Corpus.
>
>
>
> _______________________________________________
> Sigwac mailing list
> Sigwac at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/sigwac
>

-- 
Eric Atwell,
Senior Lecturer, Language research group, School of Computing
Faculty of Engineering, UNIVERSITY OF LEEDS, Leeds LS2 9JT, England
TEL: 0113-3435430  FAX: 0113-3435468  WWW/email: google Eric Atwell

