[Sigwac] Call for participation: WAC-9 @ EACL2014 and "CLEANEVAL 2" panel discussion

Roland Schäfer roland.schaefer at fu-berlin.de
Tue Mar 11 13:57:15 CET 2014


CALL FOR PARTICIPATION

9th Web as Corpus Workshop (WAC-9) @ EACL 2014
April 26, 2014 (Gothenburg, Sweden)

   https://sigwac.org.uk/wiki/WAC9

The workshop program is now available on the WAC-9 homepage.


INVITATION TO "CLEANEVAL 2" PANEL DISCUSSION

As part of the workshop, we will have a panel discussion dedicated to
the planning of a shared task for WAC-10 (2015), tentatively "CLEANEVAL
2". The tracks of the shared task might focus on the quality of web
corpus creation tools and tools for linguistic annotation (lemmatization,
possibly also POS tagging, etc.). If you have not done so yet, please
consider filling out this short online survey regarding a potential
shared task, even if you do not plan to attend WAC-9:

   https://www.surveymonkey.com/s/D8RFRCR


WORKSHOP DESCRIPTION

The World Wide Web has become increasingly popular as a source of
linguistic data, not only within the NLP communities, but also with
theoretical linguists facing problems of data sparseness or data
diversity. Accordingly, web corpora continue to gain importance, given
their size and diversity in terms of genres/text types. However, the
field is still new, and a number of issues in web corpus construction
still need much research (fundamental and applied), ranging from
questions of corpus design (e.g., corpus composition assessment,
sampling strategies and their relation to crawling algorithms, handling
of duplicated material) to more technical aspects (e.g., efficient
implementation of individual post-processing steps in document cleansing
and linguistic annotation, or large-scale parallelization to achieve
web-scale corpus construction). Similarly, the systematic evaluation of
web corpora, for example in the form of task-based comparisons to
traditional corpora, has only lately shifted into focus.

For almost a decade, the ACL SIGWAC, and especially the highly
successful Web as Corpus (WAC) workshops, have served as a platform for
researchers interested in building and working with web-derived corpora.
Past workshops have been co-located with major conferences on
computational linguistics and/or corpus linguistics (such as EACL, LREC,
WWW, Corpus Linguistics).

