[Sigwac] SECOND CfP: WAC-X The 10th Web as Corpus Workshop, 12 August 2016, Berlin

Roland Schäfer roland.schaefer at fu-berlin.de
Mon Apr 11 15:47:09 CEST 2016


=======================================
The 10th Web as Corpus Workshop (WAC-X)
SECOND Call for Papers
=======================================

Endorsed by the
Special Interest Group of the ACL on Web as Corpus (SIGWAC)

Co-located with ACL 2016
August 12, 2016, Berlin

Website: http://www.sigwac.org.uk/wiki/WAC-X
Contact email: wacx2016 at gmail.com

---------------------------------------
WORKSHOP DESCRIPTION

The World Wide Web has become increasingly popular as a source of
linguistic data, not only within the NLP communities, but also with
theoretical linguists facing problems of data sparseness or data
diversity. Accordingly, web corpora continue to gain importance, given
their size and diversity in terms of genres/text types. The field is
still new, though, and a number of issues in web corpus construction
need much additional research, both fundamental and applied. These
issues range from questions of corpus design (e.g., assessment of corpus
composition, sampling strategies and their relation to crawling
algorithms, and handling of duplicated material) to more technical
aspects (e.g., efficient implementation of individual post-processing
steps in document cleaning and linguistic annotation, or large-scale
parallelization to achieve web-scale corpus construction). Similarly,
the systematic evaluation of web corpora, for example in the form of
task-based comparisons to traditional corpora, has only recently shifted
into focus. For almost a decade, the ACL SIGWAC
(http://www.sigwac.org.uk/) and especially the highly successful Web as
Corpus (WAC) workshops have served as a platform for researchers
interested in compilation, processing and application of web-derived
corpora. Past workshops were co-located with major conferences on
computational linguistics and/or corpus linguistics (such as EACL,
NAACL, LREC, WWW, and Corpus Linguistics).

WAC-X will also feature the final workshop of the EmpiriST 2015 shared
task "Automatic Linguistic Annotation of Computer-Mediated
Communication/Social Media" (see
https://sites.google.com/site/empirist2015/ for details) and the panel
discussion "Corpora, open science, and copyright reforms" (see
https://www.sigwac.org.uk/wiki/WAC-X#paneldisc for details).


---------------------------------------
IMPORTANT DATES

8 May 2016: Workshop paper due date (23:59 GMT-12)
5 June 2016: Notification of acceptance
22 June 2016: Camera-ready papers due
12 August 2016: Workshop date


---------------------------------------
SECOND CALL FOR PAPERS

As in previous years, the 10th Web as Corpus workshop (WAC-X) invites
contributions pertaining to all aspects of web corpus creation,
including but not restricted to:

* data collection (both for large web corpora and smaller custom
  web corpora)
* cleaning/handling of noise
* duplicate removal/document filtering
* linguistic post-processing (including non-standard data)
* automatic generation of metadata (including register, genre, etc.)
* corpus evaluation (quality of text and annotations, comparison
  to other corpora, etc.)

Furthermore, aspects of usability and availability of web-derived
corpora are highly relevant in the context of WAC-X:

* development of corpus interfaces
* visualization techniques
* tools for statistical analysis of very large (e.g., web-derived)
  corpora
* long-term archiving
* documentation and standardization
* legal issues

Finally, reports of the use of web corpora in language technology and
linguistics are welcome, for example:

* information extraction & opinion mining
* language modeling, distributional semantics
* machine translation
* linguistic studies of web-specific forms of communication
* linguistic studies of rare phenomena
* web-specific lexicography, grammaticography, and language
  documentation


---------------------------------------
SUBMISSION WEBSITE

Please submit your paper using SoftConf:
https://www.softconf.com/acl2016/WAC-X/


---------------------------------------
SUBMISSION FORMAT

All submissions must be in PDF format and should follow the ACL 2016
style guidelines. We strongly recommend the use of the ACL 2016 LaTeX
style files or Microsoft Word style files. We reserve the right to
reject submissions that do not conform to these styles, including font
and page size restrictions.

ACL 2016 style files: http://acl2016.org/files/acl2016.zip
(or go to http://acl2016.org/index.php?article_id=9)

Full paper submissions may consist of up to eight (8) pages of content
plus any number of pages consisting of only references. Short papers may
consist of up to four (4) pages of content plus any number of pages
consisting of only references. Full papers will be distinguished from
short papers in the proceedings.

Papers will be presented either orally or as posters at the workshop.
There will be no distinction between papers presented orally and those
presented as posters in the proceedings.

Reviewing of papers will be double-blind. Therefore, the paper must not
include the authors' names and affiliations. Furthermore,
self-references that reveal the authors' identity, e.g., "We previously
showed (Smith, 1991) ...", must be avoided. Instead, use citations such
as "Smith (1991) previously showed ...". Papers not conforming to these
requirements will be rejected without review.


---------------------------------------
ORGANIZING COMMITTEE

Paul Cook (University of New Brunswick)
Stefan Evert (Friedrich-Alexander Universität Erlangen-Nürnberg)
Roland Schäfer (Freie Universität Berlin)
Egon Stemle (European Academy of Bozen/Bolzano)

Contact email: wacx2016 at gmail.com

---------------------------------------
PROGRAM COMMITTEE

Adrien Barbaresi, ÖAW (AT)
Silvia Bernardini, University of Bologna (IT)
Douglas Biber, Northern Arizona University (US)
Felix Bildhauer, Institut für Deutsche Sprache Mannheim (DE)
Katrien Depuydt, INL, Leiden (NL)
Jesse de Does, INL, Leiden (NL)
Cédrick Fairon, UC Louvain (BE)
William H. Fletcher, U.S. Naval Academy (US)
Iztok Kosem, Trojina, Institute for Applied Slovene Studies (SI)
Simon Krek, Jožef Stefan Institute (SI)
Lothar Lemnitzer, BBAW (DE)
Nikola Ljubešić, Sveučilišta u Zagrebu (HR)
Siva Reddy, University of Edinburgh (UK)
Steffen Remus, TU Darmstadt (DE)
Pavel Rychly, Masaryk University (CZ)
Kevin Scannell, Saint Louis University (US)
Serge Sharoff, University of Leeds (UK)
Klaus Schulz, LMU München (DE)
Kay-Michael Würzner, BBAW (DE)
Torsten Zesch, University of Duisburg-Essen (DE)
Pierre Zweigenbaum, LIMSI (FR)


