From roland.schaefer at fu-berlin.de  Fri Apr 14 19:54:11 2017
From: roland.schaefer at fu-berlin.de (=?UTF-8?Q?Roland_Sch=c3=a4fer?=)
Date: Fri, 14 Apr 2017 19:54:11 +0200
Subject: [Sigwac] Final CfP & extension: 11th Web as Corpus Workshop
 (WAC-XI), 24 July 2017, at Corpus Linguistics, Birmingham (UK)
Message-ID: <e6420d89-893e-fea7-413d-a096dd7da1c7@fu-berlin.de>

WAC-XI: The 11th Web as Corpus Workshop: final call for papers
co-located with Corpus Linguistics 2017, Birmingham, 24 July 2017

featuring the First CleanerEval Shared Task panel discussion
endorsed by the Special Interest Group of the ACL on Web as Corpus

Website: https://www.sigwac.org.uk/wiki/WAC-XI
Contact: wacxi2017 at gmail.com


*** EXTENDED DEADLINE: 24 April 2017 ***


== Workshop Description ==

For almost a decade, the ACL SIGWAC, and most notably the Web as Corpus
(WAC) workshops, have served as a platform for researchers interested in
the compilation, processing and use of web-derived corpora as well as
computer-mediated communication. Past workshops were co-located with
major conferences on corpus linguistics and/or computational linguistics
(such as ACL, EACL, Corpus Linguistics, LREC, NAACL, WWW). The eleventh
Web as Corpus workshop (WAC-XI) emphasises the linguistic aspects of web
corpus research more than the technological aspects while keeping in
mind that the two are inseparable.

The World Wide Web has become increasingly popular as a source of
linguistic evidence, not only within the computational linguistics
community, but also with theoretical linguists facing problems such as
data sparseness or the lack of variation in traditional corpora of
written language. Accordingly, web corpora continue to gain relevance,
given their size and diversity in terms of genres and text types. In
lexicography, web data have become a major and well-established resource
with dedicated research data and specialised tools such as the
SketchEngine. In other areas of linguistics, the adoption rate of web
corpora has been slower but steady. Furthermore, some completely new
areas of research dealing exclusively with web (or similar) data have
emerged, such as the construction and utilisation of corpora based on
short messages. Another example is the (manual or automatic)
classification of web texts by genre, register, or, more generally
speaking, text type, as well as topic area. Similarly, the areas of
corpus evaluation and corpus comparison have been advanced greatly
through the rise of web corpora, mostly because web corpora
(especially larger ones in the region of several billion tokens) are
often created by downloading texts from the web unselectively with
respect to their text type or content. While the composition (or
stratification) of such corpora cannot be determined before their
construction, it is desirable to evaluate it afterwards, at least. Also,
comparing web corpora to corpora that have been compiled in a more
traditional way is key in determining the quality of web corpora with
respect to a given research question.


== Call for Papers ==

The eleventh Web as Corpus workshop (WAC-XI) takes a (corpus) linguistic
look at the state of the art in all these areas. More specifically, in
linguistic publications presenting case studies based on web data, some
authors explicitly discuss and/or defend the validity of web corpus data
for a specific type of research question, while others simply take web
corpora as a new or complementary source of data without discussing
fundamental questions of data quality and appropriateness of web data
for a given research question. We think it is vital to discuss such
fundamental questions, and therefore ask researchers to present and discuss:

* case studies in corpus or computational linguistics where web data
have been used,

* research specifically related to the validity of web data in corpus,
computational, and theoretical linguistics,

* research on the technical aspects of web corpus construction which
have a strong influence on theoretical aspects of corpus design.

For example, presentations could address questions such as the following
(either as part of a case study or in the form of primary research):

* Are there substantial differences in theoretical inferences when web
data are used instead of data from traditionally compiled corpora? If
so: Why? Are they expected?

* Do findings from traditionally compiled corpora and web corpora
converge when compared with evidence from other sources (such as
psycholinguistic experiments)? If not: Which type of data matches the
external findings better?

* Is it possible to analyse lectal variation with web corpora, given the
frequent lack of relevant metadata?

* How good is the quality of the (automatic) linguistic annotation of
web data compared to traditionally compiled corpora? How does this
affect empirical linguistic research with web corpora? What could corpus
designers do to improve it?

* Are there differences with regard to the dispersion of linguistic
entities in web corpora compared to traditionally compiled corpora? If
so: Why? Does it matter? How can we deal with it or even profit from it?

* How do very large web corpora compare to smaller, more intentionally
stratified web corpora created for a specific task? How can it be
decided which type of corpus is better for a given research question?


== Important dates ==

16 February 2017: First call for workshop papers
13 March 2017: Second call for workshop papers
24 April 2017: Abstract due date (23:59 GMT)
12 June 2017: Notification of acceptance
24 July 2017: Workshop day


=== Submission format ===

We call for *anonymous* extended abstracts of up to 1,500 words in length
(excluding references, tables, and figures). Submissions must be in PDF
format. Authors of accepted papers will receive minimal formatting
instructions for the publication of the abstracts on the WAC-XI website
in due time. There will be no proceedings volume, but a successful
workshop might lead to a special issue/edited volume on web (and
similar) data in linguistics (with a new round of peer reviewing), for
which a separate call for (full) papers would be published after the
workshop.


=== Submission website ===

Please use our EasyChair installation exclusively:
https://easychair.org/conferences/?conf=wac11


== Organizers ==

Adrien Barbaresi (BBAW Berlin/ÖAW Vienna)
  http://adrien.barbaresi.eu/
Felix Bildhauer (IDS Mannheim)
  http://www1.ids-mannheim.de/gra/personal/bildhauer.html
Roland Schäfer (Freie Universität Berlin (DFG))
  http://rolandschaefer.net


== Programme committee ==

Masayuki Asahara, Nat. Inst. for Jap. Lang. and Ling., JP
Piotr Bański, IDS Mannheim, DE
Silvia Bernardini, U of Bologna, IT
Niels Brügger, University of Aarhus, DK
Sascha Diwersy, Université Montpellier 3, FR
Stefan Evert, FAU Erlangen, DE
Susanne Flach, Freie Universität Berlin, DE
Cédrick Fairon, UC Louvain, BE
William H. Fletcher, U.S. Naval Academy, US
Jack Grieve, Aston University, UK
Aurelie Herbelot, University of Trento, IT
Matthias Hüning, FU Berlin, DE
Detmar Meurers, Universität Tübingen, DE
Miloš Jakubíček, Masaryk University Brno, CZ
Iztok Kosem, Trojina, Institute for Applied Slovene Studies, SI
Anne Krause, Universität Leipzig, DE
Simon Krek, Jožef Stefan Institute, SI
Lothar Lemnitzer, BBAW, DE
Nikola Ljubešić, Jožef Stefan Institute, Ljubljana, SI
Steffen Remus, TU Darmstadt, DE
Antonio Ruiz Tinoco, Sophia University, JP
Kevin Scannell, Saint Louis U, US
Serge Sharoff, University of Leeds, UK
Barbara Schlücker, Universität Bonn, DE
Sabine Schulte im Walde, IMS Stuttgart, DE
Klaus Schulz, LMU München, DE
Egon Stemle, EURAC Bozen/Bolzano, IT
Peter Uhrig, FAU Erlangen, DE
Marieke van Erp, VU Amsterdam, NL
Wajdi Zaghouani, CMU Qatar, QA
Amir Zeldes, Georgetown University, Washington, US
Arne Zeschel, IDS Mannheim, DE


== CleanerEval: First Panel Discussion ==

As part of the workshop and consistent with its general theme, we plan
to organise a panel discussion as the first meeting of the CleanerEval
shared task on combined paragraph and document quality detection for
(web) documents. The CleanerEval shared task follows the successful
CleanEval shared task organised by SIGWAC in 2006. While CleanEval
focused specifically on boilerplate removal (the removal of
automatically inserted and frequently repeated non-corpus material from
web pages), CleanerEval goes beyond this basic task. Participating
systems should be able to determine the linguistic quality of
paragraphs and whole documents in an automatic fashion, such that
corpus designers and/or users can decide whether to include them in
their corpus or not. In the CleanerEval setting, boilerplate paragraphs
are paragraphs with low quality, but there might be other,
non-boilerplate paragraphs with low quality as well. CleanerEval was
proposed by the organisers of WAC-XI during the final discussion of
WAC-X, where the proposal was met with great interest. The WAC-XI panel
discussion is intended to serve as a platform for the development of the
operationalisation of the notions of paragraph and document quality, the
annotation guidelines, and the final schedule for the shared task.
There can be no doubt that corpus linguists should define what counts
as good corpus material and what does not. It would be misguided to
treat this question as a purely technical one. The final meeting of
the shared task is planned to be part of WAC-XII in 2018.


From goran at informatik.uni-mannheim.de  Wed Apr 26 15:05:34 2017
From: goran at informatik.uni-mannheim.de (=?UTF-8?Q?Goran_Glava=C5=A1?=)
Date: Wed, 26 Apr 2017 15:05:34 +0200
Subject: [Sigwac] [Call for Papers] TextGraphs-11: Graph-based Methods for
 Natural Language Processing
Message-ID: <CAEsygv+_xM3r3qn=8iiXX2j=GkmS2+7YEx_Nw7b-3rz06ybrZA@mail.gmail.com>

Final CFP and *deadline extension to April 30*: TextGraphs-11: Graph-based
Methods for Natural Language Processing

Workshop at the 55th Annual Meeting of the Association for Computational
Linguistics (ACL 2017)

August 3, 2017

Vancouver, Canada

http://www.textgraphs.org/ws17

* Update: The deadline has been extended to April 30.


WORKSHOP DESCRIPTION

For the past eleven years, the workshops in the TextGraphs series have
published work on and promoted the synergy between the fields of Graph
Theory (GT) and Natural Language Processing (NLP). The eleventh edition of
the TextGraphs workshop aims to extend the focus to issues and solutions for
large-scale graphs, such as those derived for web-scale knowledge
acquisition or social networks. We plan to encourage the description of
novel NLP problems or applications that have emerged in recent years, which
can be addressed with existing and new graph-based methods. Furthermore, we
will also encourage research on applications of graph-based methods in the
area of Semantic Web in order to link them to related NLP problems and
applications. The target audience comprises researchers working on problems
related to either Graph Theory or graph-based algorithms applied to Natural
Language Processing, social media, and the Semantic Web.


WORKSHOP TOPICS

TextGraphs invites submissions on (but not limited to) the following topics:

* Graph-based methods for providing reasoning and interpretation of deep
  learning methods
    * Graph-based methods for reasoning and interpreting deep processing by
      neural networks,
    * Explorations of the capabilities and limits when graph-based methods
      are applied to neural networks,
    * Investigation of which aspects of neural networks are not amenable to
      graph-based methods.

* Graph-based methods for Information Retrieval, Information Extraction,
  and Text Mining
    * Graph-based methods for word sense disambiguation,
    * Graph-based representations for ontology learning,
    * Graph-based strategies for semantic relations identification,
    * Encoding semantic distances in graphs,
    * Graph-based techniques for text summarization, simplification and
      paraphrasing,
    * Graph-based techniques for document navigation and visualization,
    * Re-ranking with graphs,
    * Applications of label propagation algorithms, etc.

* New graph-based methods for NLP applications, and novel use of existing
  graph methods for new NLP tasks
    * Random walk methods in graphs,
    * Spectral graph clustering,
    * Semi-supervised graph-based methods,
    * Methods and analyses for statistical networks,
    * Small world graphs,
    * Dynamic graph representations,
    * Topological and pre-topological analysis of graphs,
    * Graph kernels, etc.

* Graph-based methods for applications on social networks
    * Rumor proliferation,
    * E-reputation,
    * Multiple identity detection,
    * Language dynamics studies,
    * Surveillance systems, etc.

* Graph-based methods for NLP and Semantic Web
    * Representation learning methods for knowledge graphs (e.g., knowledge
      graph embedding),
    * Using graph-based methods to populate ontologies using textual data,
    * Inducing knowledge of ontologies into NLP applications using graphs,
    * Merging ontologies with graph-based methods using NLP techniques.


BEST PAPER AWARD

The Program Committee will select a best paper submitted to TextGraphs-11.
The authors of the best manuscript will receive the valuable Best Paper
Award. Both long and short submissions will be taken into consideration.


IMPORTANT DATES

All submission deadlines are at 11:59 p.m. PST

Paper submission:            April 30, 2017

Notification of acceptance:  May 19, 2017

Camera-ready submission:     May 26, 2017

Workshop date:               August 3, 2017


SUBMISSION

TextGraphs-11 solicits both long (8 pages) and short (4 pages) paper
submissions.

Please see our website for submission details: http://www.textgraphs.org/ws17


PROGRAM COMMITTEE (in alphabetical order)

 * Sivaji Bandyopadhyay, Jadavpur University, Kolkata, India

 * Pushpak Bhattacharyya, IIT Bombay, India

 * Chris Biemann, University of Hamburg, Germany

 * Tanmoy Chakraborty, University of Maryland, USA

 * Asif Ekbal, Indian Institute of Technology, Patna, India

 * Marc Franco Salvador, University of Valencia, Spain

 * Ioana Hulpus, University of Mannheim, Germany

 * Roman Klinger, University of Stuttgart, Germany

 * Nikola Ljubešić, University of Zagreb, Croatia

 * Hector Martínez Alonso, Inria & University Paris Diderot, France

 * Gabor Melli, VigLink, USA

 * Rada Mihalcea, University of Michigan, USA

 * Alessandro Moschitti, University of Trento, Italy

 * Animesh Mukherjee, IIT Kharagpur, India

 * Vivi Nastase, Heidelberg University, Germany

 * Roberto Navigli, "La Sapienza" University of Rome, Italy

 * Alexander Panchenko, University of Hamburg, Germany

 * Simone Paolo Ponzetto, University of Mannheim, Germany

 * Steffen Remus, University of Hamburg, Germany

 * Stephan Roller, UT Austin, USA

 * Shourya Roy, Xerox Research, India

 * Anders Søgaard, University of Copenhagen, Denmark

 * Jan Šnajder, University of Zagreb, Croatia

 * Aline Villavicencio, F. University of Rio Grande do Sul, Brazil

 * Ivan Vulić, University of Cambridge, United Kingdom

 * Fabio Massimo Zanzotto, "Tor Vergata" University of Rome, Italy


ORGANIZERS

* Martin Riedl, University of Hamburg riedl at informatik.uni-hamburg.de

* Swapna Somasundaran, Educational Testing Service ssomasundaran at ets.org

* Goran Glavaš, University of Mannheim goran at informatik.uni-mannheim.de

* Eduard Hovy, Carnegie Mellon University hovy at cmu.edu


CONTACT

Please direct all questions and inquiries to our official e-mail address:

textgraphs at gmail.com

Connect with us on social media:

* Join us on Facebook:

https://www.facebook.com/groups/900711756665369/

* Follow us on Twitter:

https://twitter.com/textgraphs

* Join us on LinkedIn:

https://www.linkedin.com/groups/4882867