[Sigwac] Call for discussion: The SIGWAC crisis (instead of an announcement of WAC-XI)

chris brew cbrew at acm.org
Tue Jul 25 19:38:12 CEST 2017


Dear Milos et al

It makes complete sense for someone to study the problem of devising
web-based corpora that are useful for scientific investigations that go
beyond the purely technological. The Google Books collection and the
various instances of BNC and ANC are excellent examples of what is needed.
The existence of CQP and SketchEngine is a wonderful thing.

What is readily available now, but was not in the early days of SIGWAC, is
large volumes of lightly-curated text suitable for use in building word
vectors or for training language models that have good perplexity. The ACL
community loves and uses these (rightly, in my view) but they are not at
all the same thing as carefully thought out and designed corpora like the
ones I mentioned in the previous paragraph.

I'm not sure about continuing to co-locate with ACL. The proportion of
regular attendees at ACL who have deep background in any form of
linguistics continues to decline, and the proportion with an understanding
of corpus linguistics has never been high. I suspect that
the number of young attendees who have even heard of the BNC is very low
indeed. So the number of ACL people who would be drawn to WAC is probably
fairly small. In addition, the conference's fee schedules are not
really compatible with attendance by researchers who do not have the luxury
(or, to an extent, burden) of large engineering-style grants.

The overlap between SIGWAC and the ACL community was stronger when there
were many carefully curated annotated corpora being built by NLP teams.
This is not happening so much now. To the extent that I understand what the
people who did this are now doing, it seems to me that crowdsourcing has
risen, which usually implies shallower annotation. At the same time, some
of those people are doing more with transformations and re-use of existing
annotated corpora, as well as pushing towards methods that learn everything
from raw text. This doesn't mesh well with SIGWAC's mission. The synergy is
less than it was.

So I think the primary task is to identify a large enough community and
co-locate with conferences that are compatible with that community.



On Thu, Jul 20, 2017 at 9:06 AM, Miloš Jakubíček <
milos.jakubicek at sketchengine.co.uk> wrote:

> Dear Roland et al,
>
> thank you for raising the issue and making such a detailed analysis of the
> status, which I think I agree with in most if not all of the points, though
> I am not sure about the future options.
> Let me add two things:
>
> 1)
> There is one consideration to be taken into account that I would like to
> emphasize and that makes WAC different from most of the other ACL SIGs.
> Namely, that we are a community with users and our users are not our
> contributors.
> I think this became clear at eLex in 2015, where WAC did not have enough
> submissions but had a great many (maybe even the most? It was over 50)
> registered and paying participants.
> So, people wanted to come and hear news from WAC -- but there was nobody
> to present (kudos here to Egon Stemle for a very decent failover solution
> at the time).
>
> 2)
> Further on -- and again in line with your analysis -- web corpora are now
> widespread. As a result, and I think that this may be quite relevant, fewer
> people are interested in WAC also because the low-hanging fruit has gone,
> and what remains are sometimes quite tough issues.
>
> Now, because of point 1) I'm not sure whether a stronger linguistic drive
> would help resurrect WAC.
> I think we might not need to think about steering one way or another (in
> fact, co-locating with ACL events always brought enough attention, and for
> me the ACL stands for both of the worlds here, the linguistic one and the
> computer science/engineering one),
> but I find it crucial to try to set up a new agenda and see whether people
> are interested in working on it - if yes, that's the key to success, I
> believe. (Your idea about the joint paper seems like an excellent
> starting point here.)
>
> I also think WAC needs to maintain its "workshopability" -- it was never
> and will never be an event where you will publish results that you can
> publish in conferences or journals (where they get indexed etc.).
> This does not mean anything like lowering one's standards, but rather
> maintaining the status of a (top) forum for raising important issues,
> though there are no satisfactory solutions to them yet.
>
> If we could manage to draft this agenda in Birmingham, that would be
> wonderful, and I invite everybody to try to do some homework on that so
> that we do not start from scratch but with comparing notes.
> Maybe we can even make a public shared Google doc and see where it ends, if
> you like.
>
> Finally, I think there is a growing need to put more effort into WAC
> activities even just to remain where we are now. The web is constantly
> changing, and at least our (=Sketch Engine's) experience is that it is
> becoming harder to crawl than before.
> Concurrently, the web grows, so the same methods used five years ago yield
> more data anyway (on the same languages), so some of the technological
> innovation (e.g. web pages that, for crawling, basically require rendering
> the content in the browser) may go unnoticed despite introducing unwanted
> biases.
>
> Anyway, I'm looking forward to seeing all of you next week in Birmingham,
>
>
> Milos Jakubicek
>
> CEO, Lexical Computing
> Brno, CZ | Brighton UK
> http://www.lexicalcomputing.com
> http://www.sketchengine.co.uk
>
> On 7 July 2017 at 14:38, Vladimír Benko <vladob at juls.savba.sk> wrote:
>
> > Dear Roland,
> >
> > I fully agree with both you and Serge (i.e., several topics to discuss,
> > need to decide what next, and little time to write posts ;-).
> >
> > Looking forward to meeting you in Birmingham,
> >
> > Vlado B, 14:35
> >
> > _______________________________________________
> > Sigwac mailing list
> > Sigwac at sslmit.unibo.it
> > http://liste.sslmit.unibo.it/mailman/listinfo/sigwac
> >
>



-- 
Chris Brew, Computational Scientist, Digital Operatives LLC

