[Sigwac] Call for discussion: The SIGWAC crisis (instead of an announcement of WAC-XI)

Adrien Barbaresi adrien.barbaresi at oeaw.ac.at
Tue Aug 1 13:27:50 CEST 2017


Dear Milos, dear Roland,

I only tackled the question of community building and organization,
because I feel that it is tied to the research questions, as I am afraid
that a shift from fundamental research to technosciences makes us all
talk in terms of "research infrastructures", "task forces", and
"curation of resources".

Additionally, the lack of visibility and coordination is one of the
reasons why the understanding of what a web corpus is differs completely
on statmt.org, as Serge said.

We need a coordinated effort to raise concerns about the quality and
suitability of hi-fi web corpora for a range of different
(computational) linguists.

We could open a chat channel and organize hackathons (I would be up for
it), but in the current state of affairs we are more likely to raise
awareness by asking existing institutions and infrastructures for
support – as it goes, they need "key performance indicators" for their
"reporting" and may very well be inclined to do so.
Since Darja Fišer is involved in talking to various communities of
linguists (including computer-mediated communication researchers), it
could be interesting to hear what she has to say.

Of course, these options are not mutually exclusive. To sum up, we need to know:
· who we are (who is currently active within WAC)
· what we stand for and where we are going (the research questions you
mentioned, Roland)
· but also where we stand with respect to other web corpus builders (even
curators or archivists).

All the best,
Adrien

@Milos: yes, federated content search is what I was hinting at. But it
can also start with a (SIG)WAC page featuring a clean overview of what's
available, for which language, where, and under which license/conditions.


On 01.08.2017 at 12:30, Miloš Jakubíček wrote:
> Hi,
> 
> On 1 August 2017 at 12:21, Roland Schäfer <roland.schaefer at fu-berlin.de>
> wrote:
> 
>> Dear Adrien,
>>
>> thanks a lot for joining the discussion.
>>
>> On 01.08.17 11:54, Adrien Barbaresi wrote:
>>>
>>> If I understand correctly, the CLARIN or YaCy initiatives share a common
>>> ground, that is resource pooling. We could confer on how to make part of
>>> our corpora available under a "meta" multilingual search engine. A
>>> research consortium such as CLARIN can help at the institutional level,
>>> and distributed search engines like YaCy are a practical solution for
>>> low-resource cooperation.
>>
>> That is surely true, and it is a valid option. My two main objections are:
>>
>> 1. This is not a research question, but a question of generating more
>> users or giving users a consistent interface for many resources (= CLARIN).
>>
>> 2. I think it will be difficult to achieve this, given that the major
>> European web corpus projects are – besides the one you are involved in –
>> SketchEngine, COW, and Aranea. I don't know about Aranea, but since
>> SketchEngine is a fully self-contained high-quality paid service, would
>> they agree to join such an effort? And given the unsolved intellectual
>> property situation in the EU, esp. Germany, COW simply cannot do that
>> (except for COCO, if some CLARIN repo takes the full risk of recovering
>> the corpora from CommonCrawl and the COCCOA stand-off files, which I
>> doubt they will).
>>
> 
> Not sure whether this is what Adrien meant. Let's not mix research &
> business & legal issues -- those three are quite separate topics.
> As for research we are all in as always. CLARIN is already at the moment
> building a Federated Content Search and by the way (No)Sketch Engine is one
> of the very few available end-points to that - so that people can plug in
> their NoSketch Engine instances, or use the main one if they have an account
> (trial or paid).
> 
> 
>>
>> I admit, though, that none of the contributors to this discussion has
>> expressed much enthusiasm towards my suggestions to tackle more
>> fundamental conceptual issues
>>
>>
> wait, wait - I did ;) I said in my earlier e-mails, let's set up an agenda
> first.
> But - I'm not sure whether the meaning of "conceptual" in your private
> ontology and in my private ontology overlap ;), so what, in your opinion,
> would be these most fundamental conceptual issues?
> 
> All the best
> Milos
> _______________________________________________
> Sigwac mailing list
> Sigwac at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/sigwac
> 