[Sigwac] Call for discussion: The SIGWAC crisis (instead of an announcement of WAC-XI)

Silvia Bernardini silvia.bernardini at unibo.it
Tue Aug 1 13:35:30 CEST 2017


Sorry, probably my fault: what I meant was that Yacy (or similar tools) could provide a means of freeing ourselves from web search engines (currently killing BootCaT-like tools), and in so doing revive interest in methods for collecting/cleaning/annotating/exploiting those specialized corpora that cannot be provided by large consortia, but will remain the realm of single individuals working DIY on very specialized topics. In other words, not that a linguist search engine would be in itself an interesting research object.

silvia


> On 1 Aug 2017, at 13:02, Roland Schäfer <roland.schaefer at fu-berlin.de> wrote:
> 
> Hi Miloš,
> 
> On 01.08.17 12:22, Miloš Jakubíček wrote:
>> Hi Roland,
>> 
>> Sorry I do not follow - what do you mean by index here, can you please
>> explain?
>> 
> 
> oh, sorry for being imprecise. In my understanding, a web search engine
> (or "web index") does not provide the full text data, but just
> provides... well, an index. Maybe I did not check thoroughly enough what
> the Yacy initiative was about, but a "linguist's search engine" would
> not provide the actual textual data according to my definition, but it
> would just provide links to websites and maybe some aggregated data. In
> other words, a search engine is not a copy of the data PLUS an index
> (which is what you have in CWB or NoSkE) but just the index. This
> creates a problem because the data themselves are not curated.
> 
>>> 3. More importantly, indices do not lead to reproducible results (which
>>> was AFAIR one of Adam's main points in his seminal paper). Under the
>>> current guidelines of the German Research Council (DFG, the main
>>> third-party funding agency in DE) on textual resources, for example,
>>> mentioning the planned use of results obtained from web data using an
>>> index in a grant application should theoretically stand in the way of
>>> approving the grant.
>>> 
>> 
>> I think I still don't get what the indices stand for here, probably not an
>> index as in computer/database terms?
>> (At least I don't understand why that would stand in the way of any
>> funding...)
> 
> Is it better now? Maybe the problem here is more the definition of the
> term "linguist's search engine", which basically triggered my doubts. I
> think I have a solid understanding of how DB indices work, at least for
> a mere linguist.
> 
> Best,
> Roland
> _______________________________________________
> Sigwac mailing list
> Sigwac at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/sigwac


