[Sigwac] Call for discussion: The SIGWAC crisis (instead of an announcement of WAC-XI)

Roland Schäfer roland.schaefer at fu-berlin.de
Tue Aug 1 13:02:07 CEST 2017


Hi Miloš,

On 01.08.17 12:22, Miloš Jakubíček wrote:
> Hi Roland,
>
> Sorry I do not follow - what do you mean by index here, can you please
> explain?
>

Oh, sorry for being imprecise. In my understanding, a web search engine
(or "web index") does not provide the full text data, but just
provides... well, an index. Maybe I did not check thoroughly enough what
the Yacy initiative was about, but a "linguist's search engine" would
not provide the actual textual data according to my definition, but it
would just provide links to websites and maybe some aggregated data. In
other words, a search engine is not a copy of the data PLUS an index
(which is what you have in CWB or NoSkE) but just the index. This
creates a problem because the data themselves are not curated.
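
To make the distinction concrete, here is a minimal sketch in Python. It
is only a toy under my own assumptions: the URLs, the texts, and the
function names are hypothetical, and it does not describe how Yacy, CWB,
or NoSkE actually work. It contrasts an index over live pages with an
index over a frozen copy of the same pages.

```python
from collections import defaultdict

# Hypothetical toy "web": URL -> page text (stand-ins, not real data).
web = {
    "http://example.org/a": "the cat sat on the mat",
    "http://example.org/b": "the dog sat down",
}

def build_index(pages):
    """Map each token to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in text.split():
            index[token].add(url)
    return index

def concordance(index, texts, token):
    """Return the texts currently retrievable for a token via the index."""
    hits = index.get(token, ())
    return sorted(texts[url] for url in hits if token in texts[url].split())

# Search-engine style: keep only the index; the text stays on the live web.
index_only = build_index(web)

# Corpus style: freeze a copy of the text and index that copy.
snapshot = dict(web)
corpus_index = build_index(snapshot)

# The live page changes after indexing.
web["http://example.org/a"] = "page moved"

# The bare index still points at the URL, but the sentence is gone.
print(concordance(index_only, web, "cat"))
# The frozen copy still yields the same result as before.
print(concordance(corpus_index, snapshot, "cat"))
```

With the bare index, the same query can return different results once
the pages change, whereas the frozen copy keeps returning what was
originally collected.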

>> 3. More importantly, indices do not lead to reproducible results (which
>> was AFAIR one of Adam's main points in his seminal paper). Under the
>> current guidelines of the German Research Council (DFG, the main
>> third-party funding agency in DE) on textual resources, for example,
>> mentioning the planned use of results obtained from web data using an
>> index in a grant application should theoretically stand in the way of
>> approving the grant.
>>
> 
> I think I still don't get what the indices stand for here, probably not an
> index as in computer/database terms?
> (At least I don't understand why that would stand in the way of any
> funding...)

Is it better now? Maybe the problem here is more the definition of the
term "linguist's search engine", which basically triggered my doubts. I
think I have a solid understanding of how DB indices work, at least for
a mere linguist.

Best,
Roland

