[Sigwac] Call for discussion: The SIGWAC crisis (instead of an announcement of WAC-XI)

Miloš Jakubíček milos.jakubicek at sketchengine.co.uk
Wed Aug 2 09:53:15 CEST 2017


Hi Eros,

On 1 August 2017 at 13:34, Eros Zanchetta <eros at sslmit.unibo.it> wrote:

>
>> this might be a builtin issue: if a big corporation starts indexing their
>> data with Yacy, will it be able to skew the results?
>>
>
> That could happen but I'm not sure that's a problem (at least as far as
> BootCaT is concerned) for 2 reasons:
>
> 1) I'm not sure that big corporations would be interested in doing that
> (Yacy AFAIK has basically no users, who would want to invest time to game
> the results?)
>

No, no: I was not talking about intentional manipulation, but about
unintentional skew. Imagine somebody wants to index their intranet publicly
(for whatever reason, say user reviews; it could also be a university, etc.;
btw, Google provides that through a paid hardware device) and adds it to this
(so far rather small) index. It will then skew the index towards their
domain, text types, etc.


>
> 2) BootCaT's approach of using tuples of specific terms restricts the
> results so much that most of the time you want *all* the results you can
> get, so ranking becomes somewhat less relevant
>

No, this is not true, at least not for English and other languages with a
big online presence. The ranking actually does a really important job. You
can retrieve thousands of hits through Bing/Google, but the ones at the top
are often the best ones according to the multitude of criteria that the
ranking takes into account.

Best
Milos

