[CWB] Strange issue with character encoding (?) in frequency lists
Scott Sadowsky
ssadowsky at gmail.com
Thu May 30 03:58:01 CEST 2019
On Tue, May 28, 2019 at 8:50 AM Hardie, Andrew <a.hardie at lancaster.ac.uk>
wrote:
Hi Andrew,
> Incidentally - so far case sensitivity (and, when I add it, diacritic
> sensitivity) is set at the level of the corpus. Should it be set at the
> level of each different annotation (p-attribute)?
>
I would definitely say yes. Besides the general principle that finer
granularity is better, this would make it possible to make lemmas
case-insensitive and words case-sensitive, if I understand things
correctly. That would provide a partial work-around for the
lemmas-getting-mashed-together problem that I'm facing, and that Stefan
deals with (partially, at least, I guess) by making his German corpora
case-sensitive.
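To illustrate the idea, here is a rough Python sketch of frequency lists
built with a per-attribute case setting. This is not CWB or CQPweb code:
the FOLD_CASE table, the frequency_list() helper and the sample tokens
are all made up for illustration, and it only assumes that each token
carries a "word" and a "lemma" value.

```python
from collections import Counter

# Hypothetical per-attribute settings: True means "fold case before counting".
FOLD_CASE = {"word": False, "lemma": True}

# Position of each attribute within a token tuple (illustrative layout).
ATTRIBUTE_INDEX = {"word": 0, "lemma": 1}

# Made-up sample tokens as (word, lemma) pairs.
TOKENS = [
    ("The", "The"),
    ("the", "the"),
    ("Cats", "cat"),
    ("cats", "cat"),
]


def frequency_list(tokens, attribute):
    """Count the values of one attribute, folding case only if configured."""
    index = ATTRIBUTE_INDEX[attribute]
    counts = Counter()
    for token in tokens:
        value = token[index]
        if FOLD_CASE[attribute]:
            value = value.casefold()
        counts[value] += 1
    return counts


word_freqs = frequency_list(TOKENS, "word")    # "The" and "the" stay separate
lemma_freqs = frequency_list(TOKENS, "lemma")  # "The" and "the" are merged

print(word_freqs)
print(lemma_freqs)
```

With a single corpus-level switch, both lists would have to be folded or
both left alone; a per-attribute setting lets the two be decided
independently.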
Cheers,
Scott