[CWB] problems with Cqpweb and frequency lists

Stefania Spina stefania.spina at unistrapg.it
Sat Jun 20 11:21:20 CEST 2015


Hello,
I have an Italian corpus indexed in CQPweb (v3.1.13); the corpus is encoded
in ISO-8859-1.
When I generate frequency lists, accented and unaccented characters do not
seem to be distinguished properly. For example, in the word frequency list,
the entry "è" combines the frequencies of "è" and "e", and the unaccented
word "e" does not appear in the list at all.
This does not happen in queries, where accented and unaccented characters
are distinguished correctly.
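
To make the symptom concrete, here is a minimal Python sketch of the behaviour
I am seeing. It is not CQPweb code and makes no claim about where the merging
actually happens; the token list and the helper names (fold_accents,
merged_frequency_list) are hypothetical, chosen only for illustration.

```python
import unicodedata
from collections import Counter

E_GRAVE = "\u00e8"  # the word "è"


def fold_accents(word):
    """Strip combining marks so that "è" and "e" map to the same key."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def merged_frequency_list(tokens):
    """Count tokens under an accent-insensitive key.

    Each group is labelled with the first surface form seen, which is an
    assumption made here to mimic the list showing "è" but not "e".
    """
    counts = {}
    labels = {}
    for tok in tokens:
        key = fold_accents(tok)
        counts[key] = counts.get(key, 0) + 1
        labels.setdefault(key, tok)
    return {labels[key]: n for key, n in counts.items()}


# Hypothetical toy corpus: two occurrences of "è", three of "e".
tokens = [E_GRAVE, "e", "e", E_GRAVE, "e"]

exact = dict(Counter(tokens))           # what I would expect to see
merged = merged_frequency_list(tokens)  # what the frequency list resembles

print("expected:", exact)
print("observed:", merged)
```

In the toy data the exact count keeps "è" and "e" as separate entries, while
the accent-insensitive grouping yields a single "è" entry carrying the sum of
both, which matches what I see in the frequency list.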
Is there a way I can solve this problem?
Thank you for your help,
Stefania

-- 
Stefania Spina
Università per Stranieri di Perugia
Dipartimento di Scienze Umane e Sociali
stefania.spina at unistrapg.it
https://unistrapg.academia.edu/StefaniaSpina