[CWB] cqpserver charset: Where can I set this variable?
Stefan Evert
stefanML at collocations.de
Thu Apr 3 08:47:05 CEST 2014
On 2 Apr 2014, at 14:52, Jörg Knappen <j.knappen at mx.uni-saarland.de> wrote:
> # corpus properties provide additional information about the corpus:
> ##:: charset = "latin2" # character encoding of corpus data
> ##:: language = "pl" # insert ISO code for language (de, en, fr, ...)
>
> However, the cqpserver still claims (verified using -d ALL) that the corpus
> is encoded in "latin1". It should announce "latin2" here ...
Yes, the backend function is just a dummy.
> Where does the cqpserver take the character set from, and how can I modify this?
This has been fixed in the current beta version (3.4 series). We didn't backport the change, since CWB 3.0 doesn't properly support charsets other than latin1 anyway.
You should consider upgrading to CWB 3.4; you'll probably have to compile it from the SVN repository in order to get a working CQI_CORPUS_CHARSET, since the patch was only added this February.
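In the meantime, if a client needs the declared charset, one workaround is to read it straight from the registry file instead of asking the server. The following is only a rough sketch, not part of CWB: the function name read_corpus_properties is made up for illustration, and it assumes property lines use the quoted form shown in the excerpt above (##:: name = "value").

```python
import re

# Matches registry property lines such as:  ##:: charset = "latin2"  # comment
# Assumption: the value is always double-quoted, as in the quoted excerpt.
PROPERTY_RE = re.compile(r'^##::\s*(\w+)\s*=\s*"([^"]*)"')


def read_corpus_properties(registry_text):
    """Return a dict of corpus properties declared in registry file text.

    Hypothetical helper for illustration; it is not a CWB or CQi function.
    """
    props = {}
    for line in registry_text.splitlines():
        match = PROPERTY_RE.match(line.strip())
        if match:
            props[match.group(1)] = match.group(2)
    return props


# The property lines from the registry file quoted in this thread.
sample = '''# corpus properties provide additional information about the corpus:
##:: charset = "latin2" # character encoding of corpus data
##:: language = "pl" # insert ISO code for language (de, en, fr, ...)
'''

props = read_corpus_properties(sample)
print(props)
```

This only tells you what the registry declares; it does not change what a 3.0 cqpserver announces over CQi.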
Best,
Stefan