[CWB] cqpserver charset: Where can I set this variable?

Stefan Evert stefanML at collocations.de
Thu Apr 3 08:47:05 CEST 2014


On 2 Apr 2014, at 14:52, Jörg Knappen <j.knappen at mx.uni-saarland.de> wrote:

> # corpus properties provide additional information about the corpus:
> ##:: charset  = "latin2" # character encoding of corpus data
> ##:: language = "pl"     # insert ISO code for language (de, en, fr, ...)
> 
> However, the cqpserver still claims (verified using -d ALL) that the corpus
> is encoded in "latin1". It should announce "latin2" here ...

Yes, the backend function is just a dummy. 

> Where does the cqpserver take the character set from, and how can I modify this?

This has been fixed in the current beta version (3.4 series).  We didn't backport the change, since CWB 3.0 doesn't properly support charsets other than latin1 anyway.

You should consider upgrading to CWB 3.4; you'll probably have to compile it from the SVN repository in order to get a working CQI_CORPUS_CHARSET, since the patch was only added this February.
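
To check what the registry file itself declares, independently of what cqpserver reports, a small script along the following lines may help. This is only a sketch in Python: the function name read_corpus_properties and the regular expression are illustrative assumptions based on the property lines quoted above, not CWB's own registry parser, and it only handles values written in double quotes.

```python
import re

# Matches corpus property lines such as:  ##:: charset  = "latin2" # comment
# Assumption: the pattern is modelled on the quoted registry lines only.
PROPERTY_RE = re.compile(r'^\s*##::\s*(\w+)\s*=\s*"([^"]*)"')


def read_corpus_properties(registry_text):
    """Return a dict of the corpus properties declared in a registry file's text."""
    properties = {}
    for line in registry_text.splitlines():
        match = PROPERTY_RE.match(line)
        if match:
            # group(1) is the property name, group(2) the quoted value
            properties[match.group(1)] = match.group(2)
    return properties


# Sample input copied from the registry excerpt quoted in this thread.
sample = '''\
# corpus properties provide additional information about the corpus:
##:: charset  = "latin2" # character encoding of corpus data
##:: language = "pl"     # insert ISO code for language (de, en, fr, ...)
'''

props = read_corpus_properties(sample)
print(props["charset"])
```

If this prints the charset you expect while cqpserver still announces "latin1", the registry entry is fine and the discrepancy comes from the server side, as described above.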

Best,
Stefan

