[CWB] CL: Error, unrecognised CorpusCharset in cl_string_validate_encoding

George Mitrevski mitrevski at auburn.edu
Tue Apr 5 02:18:18 CEST 2011


Hi everyone.
I am trying to access a Cyrillic (cp1251) corpus on Windows with cqp.exe.
I got the cqp.exe window to accept Cyrillic characters, but now I have
run into another problem.

When I change the charset in the registry file to "cyrillic", I get this error:

MKCORPUS> "кога";
CL: Regex Compile Error: unrecognized character after (? or (?-
CQP Error:
        Illegal regular expression: ????


When I change the charset to "cp1251", I get this error:

MKCORPUS> "кога";
CL: Error, unrecognised CorpusCharset in cl_string_validate_encoding.
CQP Error:
        Query includes a character or character sequence that is invalid in the encoding specified for this corpus
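
For readers comparing the two errors, here is a minimal sketch of how the query word "кога" looks at the byte level under three encodings commonly used for Cyrillic text. It only illustrates why the declared charset has to match the bytes actually sent to CQP. The codec names are Python's own, and how they correspond to the charset values CQP accepts is an assumption that this thread does not settle.

```python
# Illustrative only: show the bytes that the query word "кога" becomes
# under three encodings commonly used for Cyrillic text.
WORD = "кога"

# Python codec names; these are NOT CWB registry charset values (assumption:
# any correspondence to CQP charset names must be checked separately).
CODECS = ("cp1251", "iso8859_5", "utf-8")


def hex_bytes(word: str, codec: str) -> str:
    """Return the encoded bytes of `word` as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in word.encode(codec))


if __name__ == "__main__":
    for codec in CODECS:
        print(f"{codec:10s} {hex_bytes(WORD, codec)}")
```

The same four letters map to a different byte sequence in each encoding, so a query typed in one encoding will not match, or will be rejected as invalid, if the corpus is declared with another.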


Someone else reported a similar charset problem here:
http://liste.sslmit.unibo.it/pipermail/cwb/2007-July/000077.html
The advice given there was:

All you have to do is keep the "##::" and change the charset value to
"latin2" (CQP won't understand iso-8859-2), like so:

What should I set the charset value to so that cqp can handle Cyrillic
texts?

Thanks much.
-- 
Dr. George Mitrevski
Professor Emeritus
Auburn University
*Website*: http://www.auburn.edu/~mitrege
*Macedonian Higher Education Blog:* http://visokoobrazovanie.blogspot.com/