[CWB] CL: Error, unrecognised CorpusCharset in cl_string_validate_encoding
George Mitrevski
mitrevski at auburn.edu
Tue Apr 5 02:18:18 CEST 2011
Hi everyone.
I am trying to access a Cyrillic (cp1251) corpus on Windows with cqp.exe.
I got the cqp.exe window to accept Cyrillic characters, but now I have
run into another problem.
In the registry, I changed the charset to "cyrillic" and got this error:
MKCORPUS> "кога";
CL: Regex Compile Error: unrecognized character after (? or (?-
CQP Error:
Illegal regular expression: ????
When I changed the charset to "cp1251", I got this error:
MKCORPUS> "кога";
CL: Error, unrecognised CorpusCharset in cl_string_validate_encoding.
CQP Error:
Query includes a character or character sequence that is invalid
in the encoding specified for this corpus
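A small Python sketch (not part of the original post) can illustrate one possible source of the mismatch. It assumes, without confirmation from this thread, that CWB's "cyrillic" charset name denotes ISO-8859-5 rather than cp1251. Under that assumption, the same query word is stored as different bytes in the two code pages, so cp1251 bytes read as ISO-8859-5 come out as different letters. The helper `recode_cp1251_to_iso8859_5` is hypothetical and is not a CWB tool.

```python
# Sketch only. Assumption: CWB's "cyrillic" charset means ISO-8859-5.
# Python's built-in codecs "cp1251" and "iso8859_5" are used for comparison.


def recode_cp1251_to_iso8859_5(data: bytes) -> bytes:
    """Hypothetical helper: re-encode cp1251 bytes as ISO-8859-5.

    Raises UnicodeEncodeError for characters with no ISO-8859-5
    equivalent (e.g. cp1251 typographic quotes or the euro sign).
    """
    text = data.decode("cp1251")
    return text.encode("iso8859_5")


word = "кога"
cp1251_bytes = word.encode("cp1251")   # the query word as cp1251 bytes
iso_bytes = word.encode("iso8859_5")   # the same word as ISO-8859-5 bytes

print(cp1251_bytes.hex(" "))           # ea ee e3 e0
print(iso_bytes.hex(" "))              # da de d3 d0

# The cp1251 bytes, misread as ISO-8859-5, decode to different letters.
print(cp1251_bytes.decode("iso8859_5"))  # ъюур

# Converting the data to ISO-8859-5 yields the expected bytes.
print(recode_cp1251_to_iso8859_5(cp1251_bytes) == iso_bytes)  # True
```

Cyrillic letters, including the Macedonian-specific ones, exist in both code pages, so a strict re-encoding like this fails loudly only on punctuation or symbols that ISO-8859-5 lacks.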
Someone else reported a similar charset problem here:
http://liste.sslmit.unibo.it/pipermail/cwb/2007-July/000077.html and the
advice given was:
All you have to do is keep the "##::" and change the charset value to
"latin2" (CQP won't understand iso-8859-2), like so:
What should I set the charset value to so that cqp can understand cyrillic
texts?
Thanks much.
--
Dr. George Mitrevski
Professor Emeritus
Auburn University
*Website*: http://www.auburn.edu/~mitrege
*Macedonian Higher Education Blog:* http://visokoobrazovanie.blogspot.com/