[CWB] CL: Error, unrecognised CorpusCharset in cl_string_validate_encoding
luigi.talamo at libero.it
Wed Apr 6 13:42:57 CEST 2011
Hardie, Andrew wrote:
> The best thing to do in this case would be to recode the corpus and
> re-index, either in 8859-5 (BUT: be aware full support for this charset is
> not yet implemented e.g. CWB doesn't yet have any knowledge of case/accent
> folding for 8859-5) or - better yet - in UTF-8.
As far as I know, Unicode covers every cp1251 character; see the mapping table:
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
so re-encoding to UTF-8 should be fairly painless.
You just have to open the cp1251-encoded file in a text editor that supports
UTF-8 (making sure the editor really reads it as cp1251) and save it again with
UTF-8 as the encoding.
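
If the corpus is too large to open comfortably in an editor, the same
conversion can be scripted. Below is a minimal Python sketch of that idea; the
function name and the file names in the usage comment are placeholders of my
own, not part of CWB, so adapt them to your setup.

```python
def recode_cp1251_to_utf8(src_path, dst_path):
    """Read src_path as cp1251 and write the same text to dst_path as UTF-8.

    Hypothetical helper for illustration; it is not part of CWB.
    """
    # newline="" turns off newline translation, so the original line
    # endings are written back unchanged.
    # Decoding is strict: a byte that cp1251 does not define raises
    # UnicodeDecodeError instead of being silently replaced.
    with open(src_path, "r", encoding="cp1251", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Example call (placeholder file names):
# recode_cp1251_to_utf8("corpus_cp1251.vrt", "corpus_utf8.vrt")
```

The script reads line by line, so it should cope with large files. After
recoding, the corpus still has to be re-indexed, as Andrew said.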
Let us know if you run into any trouble.
Best regards,
Luigi