[CWB] Install BNC in utf8
Andrés Chandía
andres.chandia at upf.edu
Fri Aug 19 16:50:18 CEST 2022
Hi there, I'm trying to install the BNC corpus into an existing CQPweb
installation. The BNCweb encoder is set to index in latin1, but I have
seen that the BNCencoder (BNC_encoder-0.9.2) is set to index in utf8
(by default).
I have tried to index with BNCweb_encoder in utf8 by changing
the line "$Encoder->charset("latin1");"
to "$Encoder->charset("utf8");"
and
the line "$Encoder->charset("utf8");"
to "$Encoder->charset(("utf8") ? "utf8" : "latin1");"
This does not work and produces the following error:
Encoding error: an invalid byte or byte sequence for charset "utf8"
was encountered.
[location of error: input line #4]
By contrast, I had no issues with BNC_encoder-0.9.2 for interactive use.
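One possible cause of an error like this, stated here as an assumption rather than a diagnosis, is that the data reaching the encoder is still latin1 while the charset is declared as utf8. A minimal Python sketch, independent of the encoder scripts, for finding the first line of a file that is not valid UTF-8 (the names first_invalid_utf8_line and check_file are hypothetical, chosen for illustration):

```python
import sys


def first_invalid_utf8_line(lines):
    """Return (line_number, byte_offset) for the first line that is not
    valid UTF-8, or None if every line decodes cleanly.

    `lines` is any iterable of bytes objects, e.g. a file opened in "rb" mode.
    """
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the offending byte within this line
            return number, err.start
    return None


def check_file(path):
    """Scan one file in binary mode so no implicit decoding takes place."""
    with open(path, "rb") as handle:
        return first_invalid_utf8_line(handle)


if __name__ == "__main__":
    # Usage (assumed): python check_utf8.py FILE [FILE ...]
    for path in sys.argv[1:]:
        print(path, check_file(path))
```

If this reports a line in the source files, they would probably need converting to UTF-8 before indexing; if it reports nothing, the invalid bytes are more likely introduced somewhere between the source files and the encoder.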
First question: is it possible to index the corpus in utf8? If so,
what else should I do? Or what should I do instead?
Thanks. Regards.
--
Andrés Chandía
Unitat de Traducció i Ciències del Llenguatge
Roc Boronat 138, C. P.: 08018, Barcelona
Tel.: 935 055 722 - mail:andres.chandia at upf.edu