[CWB] UTF-8 corpus encoding
oyvind.eide at iln.uio.no
Wed May 9 16:24:58 CEST 2012
Dear list,
I have now reached the stage of real testing and am trying to encode a
UTF-8 corpus. However, I get the message:
Encoding error: an invalid byte or byte sequence for charset "latin1"
was encountered.
Is UTF-8 support implemented yet, and if so, how do I specify that my vrt
file is in UTF-8? I could not find this in the documentation.
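To rule out a broken input file as the cause, the vrt file can be checked
for well-formed UTF-8 independently of CWB. The following is only a minimal
sketch in Python; the function names is_valid_utf8 and first_invalid_line
and the file name corpus.vrt are illustrative assumptions, not part of CWB.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def first_invalid_line(path: str):
    """Return the 1-based number of the first line that is not valid UTF-8.

    Returns None if every line decodes. Checking line by line is safe
    because the newline byte never occurs inside a multi-byte UTF-8 sequence.
    """
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            if not is_valid_utf8(raw):
                return number
    return None


if __name__ == "__main__":
    # "corpus.vrt" is a placeholder file name (assumption).
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "corpus.vrt"
    bad = first_invalid_line(target)
    print("all lines are valid UTF-8" if bad is None else f"invalid UTF-8 on line {bad}")
```

If this reports no invalid line, the file itself is fine and the error comes
only from the input being read as latin1.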
--
Kind regards,
Øyvind Eide
Unit for Digital Documentation
University of Oslo