[CWB] UTF-8 corpus encoding

oyvind.eide at iln.uio.no
Wed May 9 16:24:58 CEST 2012


Dear list,

I have now moved on to real testing and am trying to encode a UTF-8 
corpus. However, I get the following message:

Encoding error: an invalid byte or byte sequence for charset "latin1" 
was encountered.

Is UTF-8 support implemented yet, and if so, how do I specify that my 
vrt file is in UTF-8? I could not find this in the documentation.

-- 
Kind regards,
Øyvind Eide
Unit for Digital Documentation
University of Oslo

