[CWB] UTF-8 corpus encoding

mdecorde matthieu.decorde at ens-lyon.fr
Wed May 9 16:23:57 CEST 2012


Hi,

You need to pass the "-c utf8" option to cwb-encode. Without it, the
input is treated as latin1, which is why you get that error.
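
For illustration, a minimal sketch of what such a call might look like. The file name, data directory and registry path below are hypothetical placeholders, not taken from the original question; -d, -f and -R are the usual cwb-encode options for the data directory, input file and registry file. The sketch only prints the command if cwb-encode or the input file is not found.

```shell
# Hypothetical paths and names: adjust them to your own installation.
CORPUS_VRT="mycorpus.vrt"                   # verticalized input file (assumed name)
DATA_DIR="/corpora/data/mycorpus"           # existing directory for the index files (assumed)
REGISTRY_FILE="/corpora/registry/mycorpus"  # registry entry to be written (assumed)
CHARSET="utf8"                              # the value passed to -c

# Assemble the command line; -c declares the character set of the input.
ENCODE_ARGS="-c $CHARSET -d $DATA_DIR -f $CORPUS_VRT -R $REGISTRY_FILE"

if command -v cwb-encode >/dev/null 2>&1 && [ -f "$CORPUS_VRT" ]; then
    # Word splitting of ENCODE_ARGS is intended here (no spaces in the paths).
    cwb-encode $ENCODE_ARGS
else
    echo "would run: cwb-encode $ENCODE_ARGS"
fi
```

Further -P and -S options for positional and structural attributes would be added in the same way, depending on the columns and tags in the vrt file.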

Matthieu Decorde

On Wednesday 9 May 2012 at 16:24 +0200, oyvind.eide at iln.uio.no wrote:
> Dear list,
> 
> I have now got to real testing and am trying to encode a UTF-8 corpus. I do, 
> however, get the message:
> 
> Encoding error: an invalid byte or byte sequence for charset "latin1" 
> was encountered.
> 
> Is UTF-8 support implemented yet so I can specify somehow that my vrt 
> file is in UTF-8? I was not able to find this in the documentation.
> 



