[CWB] UTF-8 corpus encoding
mdecorde
matthieu.decorde at ens-lyon.fr
Wed May 9 16:23:57 CEST 2012
Hi,
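You must add the "-c utf8" option to the cwb-encode call. Without it,
cwb-encode assumes the input is latin1, which is why it rejects the
UTF-8 byte sequences in your file.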
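
For illustration, a minimal sketch of such a call. The file name and the
data and registry paths below are hypothetical placeholders, not taken from
your setup, and any -P / -S attribute declarations your corpus needs are
left out:

```shell
# Hypothetical input file and target locations; adjust to your installation.
VRT=mycorpus.vrt
DATA_DIR=/corpora/data/mycorpus
REGISTRY_FILE=/corpora/registry/mycorpus

# -c utf8 declares the corpus charset (cwb-encode otherwise assumes latin1)
# -d is the data directory, -f the input file, -R the registry file to write
CMD="cwb-encode -c utf8 -d $DATA_DIR -f $VRT -R $REGISTRY_FILE"

# Print the command for inspection; run it afterwards with: $CMD
# (the unquoted expansion relies on the paths containing no spaces)
echo "$CMD"
```

The command is only printed here so that the paths can be checked before
anything is written to disk.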
Matthieu Decorde
On Wednesday, 9 May 2012 at 16:24 +0200, oyvind.eide at iln.uio.no wrote:
> Dear list,
>
> I have now got to real testing and am trying to encode a UTF-8 corpus.
> However, I get the message:
>
> Encoding error: an invalid byte or byte sequence for charset "latin1"
> was encountered.
>
> Is UTF-8 support implemented yet so I can specify somehow that my vrt
> file is in UTF-8? I was not able to find this in the documentation.
>