[CWB] Character encoding revisited

Josep M. Fontana josepm.fontana at upf.edu
Wed Jun 25 18:41:42 CEST 2014


Our corpus is encoded in UTF-8, but when I save the output of a search 
to a text file I get the typical garbled characters that appear when an 
encoding conversion has gone wrong. Running the 'file' command on these 
text files reports them sometimes as ISO-8859 and other times as ASCII. 
Is there any way to configure things so that the UTF-8 character set is 
maintained? Thanks.
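
In case it helps with diagnosis, here is a minimal sketch of how the 
output files could be checked, assuming Python 3 is available. The 
function name classify_bytes is only illustrative and is not part of 
CWB. Note that a file containing nothing but ASCII bytes is also valid 
UTF-8, so the "ASCII" results may simply be outputs without accented 
characters; the ISO-8859 results are the ones where the bytes are 
not valid UTF-8.

```python
def classify_bytes(data: bytes) -> str:
    """Roughly reproduce the distinction drawn by the 'file' command.

    Hypothetical helper for illustration only; not a CWB function.
    """
    try:
        data.decode("ascii")
        return "ascii"        # pure ASCII, which is also valid UTF-8
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"        # contains valid multi-byte UTF-8 sequences
    except UnicodeDecodeError:
        return "non-utf-8"    # e.g. some ISO-8859 variant


# Example use on a saved search result (the path is a placeholder):
#   with open("search_output.txt", "rb") as handle:
#       print(classify_bytes(handle.read()))
```

Only files classified as "non-utf-8" would actually show a conversion 
problem between the corpus and the saved output.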

Josep M.
