[CWB] Character encoding problems when transferring corpus

Josep M. Fontana josepm.fontana at upf.edu
Sun Aug 12 18:11:12 CEST 2012


Hi,

I'm not sure this is really a CWB problem (in fact, I'm pretty sure it
is not), but since there may be other users who have CWB running on a
Mac, perhaps I can get some help on this list.

I installed CWB on my Mac and then transferred all the relevant
directories and registry files from the CWB installation we have running
on a LAMP server. Everything seems to be working fine, except that the
results of a query in the terminal come out like this (the words as they
should be displayed are given in parentheses):


60      ila<B7>lustra<AD>ssim  [#41507-#41566]  (--> 'il·lustríssim')
58      fama<B3>s  [#24851-#24908]  (--> 'famós')

The corpus is encoded as UTF-8 and my terminal (iTerm) is configured to
display UTF-8 text. I have no problems viewing other UTF-8 encoded texts
on this computer, and I don't have these problems when accessing the
corpus remotely.
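
As far as I can tell, the escaped values in the output above (<B7>,
<AD>, <B3>) are the final bytes of the UTF-8 encodings of the characters
that should appear ('·', 'í', 'ó'), so the data itself looks intact and
the bytes are just not being interpreted as UTF-8 somewhere along the
way. A minimal Python sketch to check this (Python is not part of my CWB
setup, and the helper name utf8_hex is made up for illustration):

```python
# Illustration only: compare the escaped bytes seen in the terminal with
# the UTF-8 encodings of the characters that were expected.

def utf8_hex(ch):
    """Return the UTF-8 bytes of a string as upper-case hex strings."""
    return ["{:02X}".format(b) for b in ch.encode("utf-8")]

# Expected character -> escaped byte shown in the garbled output.
observed = {"·": "B7", "í": "AD", "ó": "B3"}

for ch, escaped in observed.items():
    hex_bytes = utf8_hex(ch)
    # Each of these characters is a two-byte sequence in UTF-8; the
    # escaped value in the output should equal the second byte.
    print(ch, hex_bytes, "last byte matches:", hex_bytes[-1] == escaped)
```

This only confirms that the bytes are consistent with UTF-8; it does not
say which program in the chain is misreading them.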

Why would UTF-8 encoded texts from the transferred corpus not be 
displayed properly? Is there any way to fix this? Any help would be 
greatly appreciated.

Josep M.

