[CWB] Character encoding problems when transferring corpus
Josep M. Fontana
josepm.fontana at upf.edu
Sun Aug 12 18:11:12 CEST 2012
Hi,
I'm not sure this is really a CWB problem (in fact, I'm pretty sure it is
not), but since there might be other users who have CWB running on a Mac,
perhaps I can get some help on this list.
I installed CWB on my Mac and then transferred all the relevant
directories and registry files from the CWB installation we have running
on a LAMP server. Everything seems to be working fine, except that the
results of a query in the terminal come out like this (the words as they
should be displayed are in parentheses):
60 ila<B7>lustra<AD>ssim [#41507-#41566] (--> 'il·lustríssim')
58 fama<B3>s [#24851-#24908] (--> 'famós')
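As a side note, the hex codes in the garbled output (B7, AD, B3) appear to
be the second bytes of the UTF-8 encodings of the expected characters
(·, í, ó), which suggests the corpus data itself is intact and the bytes
are only being displayed incorrectly. A minimal Python sketch to check
this; the helper name utf8_hex is made up for illustration and has
nothing to do with CWB:

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of `text` as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


# The three characters that are garbled in the query output above.
for ch in ("·", "í", "ó"):
    print(ch, utf8_hex(ch))
```

Each character encodes to two bytes, and the second byte of each pair is
the code shown in angle brackets in the output.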
The corpus is encoded as UTF-8 and my terminal (iTerm) is set up
properly to view UTF-8 encoded texts. I have no problems viewing other
UTF-8 encoded texts on this computer and I don't have these problems
when accessing the corpus remotely.
Why would UTF-8 encoded texts from the transferred corpus not be
displayed properly? Is there any way to fix this? Any help would be
greatly appreciated.
Josep M.