[CWB] Character encoding problems when transferring corpus

Hardie, Andrew a.hardie at lancaster.ac.uk
Sun Aug 12 18:26:09 CEST 2012


Hi Josep,

This looks like an issue with less, the pager. It seems to be "eating" the first byte of each two-byte UTF-8 sequence (converting it to an accentless "a") and leaving the second byte to appear as a bare binary character (hence the hex codes in angle brackets).
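
As a rough illustration of that diagnosis, here is a small Python sketch. It is not how less works internally; the helper name garble is made up, and the rule it applies (ASCII passes through, the lead byte of a multi-byte sequence becomes a plain "a", each continuation byte is shown as hex in angle brackets) is only an assumption chosen to match the symptoms. Under that assumption it turns the two intended words into exactly the strings in your output.

```python
def garble(word: str) -> str:
    """Mimic the observed display (hypothetical model, not less's real logic)."""
    out = []
    for byte in word.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))        # plain ASCII passes through unchanged
        elif byte >= 0xC0:
            out.append("a")              # lead byte of a multi-byte sequence
        else:
            out.append(f"<{byte:02X}>")  # continuation byte shown as hex
    return "".join(out)


print(garble("il·lustríssim"))
print(garble("famós"))
```

The hex codes in your output (B7, AD, B3) are the second bytes of the UTF-8 encodings of "·", "í" and "ó", which is what points to the sequences being split rather than the corpus data being damaged.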

You can check this by turning off the use of a pager for query output:

set Paging no;

If queries print OK with this setting, then it is definitely an issue with less. If not, then the problem is somewhere else.
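
If the pager does turn out to be at fault, one thing worth checking is the environment it runs in. As far as I know, less takes its character set from the LESSCHARSET variable if that is set, and otherwise looks for "UTF-8" in LC_ALL, LC_CTYPE or LANG. The Python sketch below only inspects those variables; the function name less_probably_utf8 is made up for illustration, and the check is a simplification of what less actually does.

```python
import os


def less_probably_utf8(env) -> bool:
    """Rough guess at whether less will treat its input as UTF-8."""
    charset = env.get("LESSCHARSET")
    if charset:                                  # an explicit setting wins
        return charset.lower() == "utf-8"
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):  # otherwise look at the locale
        value = env.get(name, "").upper()
        if "UTF-8" in value or "UTF8" in value:
            return True
    return False


if __name__ == "__main__":
    for name in ("LESSCHARSET", "LC_ALL", "LC_CTYPE", "LANG"):
        print(f"{name}={os.environ.get(name, '(unset)')}")
    print("less likely to use UTF-8:", less_probably_utf8(os.environ))
```

If this reports False on the Mac, setting LESSCHARSET to utf-8 in the shell before starting CQP may be worth a try.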

Best

Andrew.

> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it 
> [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Josep M. Fontana
> Sent: 12 August 2012 17:11
> To: Open source development of the Corpus WorkBench
> Subject: [CWB] Character encoding problems when transferring corpus
> 
> Hi,
> 
> I'm not sure this is really a CWB problem (in fact I'm pretty sure it is
> not) but since there might be other users that have CWB running on a Mac
> perhaps I can get some help in this list.
> 
> I installed CWB on my Mac and then transferred all the 
> relevant directories and registry files from the CWB 
> installation we have running on a LAMP server. Everything 
> seems to be working fine except that the results of a query 
> on the terminal come out like this (the words as they should 
> be displayed are within parentheses)
> 
> 
> 60      ila<B7>lustra<AD>ssim  [#41507-#41566]  (--> 'il·lustríssim')
> 58      fama<B3>s  [#24851-#24908] ( --> 'famós')
> 
> The corpus is encoded as UTF-8 and my terminal (iTerm) is set up properly
> to view UTF-8 encoded texts. I have no problems viewing other UTF-8 encoded
> texts on this computer and I don't have these problems when accessing the
> corpus remotely.
> 
> Why would UTF-8 encoded texts from the transferred corpus not 
> be displayed properly? Is there any way to fix this? Any help 
> would be greatly appreciated.
> 
> Josep M.