[CWB] invalid UTF8 string passed to cl_string_canonical...

Stefan Evert stefanML at collocations.de
Thu May 12 17:16:29 CEST 2016


> On 12 May 2016, at 17:06, Stefan Evert <stefanML at collocations.de> wrote:
> 
> c) the source corpus is UTF-8, but the target corpus has a different encoding. cwb-align expects both corpora to have the same encoding, but it doesn't actually check this and simply uses the declared encoding of the first corpus.

Oops, I was too quick again: the main program (cwb-align.c) does check that both corpora have the same encoding before it calls the code in feature-maps.c that I had looked at.  So possibility (c) can be ruled out.  As far as I can tell, this check has always been in place.
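
To illustrate the order of operations, here is a minimal sketch of "check the declared encodings first, then build the feature map".  It is not the actual CWB source: DemoCorpus, same_declared_charset(), build_feature_map() and align_corpora() are hypothetical names made up for this example, and the encodings are plain strings rather than the CL library's own charset type.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a corpus handle; NOT the CWB data structure. */
typedef struct {
  const char *name;     /* corpus identifier */
  const char *charset;  /* declared encoding, e.g. "utf8" or "latin1" */
} DemoCorpus;

/* Returns 1 if both corpora declare the same encoding, 0 otherwise. */
int same_declared_charset(const DemoCorpus *source, const DemoCorpus *target)
{
  return strcmp(source->charset, target->charset) == 0;
}

/* Hypothetical stand-in for the feature-map step, which works with a
 * single encoding (here: the one declared by the source corpus).
 * Returns 0 on success. */
int build_feature_map(const DemoCorpus *source, const DemoCorpus *target)
{
  printf("building feature map for %s / %s using charset %s\n",
         source->name, target->name, source->charset);
  return 0;
}

/* Main entry point of the sketch: refuse to continue on an encoding
 * mismatch, so the feature-map step never sees mixed encodings.
 * Returns 0 on success, 1 on mismatch. */
int align_corpora(const DemoCorpus *source, const DemoCorpus *target)
{
  if (!same_declared_charset(source, target)) {
    fprintf(stderr, "encoding mismatch: %s is %s, but %s is %s\n",
            source->name, source->charset, target->name, target->charset);
    return 1;
  }
  return build_feature_map(source, target);
}
```

With this structure, calling align_corpora() on a "utf8" source and a "latin1" target returns 1 without ever reaching build_feature_map(), which is why a mismatch between the two corpora cannot be what triggers the error.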

Best,
Stefan


