[CWB] unicode problems with Greek and OCS

Stefan Evert stefanML at collocations.de
Wed Mar 11 14:47:22 CET 2015


> This strongly suggests that the call to cl_string_canonical is happening during a "for (i=0;i<n;i++)" loop. But I have spent quite a lot of time last night searching and I can't find such a loop - or rather, I can, but none of those loops calls cl_string_canonical.

Well, there are such loops – across lexical items.

> But I have been staring at the two bits of code for pass 1 and pass 2 with no result. The 4 obvious calls to cl_string_canonical are in the wrong place for this to make any sense (they are in loops across lexical items), and I cannot identify any calls in the right place to functions that might then call cl_string_canonical.

I'm also quite positive that the cl_string_canonical() calls must be happening in the loops over lexical items, and that the loops run across exactly the same items in both cases.  That's why I'm so surprised about the different numbers of errors.

> Unless I am reading too much into the timing of the error messages? (since the above c&p is, of course, a mixture of stdout and stderr...)

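Hm, that didn't occur to me.  If this was run in a terminal, stdout is line-buffered and stderr is unbuffered, so the messages should appear in the order they were written.  However, if you redirected the output to a file (perhaps with ... >msg.txt 2>&1), stdout becomes block-buffered while stderr is still written immediately, so the messages might in fact be out of order.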
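
To illustrate the buffering effect, here is a minimal sketch in C.  It is not taken from the CWB sources: report_items() and its messages are hypothetical stand-ins for a loop over lexical items that prints a progress line to one stream and a warning to another.  Flushing the progress stream before each warning should keep the two in order even when both end up in the same file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a loop over lexical items: for every item,
 * one progress line goes to `out` and one warning goes to `err`.
 * If flush_each is nonzero, `out` is flushed before each warning, so the
 * lines keep their order even if `out` is block-buffered and both
 * streams are redirected to the same file. */
void
report_items(FILE *out, FILE *err, int n_items, int flush_each)
{
  int i;

  assert(out != NULL && err != NULL);

  for (i = 0; i < n_items; i++) {
    fprintf(out, "processing item %d\n", i);
    if (flush_each)
      fflush(out);          /* push the progress line out right away */
    fprintf(err, "warning for item %d\n", i);
  }

  /* make sure nothing is left sitting in either buffer */
  fflush(out);
  fflush(err);
}
```

Calling report_items(stdout, stderr, 3, 0) with output redirected via >msg.txt 2>&1 would typically put all warnings ahead of the progress lines in the file; with flush_each set to 1 they alternate.  An alternative that avoids the explicit flushes is to call setvbuf(stdout, NULL, _IOLBF, BUFSIZ) before the first output.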

Hope that Andrew can reproduce the error now; that would give us a much better handle on the problem.

Cheers,
Stefan

