[CWB] unicode problems with Greek and OCS

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon Mar 16 14:50:34 CET 2015


Well, this is peculiar.

I finally got some time to try reproducing the bug using Ruprecht's corpora. First, I used the build I happened to have on this machine (3.4.7) and the cl_string_canonical error was reproduced (only it occurred a lot more times); the " ERROR: fcount1" error was not reproduced.

Then I rebuilt to add in some debug messages (so I was now working on the latest version from the repo, 3.4.8) and the errors were no longer reproducible. The aligner runs obediently to completion without any protests at all.

I have absolutely no idea why this should be. The relevant files have not even been touched since they were upgraded to UTF-8 compatibility two years ago.

The only thing I changed was a declaration of one pointer from char to unsigned char, but that can't be it (can it? dunno).

Ruprecht - can you run svn up (to 624) then rebuild, and see where that gets you?

best

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Ruprecht von Waldenfels
Sent: 11 March 2015 14:25
To: cwb at sslmit.unibo.it
Subject: Re: [CWB] unicode problems with Greek and OCS

Hi, no, this was copied from the terminal.
On 11.03.2015 at 14:47, Stefan Evert wrote:
>> This strongly suggests that the call to cl_string_canonical is happening during a "for (i=0;i<n;i++)" loop. But I have spent quite a lot of time last night searching and I can't find such a loop - or rather, I can, but none of those loops calls cl_string_canonical.
> Well, there are such loops – across lexical items.
>
>> But I have been staring at the two bits of code for pass 1 and pass 2 with no result. The 4 obvious calls to cl_string_canonical are in the wrong place for this to make any sense (they are in loops across lexical items), and I cannot identify any calls in the right place to functions that might then call cl_string_canonical.
> I'm also quite positive that the cl_string_canonical() calls must be happening in the loops over lexical items, and that the loops run across exactly the same items in both cases.  That's why I'm so surprised about the different numbers of errors.
>
>> Unless I am reading too much into the timing of the error messages? (since the above c&p is, of course, a mixture of stdout and stderr...)
> Hm, that didn't occur to me.  If this was run in a terminal, both stdout and stderr should automatically flush every line.  However, if you redirected the output to a file (perhaps with ... >msg.txt 2>&1) the messages might in fact be out of order.
>
> Hope that Andrew can reproduce the error now, that would give us a much better handle on the problem.
>
> Cheers,
> Stefan
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
