[CWB] unicode problems with Greek and OCS

Hardie, Andrew a.hardie at lancaster.ac.uk
Wed Mar 11 15:10:36 CET 2015


>> Well, there are such loops – across lexical items.

I meant a "for (i=0;i<n;i++)" loop where n = length of the n-gram, i.e. 0 to 2 for 3-grams, 0 to 3 for 4-grams. That would give us the total of 7 if there is a single string triggering things.

But as you say, the functions are called once per lexical item in either lexicon. So, if there is 1 bad item being canonicalised in a 3-iteration loop and then a 4-iteration loop, that gives us 7 reported errors for Pass 1; but it does not explain why there are 4 reported errors in Pass 2 (which would be predicted by the presence of two bad items across the two lexicons collectively).
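
To make the arithmetic concrete, here is a minimal sketch of the counting hypothesis. It is not CWB code: report_if_bad() and count_errors() are hypothetical stand-ins, and the only thing assumed is that each bad item triggers one error message per iteration of a "for (i=0;i<n;i++)" loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a canonicalisation call: yields 1 (one
   reported error) for a bad item and 0 otherwise. Not CWB's code. */
static int report_if_bad(int item_is_bad)
{
    return item_is_bad ? 1 : 0;
}

/* Count the errors that would be reported if every lexical item were
   canonicalised once per iteration of a "for (i=0;i<n;i++)" loop. */
int count_errors(const int *bad_flags, size_t n_items, int n)
{
    int errors = 0;

    assert(bad_flags != NULL && n > 0);
    for (size_t k = 0; k < n_items; k++)
        for (int i = 0; i < n; i++)
            errors += report_if_bad(bad_flags[k]);
    return errors;
}
```

With a single bad item, count_errors(flags, 1, 3) + count_errors(flags, 1, 4) comes to 3 + 4 = 7, which matches Pass 1. Under the same model every bad item contributes 7 errors, so no number of bad items yields the 4 seen in Pass 2.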

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Stefan Evert
Sent: 11 March 2015 13:47
To: CWBdev Mailing List
Subject: Re: [CWB] unicode problems with Greek and OCS


> This strongly suggests that the call to cl_string_canonical is happening during a "for (i=0;i<n;i++)" loop. But I have spent quite a lot of time last night searching and I can't find such a loop - or rather, I can, but none of those loops calls cl_string_canonical.

Well, there are such loops – across lexical items.

> But I have been staring at the two bits of code for pass 1 and pass 2 with no result. The 4 obvious calls to cl_string_canonical are in the wrong place for this to make any sense (they are in loops across lexical items), and I cannot identify any calls in the right place to functions that might then call cl_string_canonical.

I'm also quite positive that the cl_string_canonical() calls must be happening in the loops over lexical items, and that the loops run across exactly the same items in both cases.  That's why I'm so surprised about the different numbers of errors.

> Unless I am reading too much into the timing of the error messages? (since the above copy-and-paste is, of course, a mixture of stdout and stderr...)

Hm, that didn't occur to me.  If this was run in a terminal, both stdout and stderr should automatically flush every line.  However, if you redirected the output to a file (perhaps with ... >msg.txt 2>&1) the messages might in fact be out of order.
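
One way to rule out reordering when both streams go to the same file is to flush stdout before each diagnostic. The sketch below is only an illustration: warn_in_order() is a hypothetical helper, not a function in CWB. It relies on standard C behaviour, where stdout is normally fully buffered once it is redirected to a file, while stderr is never fully buffered.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of CWB): flush any pending stdout
   output before writing a diagnostic to stderr, so that the two
   streams keep their relative order even when both are redirected
   to the same file. Returns 0 on success, -1 on an I/O error. */
int warn_in_order(const char *message)
{
    assert(message != NULL);
    if (fflush(stdout) == EOF)
        return -1;
    if (fprintf(stderr, "%s\n", message) < 0)
        return -1;
    return 0;
}
```

Calling something like this in place of a bare fprintf(stderr, ...) keeps each message next to the progress output that preceded it, which would make the timing of the errors trustworthy again.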

Hope that Andrew can reproduce the error now; that would give us a much better handle on the problem.

Cheers,
Stefan

