[CWB] bugs

Serge Sharoff S.Sharoff at leeds.ac.uk
Thu May 18 10:43:40 CEST 2006


Hi Lars,

Regarding your first problem: I'm also using CWB on parallel corpora and haven't come across this effect. Does it occur consistently, for all corpora and queries?

On the other hand, I have another CQP bug with parallel corpora to add. If the number of hits is large and aligned regions are shown, CQP displays a couple of screens and then fails with a long list of messages such as:
        Error Message: Cannot allocate memory
attributes:load_component(): Warning:
  Data of CIS component of attribute word can't be loaded
mmapfile()<storage.c>: Can't mmap() file /corpora/c1/EUROPARL/EN/word.huf ...
        You have probably run out of memory / address space!

followed by a single line:

cqp: concordance.c:540: compose_kwic_line: Assertion `(match_start >= 0) && (match_start < text_size)' failed.

Has anyone come across behaviour like this? (In the example above, europarl-en is the aligned corpus; the original query was run on the German one.)

Cheers,
S

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it on behalf of Lars Nygaard
Sent: Tue 16/05/2006 13:38
To: cwb at sslmit.unibo.it
Subject: [CWB] bugs
 
Hi all,

Have we decided which bug-tracking system to use? I've registered two bug
reports on SourceForge, but I guess we might want to use something
different. The text of the bug reports is reproduced below.

regards,
lars nygaard



** "cut" applies too early **

When using CWB for parallel corpora, the "cut" keyword
does not give the correct results: it is applied to the
first corpus and does not take into account that there
can be restrictions on the aligned regions as well,
thus returning too few hits.



** WebCqp::Query fails on long sentences **

The combination of long sentences and many positional attributes seems
to cause WebCqp::Query to fail: the process hangs at 99% CPU usage and
never returns any results.

In my particular case, there were 16 attributes (a detailed morphological
and syntactic analysis of Norwegian) and some sentences of more than
100 words. If necessary, I can provide exact numbers here.

With 15 attributes, the query works, but I suspect there will be
problems with queries returning even longer sentences (and there are
quite a few, since the corpus contains literary text, and some authors
produce sentences of many hundreds of words).
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb
