[CWB] bugs
Serge Sharoff
S.Sharoff at leeds.ac.uk
Thu May 18 10:43:40 CEST 2006
Hi Lars,
WRT your first problem: I'm also using CWB on parallel corpora and haven't seen an effect like this. Is it persistent across all corpora and queries?
On the other hand, I have another parallel-corpus CQP bug to add. If the number of hits is large and aligned regions are shown, CQP displays a couple of screens of output and then complains with a long list of messages like:
Error Message: Cannot allocate memory
attributes:load_component(): Warning:
Data of CIS component of attribute word can't be loaded
mmapfile()<storage.c>: Can't mmap() file /corpora/c1/EUROPARL/EN/word.huf ...
You have probably run out of memory / address space!
followed by a single line of:
cqp: concordance.c:540: compose_kwic_line: Assertion `(match_start >= 0) && (match_start < text_size)' failed.
Has anyone come across behaviour like this? (In the example above, europarl-en is the aligned corpus; the original query was run against the German one.)
Cheers,
S
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it on behalf of Lars Nygaard
Sent: Tue 16/05/2006 13:38
To: cwb at sslmit.unibo.it
Subject: [CWB] bugs
Hi all,
Have we decided what bug-tracking system to use? I've registered two bug
reports on SourceForge, but I guess we might want to use something
different. The text of the bug reports is reproduced below.
regards,
lars nygaard
** "cut" applies too early **
When using CWB for parallel corpora, the "cut" keyword
does not give the correct results: it is applied to the
first corpus, and does not take into account that there
can be restrictions on the aligned regions as well,
thus returning too few hits.
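
To make the ordering problem concrete, here is a toy Python sketch. It is not CQP code and says nothing about how CQP is implemented internally; the function names, the hit list, and the alignment condition are all hypothetical and only model the two possible orders of "cut" and the alignment restriction.

```python
from itertools import islice

def cut_then_filter(hits, aligned_ok, limit):
    """Truncate to `limit` hits first, then apply the alignment restriction.

    This models the behaviour described in the report: hits removed by the
    restriction are not replaced, so fewer than `limit` results come back.
    """
    return [h for h in islice(hits, limit) if aligned_ok(h)]

def filter_then_cut(hits, aligned_ok, limit):
    """Apply the alignment restriction first, then truncate to `limit` hits."""
    return list(islice((h for h in hits if aligned_ok(h)), limit))

# Hypothetical data: ten hits in the first corpus, of which only every
# third one also satisfies the restriction on the aligned region.
hits = list(range(10))
aligned_ok = lambda h: h % 3 == 0

early = cut_then_filter(hits, aligned_ok, 4)  # cut applied too early
late = filter_then_cut(hits, aligned_ok, 4)   # cut applied after the restriction

print(len(early), len(late))
```

With these made-up numbers, the early cut returns two hits even though four were requested and four qualifying hits exist.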
** WebCqp::Query fails on long sentences **
The combination of long sentences and many positional attributes seems
to cause WebCqp::Query to fail: the process hangs at 99% CPU usage, but
nothing happens.
In my particular case, there were 16 attributes (a detailed morphological
and syntactic analysis of Norwegian) and some sentences of more than
100 words. If necessary, I can provide exact numbers here.
With 15 attributes, the query works, but I suspect there will be
problems with queries returning even longer sentences (and there are
quite a few, since the corpus contains literary text, and some authors
produce sentences of many hundreds of words).
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb