[CWB] assertion failed / multilingual corpus

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon Aug 1 01:50:24 CEST 2011


Hi Ruprecht,

 

This sounds like another case of this bug:

 

http://sourceforge.net/tracker/?func=detail&atid=722303&aid=2838656&group_id=131809

 

which would appear to be triggered by an output-line-length overflow.
That's why you see it when you print both p-atts, but not when you print
only one of them. (That's also, I would guess, why the bug is only
triggered for you in SGML mode: without SGML tags the lines do not get
long enough.)

 

However, what you're seeing is a bit different from that reported back
in '09, so I'll add your symptoms to the bug database. If you could mail
me, off-list, the two corpus files that you mention, that would be
great, as I can then use that data to reproduce the bug - according to
the bug's comment thread, Stefan was unable to reproduce the
originally-reported crash.

 

Thanks very much,

 

best

 

Andrew.

 

 

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it]
On Behalf Of Ruprecht von Waldenfels
Sent: 27 June 2011 11:17
To: cwb at sslmit.unibo.it
Subject: [CWB] assertion failed / multilingual corpus

 

Dear everyone, 

I use CWB with a multilingual corpus (ParaSol, parasol.unibe.ch). I am
using an Ubuntu Server and CWB 3.2.7, downloaded and compiled Mon Jun  6
16:43:04 CEST 2011, with files encoded as UTF-8.

Sometimes, CWB breaks for an unknown reason; however, it does so only in
PrintMode sgml and only if two layers of annotation are included. Here
is the experiment: 

Setup: two corpus files, ECOROSA_IT, ECOROSA_RU; both with tags and
lemmata, aligned

ECOROSA_IT; show +ecorosa_ru; [word=".*[smtv]i"]; cat Last to "file.txt"; (over 9000 hits)

Adding tags OR lemmas is no problem:
ECOROSA_IT; show +ecorosa_ru; show +tag; [word=".*[smtv]i"]; cat Last to "file.txt"; (over 9000 hits)
ECOROSA_IT; show +ecorosa_ru; show +lemma; [word=".*[smtv]i"]; cat Last to "file.txt"; (over 9000 hits)

but adding BOTH leads to an error:
ECOROSA_IT; show +ecorosa_ru; show +tag; show +lemma; [word=".*[smtv]i"]; cat Last to "file.txt";

cqp: concordance.c:425: remember_this_position: Assertion `position_list' failed.
Aborted
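
For anyone trying to reproduce this, the four command sequences above can be generated mechanically. The following is only a sketch: the corpus name, aligned corpus, query, and attribute names are taken from the experiment above, while the `set PrintMode sgml;` line, the per-variant output file names, and the helper names `build_script` and `all_scripts` are assumptions added for illustration.

```python
from itertools import combinations

# Values taken from the experiment described in the message.
CORPUS = "ECOROSA_IT"
ALIGNED = "ecorosa_ru"
QUERY = '[word=".*[smtv]i"];'
P_ATTS = ["tag", "lemma"]


def build_script(p_atts, outfile):
    """Return one CQP command sequence showing the given p-attributes.

    Assumption: SGML output is selected with `set PrintMode sgml;`.
    """
    lines = [f"{CORPUS};", "set PrintMode sgml;", f"show +{ALIGNED};"]
    lines += [f"show +{att};" for att in p_atts]
    lines.append(QUERY)
    lines.append(f'cat Last to "{outfile}";')
    return "\n".join(lines)


def all_scripts():
    """Build one script per subset of P_ATTS: none, tag, lemma, tag+lemma."""
    scripts = {}
    for n in range(len(P_ATTS) + 1):
        for combo in combinations(P_ATTS, n):
            name = "_".join(combo) or "none"
            # Hypothetical output file name, one per variant.
            scripts[name] = build_script(combo, f"file_{name}.txt")
    return scripts


if __name__ == "__main__":
    for name, script in all_scripts().items():
        print(f"# --- {name} ---")
        print(script)
        print()
```

Each printed script could be saved to a file and run through cqp separately, so that an abort in the variant with both attributes does not affect the other three.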

It seems to me that this type of error has been happening with other
versions of CWB before, too, so this is not necessarily linked to the
current version. However, I cannot be sure because I do not normally see
the error messages when something does not work. 

(A minimal version of the corpus with only these two corpus files is
visible here.)

All the best, 
Ruprecht






-- 
------------------------------------------------
 
 
Ruprecht von Waldenfels
Universitaet Bern
Institut fuer slavische Sprachen und Literaturen
Laenggassstrasse 49 - CH 3005 Bern 9
------------------------------------------------
Tel: +41  31 631 35 83 /  Fax: +41 31  631 39 90
Tel: +49 761 214 66 72 / Mob.: +49 163 230 34 23
------------------------------------------------ 