[CWB] assertion failed / multilingual corpus

Ruprecht von Waldenfels waldenfels at issl.unibe.ch
Fri Sep 2 11:15:09 CEST 2011


Hi Andrew,
Sorry for the late reply, and thanks for your work!

The corpora are available for download here; please let me know once 
you've got them:

http://parasol.unibe.ch/HardieRussianItalianCorpus.tar.gz

I've packed the corpora in two versions:
(1) as they are, that is, the encoded files (only the registry files 
need to be updated);
(2) in vertical format, ready to be encoded into CWB.
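A minimal sketch of what updating the registry files in version (1) might involve, assuming the usual CWB registry layout in which a `HOME` line gives the absolute path of a corpus's data directory and an optional `INFO` line points to an info file. The directory layout (`<data root>/<corpus id>`) and the script name are hypothetical; only the `HOME` and `INFO` lines are rewritten, and every other line of each registry entry is copied through unchanged.

```python
import re
import sys
from pathlib import Path, PurePosixPath

# Matches the HOME and INFO lines of a CWB registry entry.
# Assumption: paths contain no spaces (surrounding quotes are stripped, nothing more).
PATH_LINE = re.compile(r"^(HOME|INFO)\s+(.+?)\s*$")


def relocate_registry(text: str, corpus_dir: PurePosixPath) -> str:
    """Return the registry entry with HOME pointing at corpus_dir and
    INFO (if present) moved into corpus_dir under its old file name."""
    out = []
    for line in text.splitlines():
        m = PATH_LINE.match(line)
        if m:
            key, old = m.group(1), m.group(2).strip('"')
            if key == "HOME":
                line = f"HOME {corpus_dir}"
            else:
                line = f"INFO {corpus_dir / PurePosixPath(old).name}"
        out.append(line)
    return "\n".join(out) + "\n"


def main(registry_dir: str, data_root: str) -> None:
    # Hypothetical layout: the data for corpus <id> lives in <data_root>/<id>,
    # and each registry file is named after its (lowercase) corpus id.
    for reg in sorted(Path(registry_dir).iterdir()):
        if reg.is_file():
            new_text = relocate_registry(
                reg.read_text(encoding="utf-8"),
                PurePosixPath(data_root) / reg.name,
            )
            reg.write_text(new_text, encoding="utf-8")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: relocate_registry.py REGISTRY_DIR DATA_ROOT")
```

For example, after unpacking the archive one might run `python relocate_registry.py registry /corpora/data`; back up the registry directory first, since the files are rewritten in place.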

Thanks again,
all the best!

Ruprecht



On 01.08.2011 01:50, Hardie, Andrew wrote:
>
> Hi Ruprecht,
>
> This sounds like another case of this bug:
>
> http://sourceforge.net/tracker/?func=detail&atid=722303&aid=2838656&group_id=131809 
>
> which would appear to be triggered by an output-line-length overflow. 
> That's why you see it when you print both p-attributes (tag and lemma), 
> but not when you print either one alone. (That's also, I would guess, 
> why the bug is only triggered for you in SGML mode: without SGML tags 
> the lines do not get long enough.)
>
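A rough model of the explanation above, showing only how the length of one output line grows with each displayed p-attribute and with per-token markup. The rendering (attributes joined by `/`, tokens wrapped in a made-up `<TOKEN>` element) is an illustrative stand-in, not CQP's actual SGML output; the tokens, tags and length limit are hypothetical, and the real buffer size inside `concordance.c` is not stated in this thread.

```python
from itertools import product

# Illustrative tokens; the tags are made up, not the corpus tagset.
TOKENS = [
    {"word": "lavarsi", "tag": "V", "lemma": "lavare"},
    {"word": "dammi", "tag": "V", "lemma": "dare"},
]

# Attribute combinations from the report: none, tag, lemma, both.
EXTRA_ATTS = [(), ("tag",), ("lemma",), ("tag", "lemma")]


def render_line(tokens, p_atts, sgml):
    """Join the displayed p-attributes of each token with '/'; with
    sgml=True, also wrap every token in (hypothetical) markup."""
    parts = []
    for tok in tokens:
        field = "/".join(tok[a] for a in p_atts)
        parts.append(f"<TOKEN>{field}</TOKEN>" if sgml else field)
    return " ".join(parts)


def report(tokens, limit):
    """Print the line length for each attribute/print-mode combination,
    flagging lines longer than a hypothetical fixed limit."""
    for extra, sgml in product(EXTRA_ATTS, (False, True)):
        atts = ("word",) + extra
        n = len(render_line(tokens, atts, sgml))
        flag = "OVER LIMIT" if n > limit else "ok"
        print(f"{'+'.join(atts):16} sgml={str(sgml):5} length={n:4} {flag}")


if __name__ == "__main__":
    # A longer context stands in for a full concordance line.
    report(TOKENS * 20, limit=400)
```

The model is only qualitative: the combination of two extra attributes and markup yields the longest line, which is consistent with the overflow hypothesis but does not prove it.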
> However, what you're seeing is a bit different from what was reported 
> back in '09, so I'll add your symptoms to the bug database. If you could 
> mail me, off-list, the two corpus files that you mention, that would 
> be great, as I could then use that data to reproduce the bug. According 
> to the bug's comment thread, Stefan was unable to reproduce the 
> originally reported crash.
>
> Thanks very much,
>
> best
>
> Andrew.
>
> *From:* cwb-bounces at sslmit.unibo.it 
> [mailto:cwb-bounces at sslmit.unibo.it] *On Behalf Of* Ruprecht von 
> Waldenfels
> *Sent:* 27 June 2011 11:17
> *To:* cwb at sslmit.unibo.it
> *Subject:* [CWB] assertion failed / multilingual corpus
>
> Dear everyone,
>
> I use CWB with a multilingual corpus (ParaSol, parasol.unibe.ch). I am 
> running CWB 3.2.7 on an Ubuntu server (downloaded and compiled Mon Jun  6 
> 16:43:04 CEST 2011); the files are encoded as UTF-8.
>
> Sometimes CQP crashes for an unknown reason, but only in PrintMode sgml 
> and only if two layers of annotation are shown. Here is the 
> experiment:
>
> Setup: two corpus files, ECOROSA_IT and ECOROSA_RU, aligned with each 
> other and both annotated with tags and lemmata.
>
> ECOROSA_IT; show +ecorosa_ru; [word=".*[smtv]i"]; cat Last to "file.txt";
> (over 9000 hits)
>
> Adding either tags or lemmas is no problem:
> ECOROSA_IT; show +ecorosa_ru; show +tag; [word=".*[smtv]i"]; cat Last to "file.txt";
> (over 9000 hits)
> ECOROSA_IT; show +ecorosa_ru; show +lemma; [word=".*[smtv]i"]; cat Last to "file.txt";
> (over 9000 hits)
>
> But adding both leads to an error:
> ECOROSA_IT; show +ecorosa_ru; show +tag; show +lemma; [word=".*[smtv]i"]; cat Last to "file.txt";
>
> cqp: concordance.c:425: remember_this_position: Assertion `position_list' failed.
> Aborted
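To rerun the experiment above systematically, the four command sequences can be generated as CQP scripts. This is a sketch under stated assumptions: the output file names are hypothetical, `set PrintMode sgml;` is added explicitly because the crash was only observed in that print mode, and the `cat` output is redirected with CQP's `>` operator. Each printed script could be saved to its own file and run in batch mode, e.g. `cqp -f case_both.cqp` (adding `-r` with the registry directory if it is not the default).

```python
QUERY = '[word=".*[smtv]i"];'

# The four cases from the report: neither, tag only, lemma only, both.
CASES = {
    "none": (False, False),
    "tag": (True, False),
    "lemma": (False, True),
    "both": (True, True),
}


def build_script(show_tag: bool, show_lemma: bool, out_file: str) -> str:
    """Return the CQP command sequence for one case of the experiment."""
    cmds = ["set PrintMode sgml;", "ECOROSA_IT;", "show +ecorosa_ru;"]
    if show_tag:
        cmds.append("show +tag;")
    if show_lemma:
        cmds.append("show +lemma;")
    cmds.append(QUERY)
    # Each case writes to its own file so results are not overwritten.
    cmds.append(f'cat Last > "{out_file}";')
    return "\n".join(cmds) + "\n"


if __name__ == "__main__":
    for name, (tag, lemma) in CASES.items():
        print(f"--- case_{name}.cqp ---")
        print(build_script(tag, lemma, f"out_{name}.txt"))
```

If only the "both" script aborts, the symptom matches the report; comparing the output files of the other three cases would then show how long the successfully printed lines get.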
>
> I believe this type of error has occurred with earlier versions of CWB 
> as well, so it is not necessarily linked to the current version. 
> However, I cannot be sure, because I do not normally see the error 
> messages when something does not work.
>
> (A minimal version of the corpus containing only these two corpus files 
> is available here.)
>
> All the best,
> Ruprecht
>
> -- 
> ------------------------------------------------
> Ruprecht von Waldenfels
> Universitaet Bern
> Institut fuer slavische Sprachen und Literaturen
> Laenggassstrasse 49 - CH 3005 Bern 9
> ------------------------------------------------
> Tel: +41  31 631 35 83 /  Fax: +41 31  631 39 90
> Tel: +49 761 214 66 72 / Mob.: +49 163 230 34 23
> ------------------------------------------------
>
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb


-- 
------------------------------------------------
Ruprecht v. Waldenfels, waldenfels at issl.unibe.ch
Institut fuer slavische Sprachen und Literaturen
Universität Bern Laenggassstr. 49 CH 3005 Bern 9
Tel: +41  31 631 35 83 /  Fax: +41 31  631 39 90
------------------------------------------------
