[CWB] cwb-align-encode runs out of memory

Jörg Tiedemann Jorg.Tiedemann at lingfil.uu.se
Sat Jan 9 09:19:06 CET 2016


My fault - it was my own script that used up all the memory, not cwb-align-encode. Sorry.

Jörg

**********************************************************************************
Jörg Tiedemann
Department of Modern Languages             http://www.helsinki.fi/~tiedeman/
University of Helsinki

On 08 Jan 2016, at 23:28, Stefan Evert <stefanML at collocations.de> wrote:


I have a large parallel corpus and would like to add alignment information, but cwb-align-encode seems to allocate a lot of memory and eventually crashes. Is there an option to reduce memory consumption?

That is weird: cwb-align-encode shouldn't use any substantial amount of memory.  It reads corpus positions from the input file and writes them directly to the index files.  The code is so simple that there's hardly any room for memory leaks.

The situation is different if you use the cwb-align-import Perl script, which collects all alignment beads in memory (with considerable overhead from the Perl data structures).
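
To illustrate the difference, here is a minimal Python sketch of the streaming approach: each alignment bead is written out as soon as it is read, so memory use stays constant regardless of corpus size. This is not CWB's actual code. The input format (one bead per line, four whitespace-separated corpus positions, no header line), the output layout (four big-endian 32-bit integers per bead), and the function names are all assumptions made for illustration only.

```python
import io
import struct


def stream_beads(lines):
    """Yield one (src_start, src_end, tgt_start, tgt_end) tuple per input line.

    Assumed (hypothetical) input: four whitespace-separated integers per line.
    Nothing is accumulated; each bead is handed on immediately.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        yield tuple(int(f) for f in fields[:4])


def encode_streaming(lines, out):
    """Write each bead to `out` as soon as it is read; return the bead count."""
    count = 0
    for bead in stream_beads(lines):
        # Assumed output layout: four big-endian 32-bit signed integers.
        out.write(struct.pack(">4i", *bead))
        count += 1
    return count


if __name__ == "__main__":
    sample = ["0 9 0 11\n", "10 19 12 20\n"]
    buf = io.BytesIO()
    print(encode_streaming(sample, buf), "beads written")
```

A script that instead appends every bead to a list or dictionary before writing anything will grow with the size of the alignment file, which is the pattern to look for when a wrapper script runs out of memory.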

Which version of CWB are you running and how exactly did you call cwb-align-encode?

Best,
Stefan

