[CWB] cwb-align-encode runs out of memory

Stefan Evert stefanML at collocations.de
Fri Jan 8 22:28:38 CET 2016


> I have a large parallel corpus and I would like to add alignment information but cwb-align-encode seems to allocate a lot of memory and at some point it crashes. Is there any option to reduce memory consumption? 

That is weird: cwb-align-encode shouldn't use any substantial amount of memory.  It reads corpus positions from the input file and writes them directly to the index files.  The code is so simple that there's hardly any room for memory leaks.

The situation is different if you use the cwb-align-import Perl script, which collects all alignment beads in memory (with considerable overhead from the Perl data structures).
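To illustrate the difference between the two approaches, here is a minimal Python sketch. It is not CWB's actual code. The input format (a header line followed by beads whose first four fields are corpus positions) and the 16-byte big-endian binary record are simplified assumptions, not the real alignment or index file formats. The names read_beads, encode_streaming, encode_collecting and BEAD_STRUCT are hypothetical.

```python
import io
import struct
from typing import IO, Iterator, Tuple

# Hypothetical bead: (source_start, source_end, target_start, target_end).
Bead = Tuple[int, int, int, int]

# Assumption: four 32-bit big-endian integers per bead. This is an
# illustrative layout, NOT CWB's actual index file format.
BEAD_STRUCT = struct.Struct(">4i")


def read_beads(lines: IO[str]) -> Iterator[Bead]:
    """Yield beads one at a time from a hypothetical text format:
    a header line, then one bead per line whose first four
    whitespace-separated fields are corpus positions."""
    next(lines, None)  # skip the header line
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        f1, l1, f2, l2 = (int(x) for x in fields[:4])
        yield (f1, l1, f2, l2)


def encode_streaming(lines: IO[str], out: IO[bytes]) -> int:
    """Write each bead as soon as it is read. When `lines` is a file
    opened for reading, memory use does not grow with the number of beads."""
    count = 0
    for bead in read_beads(lines):
        out.write(BEAD_STRUCT.pack(*bead))
        count += 1
    return count


def encode_collecting(lines: IO[str], out: IO[bytes]) -> int:
    """Contrast: gather every bead in a list before writing anything,
    so memory grows linearly with the number of beads."""
    beads = list(read_beads(lines))
    for bead in beads:
        out.write(BEAD_STRUCT.pack(*bead))
    return len(beads)


if __name__ == "__main__":
    sample = "SOURCE s TARGET s\n0 4 0 5\n5 9 6 11\n"
    buf = io.BytesIO()
    n = encode_streaming(io.StringIO(sample), buf)
    print(n, len(buf.getvalue()))  # 2 beads x 16 bytes each -> "2 32"
```

Both functions produce identical output; the only difference is that the collecting version keeps every bead in memory at once. In a Perl script, each bead stored as a hash or array carries far more overhead than the 16 bytes shown here, which is why that pattern can exhaust memory on a large parallel corpus.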

Which version of CWB are you running and how exactly did you call cwb-align-encode?

Best,
Stefan



