[CWB] CL: Out of memory. (killed)

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri Mar 31 11:00:43 CEST 2017


>>(@Andrew: shouldn't we consider moving to C++, if just for the sake of exceptions?)

Oh no, god no... not C++... anything but that(*)...

Andrew.

(*) nb. not to be read as an invitation to propose INTERCAL.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Stefan Evert
Sent: 31 March 2017 09:50
To: CWBdev Mailing List
Subject: Re: [CWB] CL: Out of memory. (killed)


> On 30 Mar 2017, at 20:04, Scott Sadowsky <ssadowsky at gmail.com> wrote:
> 
> CC-C> "ábaco" 
> CL: Out of memory. (killed)                                  
> CL: [cl_realloc(block at 0x7f7e78c99010 to -2147479552 bytes)] 

The immediate cause of this crash is that something in CQP attempts to allocate a buffer of more than 2 GiB but uses a signed 32-bit int to calculate the size, so it wraps around to a negative number.
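
To make the wrap-around concrete, here is a minimal sketch (not CQP code; the helper names are made up for illustration). It assumes the requested size was 2 GiB + 4 KiB, which is the value that would produce exactly the number in the log above if the size is reduced modulo 2^32, as happens on common two's-complement platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of CWB: returns what a signed 32-bit
   size variable ends up holding when asked to represent `wanted` bytes. */
int32_t size_as_int32(int64_t wanted)
{
    assert(wanted >= 0);
    /* Converting an out-of-range value to a signed type is
       implementation-defined in C; GCC and Clang on the usual
       platforms reduce it modulo 2^32. */
    return (int32_t)wanted;
}

/* Prints the size that was meant and the size a 32-bit int sees. */
void show_wraparound(void)
{
    int64_t wanted = INT64_C(2147483648) + 4096;  /* assumed: 2 GiB + 4 KiB */
    printf("wanted %lld bytes, int32 holds %d\n",
           (long long)wanted, (int)size_as_int32(wanted));
}
```

Doing the size arithmetic in size_t (or a 64-bit type) and checking it against an explicit limit before calling realloc() would avoid the negative request.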

The ensuing discussion suggests that the culprit is Andrew's implementation of automatically growing strings, which never expected to have to deal with such huge strings.  It would probably be better to fail with a CQP error if the KWIC context gets larger than 1M characters (or perhaps 100M), but I'm not sure how easy that is to fit into CQP's haphazard error handling.
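
A rough sketch of what such a guard might look like, assuming a simple growable string. GrowString, grow_string_append() and MAX_STRING_BYTES are hypothetical names for illustration, not the actual CL/CQP data structures, and the 100 MiB cap is just one of the values floated above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_BYTES ((size_t)100 * 1024 * 1024)  /* hypothetical cap */

typedef struct {
    char  *data;  /* NUL-terminated buffer, or NULL if empty */
    size_t len;   /* bytes used, excluding the terminating NUL */
    size_t cap;   /* bytes allocated */
} GrowString;

/* Appends n bytes from src. Returns 0 on success, -1 if the result would
   exceed MAX_STRING_BYTES or memory runs out; on failure the string is
   left unchanged, so the caller can report an error instead of dying. */
int grow_string_append(GrowString *s, const char *src, size_t n)
{
    assert(s != NULL && src != NULL);
    /* Check the limit without ever computing an overflowing sum. */
    if (n >= MAX_STRING_BYTES || s->len >= MAX_STRING_BYTES - n)
        return -1;
    size_t needed = s->len + n + 1;  /* +1 for the NUL */
    if (needed > s->cap) {
        size_t new_cap = s->cap ? s->cap : 64;
        while (new_cap < needed)
            new_cap *= 2;            /* sizes stay in size_t, never int */
        if (new_cap > MAX_STRING_BYTES)
            new_cap = MAX_STRING_BYTES;
        char *p = realloc(s->data, new_cap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}
```

A caller would start from `GrowString s = {NULL, 0, 0};` and turn a -1 return value into an error message rather than a crash.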

(@Andrew: shouldn't we consider moving to C++, if just for the sake of exceptions?)

As Andrew pointed out, the root cause of the problem is that your corpus seems to contain a sentence of several hundred million tokens (so it formats to over 2 GiB).  This easily happens if there's a missing </s> tag somewhere in the middle and you encode with "-S s:0" (because the following sentences are nested in the one that hasn't been closed).  You probably got warnings about missing </s> tags when you encoded the corpus, didn't you?

If you can't be sure that the structural annotation in a corpus is well-formed XML, it's often better to do a flat encode with "-S s".
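
If you want to locate the problem spot before re-encoding, something along these lines could help. It is only a sketch: first_unclosed_s() is a made-up helper, and it assumes verticalized input that has already been read into an array of lines, with each <s> and </s> tag on a line of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the 0-based index of the first line where
   an <s> tag opens while an earlier <s> is still open (i.e. a </s> is
   probably missing before it), or -1 if no such line exists. */
long first_unclosed_s(const char **lines, size_t n_lines)
{
    assert(lines != NULL || n_lines == 0);
    int open = 0;  /* 1 while we are inside an <s> region */
    for (size_t i = 0; i < n_lines; i++) {
        const char *ln = lines[i];
        if (strcmp(ln, "</s>") == 0) {
            open = 0;
        } else if (strncmp(ln, "<s", 2) == 0 &&
                   (ln[2] == '>' || ln[2] == ' ')) {
            /* matches "<s>" and "<s attr=...>", but not e.g. "<sp>" */
            if (open)
                return (long)i;
            open = 1;
        }
    }
    return -1;
}
```

For the lines "<s>", "a", "<s>", "b", "</s>" it returns 2, the position of the second opening tag.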

Best,
Stefan