[CWB] CL: Out of memory. (killed)

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu Mar 30 20:07:45 CEST 2017


Hi Scott,

Could you check which version this is with cqp -v, please?

thanks

best

Andrew

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Scott Sadowsky
Sent: 30 March 2017 19:04
To: CWBdev Mailing List
Subject: [CWB] CL: Out of memory. (killed)

When it rains, it pours, I guess!

I have a fairly large corpus (880 million words) which I've been using for some time without incident. (This is NOT related to the corpus I asked about yesterday, whose processing topped out at 2^31 tokens.)

Unfortunately, I've just come across a specific word that crashes CQP with the following error when I search for it:

CC-C> "ábaco"
CL: Out of memory. (killed)
CL: [cl_realloc(block at 0x7f7e78c99010 to -2147479552 bytes)]

135515175:  Ahí aparecen : un retrato iluminado de l mandarín Van-ta-gin ; un junco ; un molino de arroz ; los retratos iluminados de un chino y un hotentote ; diversos caracteres de la escritura china ; la reproducción de una moneda en anverso y reverso ; la reproducción de los signos grabados en una caparazón de tortuga utilizada para la adivinación , con el nombre de " tortue mistique " ; una vista de la parte oriental de Parque de Gé-hol ; el ciclo chino ; un <ábaco> ; el proceso de formación de letras ; reproducción de diversas armas de artillería ; instrumentos musicales como flautas , violines , guitarras , trompetas , liras , gongs , tambores , campanas ; un puente ; una aldea y sus habitantes ; la casa de un mandarín y diversas melodías en llave de sol : Mon-lie-ouha , aires chinos y un aire musical cantado en una chalupa china .
{ ~ } $

By the way, the "{ ~ } $" prompt above is my Linux shell prompt, not CQP's command line. The error appears after the processor core has been pegged at 100% for a good 30 to 45 seconds; results for simple queries like this are normally returned in milliseconds.
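A note on the size in the cl_realloc line (the same value, -2147479552, appears in both crashes): it is what a byte count of 2^31 + 4096, just over 2 GiB, looks like when stored in a signed 32-bit integer. The Python sketch below only checks that arithmetic. It assumes two's-complement wrapping and says nothing about why CQP requests that much memory, which this message does not establish; the helper names are hypothetical.

```python
def to_int32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a two's-complement signed integer."""
    n &= 0xFFFFFFFF                      # keep only the low 32 bits
    return n - (1 << 32) if n >= (1 << 31) else n

def from_int32(n: int) -> int:
    """Recover the unsigned 32-bit value behind a (possibly negative) int32."""
    return n & 0xFFFFFFFF

reported = -2147479552                   # size printed in the cl_realloc message
unsigned = from_int32(reported)
print(unsigned)                          # 2147487744
print(unsigned - (1 << 31))              # 4096, i.e. 2^31 + 4096 bytes
print(to_int32(unsigned))                # -2147479552 again
```

In other words, a request that crosses the 2^31-byte boundary by a single 4096-byte step would produce exactly this negative number if the size were held in a signed 32-bit variable.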

Further testing has produced results that seem strange to me: "árbol" works fine, but "ébola" crashes CQP, as shown below:

CC-C> "ébola"
CL: Out of memory. (killed)
CL: [cl_realloc(block at 0x7f02d14b7010 to -2147479552 bytes)]

146356674:  SIDA y el <ébola> son corresponde y es falso ,
147036486:  pertenece a l mismo grupo de l mortal virus <ébola> .
178273950:  Hay muchas enfermedades , como el caso de l hanta , de l <ébola> , de l lassa , de l dengue , etcétera , para las cuales no existen vacunas , y nuestro Instituto de Salud Pública podría enfrentar las suficientemente .
{ ~ } $

Other searches for words beginning with a non-ASCII character, such as "ácaro", have also crashed CQP. But, as "árbol" shows above, at least one such word doesn't.

The error also occurs with words that have non-ASCII characters in other positions, such as "esdrújula".

Note that this corpus is UTF-8 encoded.
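Because the corpus is UTF-8 encoded, each accented letter in these queries is stored as more than one byte, and the same visible word can be encoded in precomposed (NFC) or decomposed (NFD) form. The sketch below uses Python's standard unicodedata module (the helper utf8_bytes is hypothetical) to print the byte sequences of the words mentioned here. It does not by itself explain why "árbol" behaves differently: all five words contain a two-byte letter in NFC form.

```python
import unicodedata

# Words from this message; "árbol" is the one that did not crash.
words = ["ábaco", "ébola", "ácaro", "esdrújula", "árbol"]

def utf8_bytes(word: str, form: str = "NFC") -> bytes:
    """Return the UTF-8 encoding of word after Unicode normalization."""
    return unicodedata.normalize(form, word).encode("utf-8")

for w in words:
    nfc = utf8_bytes(w, "NFC")   # precomposed: á = U+00E1 -> c3 a1
    nfd = utf8_bytes(w, "NFD")   # decomposed: a + U+0301 -> 61 cc 81
    chars = len(unicodedata.normalize("NFC", w))
    print(f"{w}: {chars} chars, NFC {len(nfc)} bytes ({nfc.hex(' ')}), "
          f"NFD {len(nfd)} bytes")
```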

Any ideas? I've never had this problem before, and I still don't with other corpora of similar size.

Cheers,
Scott



--
Dr. Scott Sadowsky
Profesor Asistente de Lingüística
Pontificia Universidad Católica de Chile

ssadowsky gmail com
scsadowsky uc cl
http://sadowsky.cl/


