[CWB] CL: Out of memory. (killed)

Scott Sadowsky ssadowsky at gmail.com
Fri Mar 31 05:40:26 CEST 2017


On Thu, Mar 30, 2017 at 11:57 PM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
wrote:

Hi Andrew,

> Can you try changing the concordance width to a fixed number of characters,
> say 50, and see if the error persists?
>

I just did set c 50 and ran the "ábaco" query, and it worked perfectly.

I then did set c 1s and the query crashed. As the doomed query was being
performed, my RAM usage went from 10.8 GB to 12.7 GB (I have 32 GB in
total, though).

Also, querying the next couple of words in the sentence where "ábaco"
appears causes the same crash.


> It rather looks like the cause of the error is an attempt to allocate more
> memory than is available to the construction of a concordance string. It
> also looks like your concordance width is set to 1 sentence (s or similar
> s-attribute). A hit in a very long sentence could, thus, exhaust your
> memory. But this won’t happen in character-mode width. So, if the bug
> persists in a 50-character-width concordance, I’m wrong.
>

I think you've hit the nail on the head.
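
One more detail that seems to fit: the size in the cl_realloc message,
-2147479552, is exactly what 2^31 + 4096 looks like when read as a signed
32-bit integer. I'm only assuming the size is held in a 32-bit signed
variable (I haven't checked the source), but it would be consistent with a
buffer that keeps growing until the byte count wraps around. A quick check
of the arithmetic in Python, where as_int32 is just a helper I made up:

```python
def as_int32(n: int) -> int:
    """Interpret the low 32 bits of n as a signed two's-complement integer."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


# Hypothetical request of 2 GiB plus one 4 KiB step (an assumption, see above).
requested = 2**31 + 4096
print(as_int32(requested))
```

This prints the same number that appears in the error message.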

Since using a linguistic unit like the sentence produces (to me, at
least) much more useful query results than arbitrary numbers of characters
or words, is there any way to work around this? Something like a high but
hard limit on context size (say, 1000 words, which I just tried
successfully), *in addition to* the user's word- or sentence-based limit?
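
To make the idea concrete, here is a rough sketch in Python of what I have
in mind. This is not CWB code: the function name, its parameters and the
default of 1000 tokens are all made up for illustration, and I'm assuming
the sentence boundaries are already known as corpus positions.

```python
def context_window(hit, sent_start, sent_end, max_tokens=1000):
    """Return (left, right) corpus positions for the context of a hit.

    The window is the enclosing sentence, clamped so that it never extends
    more than max_tokens tokens to either side of the hit.
    """
    left = max(sent_start, hit - max_tokens)
    right = min(sent_end, hit + max_tokens)
    return left, right


# A normal sentence is returned whole.
print(context_window(50, 40, 60))
# A runaway "sentence" is cut off at the hard limit on each side.
print(context_window(5000, 0, 2_000_000_000))
```

With something like this, ordinary sentences would be displayed in full,
and only pathological ones would be truncated.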

Cheers,
Scott




>
> *From:* cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] *On
> Behalf Of *Scott Sadowsky
> *Sent:* 31 March 2017 00:43
> *To:* Open source development of the Corpus WorkBench
> *Subject:* Re: [CWB] CL: Out of memory. (killed)
>
>
>
> Sure!
>
>
>
> The IMS Open Corpus Workbench (CWB)
>
>
>
> Copyright (C) 1993-2006 by IMS, University of Stuttgart
>
> Original developer:       Oliver Christ
>
>     with contributions by Bruno Maximilian Schulze
>
> Version 3.0 developed by: Stefan Evert
>
>     with contributions by Arne Fitschen
>
>
>
> Copyright (C) 2007-today by the CWB open-source community
>
>     individual contributors are listed in source file AUTHORS
>
>
>
> Download and contact: http://cwb.sourceforge.net/
>
>
>
> Compiled:  Sun 26 Mar 19:37:22 CLST 2017
>
> Version:   3.4.11
>
>
>
> Mind you, I downloaded and compiled the latest development version about a
> week ago, and that build number isn't shown here. If you need it and can
> tell me how to get it, I'll be glad to do so.
>
>
>
> Cheers!
>
> Scott
>
>
>
> On Thu, Mar 30, 2017 at 3:07 PM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
> wrote:
>
> Hi Scott,
>
>
>
> Could you check what version this is with *cqp -v* please?
>
>
>
> thanks
>
>
>
> best
>
>
>
> Andrew
>
>
>
> *From:* cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] *On
> Behalf Of *Scott Sadowsky
> *Sent:* 30 March 2017 19:04
> *To:* CWBdev Mailing List
> *Subject:* [CWB] CL: Out of memory. (killed)
>
>
>
> When it rains, it pours, I guess!
>
>
>
> I have a fairly large corpus (880m words) which I've been using for some
> time without incident (this is NOT related to the corpus I asked about
> yesterday, the processing of which topped out at 2^31 tokens).
>
>
>
> Unfortunately, I've just happened upon a specific word, which when I
> search for it with cqp, crashes the program with the following error:
>
>
>
> CC-C> "ábaco"
> CL: Out of memory. (killed)
> CL: [cl_realloc(block at 0x7f7e78c99010 to -2147479552 bytes)]
>
> 135515175:  Ahí aparecen : un retrato iluminado de l mandarín Van-ta-gin
> ; un junco ; un molino de arroz ; los retratos iluminados de un chino y un
> hotentote ; diversos caracteres de la escritura china ; la reproducción de una
> moneda en anverso y reverso ; la reproducción de los signos grabados en una
> caparazón de tortuga utilizada para la adivinación , con el nombre de "
> tortue mistique " ; una vista de la parte oriental de Parque de Gé-hol ; el
> ciclo chino ; un <ábaco> ; el proceso de formación de letras ; reproducción de
> diversas armas de artillería ; instrumentos musicales como flautas ,
> violines , guitarras , trompetas , liras , gongs , tambores , campanas ; un puente ;
> una aldea y sus habitantes ; la casa de un mandarín y diversas melodías en
> llave de sol : Mon-lie-ouha , aires chinos y un aire musical cantado en una
> chalupa china .
> *{ ~ } $*
>
>
> The prompt above is the Linux terminal, rather than CQP's command line, by
> the way. The error comes after pegging the processor core at 100% for a
> good 30-45 seconds. Results for simple queries like this are normally
> returned in milliseconds.
>
>
>
> Further testing has produced what are to me strange results. "árbol" works
> *fine*, but "ébola" crashes CQP, as seen below:
>
>
>
> CC-C> "ébola"
> CL: Out of memory. (killed)
> CL: [cl_realloc(block at 0x7f02d14b7010 to -2147479552 bytes)]
>
> 146356674:  SIDA y el <ébola> son corresponde y es falso ,
> 147036486:  pertenece a l mismo grupo de l mortal virus <ébola> .
> 178273950:  Hay muchas enfermedades , como el caso de l hanta , de l
> <ébola> , de l lassa , de l dengue , etcétera , para las cuales no existen
> vacunas , y nuestro Instituto de Salud Pública podría enfrentar las
> suficientemente .
> *{ ~ } $*
>
>
> Other searches with word-initial non-ASCII characters have also produced
> crashes, such as "ácaro". But, as seen above with "árbol", at least one
> doesn't.
>
>
>
> The errors are also happening with words which have non-ASCII characters
> in other places, such as "esdrújula".
>
>
>
> Note that this corpus is UTF-8 encoded.
>
>
>
> Any ideas? I've never had this problem before, and I still don't with
> other corpora of similar size.
>
>
>
> Cheers,
>
> Scott
>
>
>
>
>
> --
>
> Dr. Scott Sadowsky
> Profesor Asistente de Lingüística
>
> Pontificia Universidad Católica de Chile
>
>
>
> ssadowsky gmail com
>
> scsadowsky uc cl
> http://sadowsky.cl/
>
>
>
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>


-- 
Dr. Scott Sadowsky
Profesor Asistente de Lingüística
Pontificia Universidad Católica de Chile

ssadowsky gmail com
scsadowsky uc cl
http://sadowsky.cl/