[CWB] CL: Out of memory. (killed)

Scott Sadowsky ssadowsky at gmail.com
Fri Mar 31 01:42:56 CEST 2017


Sure!

The IMS Open Corpus Workbench (CWB)

Copyright (C) 1993-2006 by IMS, University of Stuttgart
Original developer:       Oliver Christ
    with contributions by Bruno Maximilian Schulze
Version 3.0 developed by: Stefan Evert
    with contributions by Arne Fitschen

Copyright (C) 2007-today by the CWB open-source community
    individual contributors are listed in source file AUTHORS

Download and contact: http://cwb.sourceforge.net/

Compiled:  Sun 26 Mar 19:37:22 CLST 2017
Version:   3.4.11

Mind you, I downloaded and compiled the latest development version about a
week ago, and that build number isn't shown here. If you need it and can
tell me how to get it, I'll be glad to do so.
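
One thing I did notice about the error quoted below: the negative size in
the cl_realloc message looks like a 32-bit signed wraparound. That is only
my reading of the number, not something I've confirmed in the CWB source,
and as_int32 is a helper I made up purely for illustration. A quick Python
check of the arithmetic, assuming the size was held in a signed 32-bit
integer:

```python
def as_int32(n: int) -> int:
    """Hypothetical helper: reinterpret the low 32 bits of n as signed two's complement."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


reported = -2147479552          # size printed in the cl_realloc message
requested = reported + 2**32    # same bit pattern read as an unsigned 32-bit value

print(requested)                        # 2147487744
print(requested - 2**31)                # 4096
print(as_int32(requested) == reported)  # True
```

If that reading is right, the request was for 4096 bytes more than 2^31,
which is more than a signed 32-bit integer can represent.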

Cheers!
Scott

On Thu, Mar 30, 2017 at 3:07 PM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
wrote:

> Hi Scott,
>
>
>
> Could you check what version this is with *cqp -v*, please?
>
>
>
> thanks
>
>
>
> best
>
>
>
> Andrew
>
>
>
> *From:* cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] *On
> Behalf Of *Scott Sadowsky
> *Sent:* 30 March 2017 19:04
> *To:* CWBdev Mailing List
> *Subject:* [CWB] CL: Out of memory. (killed)
>
>
>
> When it rains, it pours, I guess!
>
>
>
> I have a fairly large corpus (880m words) which I've been using for some
> time without incident (this is NOT related to the corpus I asked about
> yesterday, the processing of which topped out at 2^31 tokens).
>
>
>
> Unfortunately, I've just happened upon a specific word that crashes cqp
> when I search for it, with the following error:
>
>
>
> CC-C> "ábaco"
> CL: Out of memory. (killed)
> CL: [cl_realloc(block at 0x7f7e78c99010 to -2147479552 bytes)]
>
> 135515175:  Ahí aparecen : un retrato iluminado de l mandarín Van-ta-gin
> ; un junco ; un molino de arroz ; los retratos iluminados de un chino y un
> hotentote ; diversos caracteres de la escritura china ; la reproducción de
> una moneda en anverso y reverso ; la reproducción de los signos grabados en
> una caparazón de tortuga utilizada para la adivinación , con el nombre de "
> tortue mistique " ; una vista de la parte oriental de Parque de Gé-hol ; el
> ciclo chino ; un <ábaco> ; el proceso de formación de letras ; reproducción
> de diversas armas de artillería ; instrumentos musicales como flautas ,
> violines , guitarras , trompetas , liras , gongs , tambores , campanas ; un
> puente ; una aldea y sus habitantes ; la casa de un mandarín y diversas
> melodías en llave de sol : Mon-lie-ouha , aires chinos y un aire musical
> cantado en una chalupa china .
> *{ ~ } $*
>
>
> The prompt above is the Linux terminal, rather than CQP's command line, by
> the way. The error comes after pegging the processor core at 100% for a
> good 30-45 seconds. Results for simple queries like this are normally
> returned in milliseconds.
>
>
>
> Further testing has produced what are to me strange results. "árbol" works
> *fine*, but "ébola" crashes CQP, as seen below:
>
>
>
> CC-C> "ébola"
> CL: Out of memory. (killed)
> CL: [cl_realloc(block at 0x7f02d14b7010 to -2147479552 bytes)]
>
> 146356674:  SIDA y el <ébola> son corresponde y es falso ,
> 147036486:  pertenece a l mismo grupo de l mortal virus <ébola> .
> 178273950:  Hay muchas enfermedades , como el caso de l hanta , de l
> <ébola> , de l lassa , de l dengue , etcétera , para las cuales no existen
> vacunas , y nuestro Instituto de Salud Pública podría enfrentar las
> suficientemente .
> *{ ~ } $*
>
>
> Other searches with word-initial non-ASCII characters have also produced
> crashes, such as "ácaro". But, as seen above with "árbol", at least one
> doesn't.
>
>
>
> The errors are also happening with words which have non-ASCII characters
> in other places, such as "esdrújula".
>
>
>
> Note that this corpus is UTF-8 encoded.
>
>
>
> Any ideas? I've never had this problem before, and I still don't with
> other corpora of similar size.
>
>
>
> Cheers,
>
> Scott
>
>
>
>
>
> --
>
> Dr. Scott Sadowsky
> Profesor Asistente de Lingüística
>
> Pontificia Universidad Católica de Chile
>
>
>
> ssadowsky gmail com
>
> scsadowsky uc cl
> http://sadowsky.cl/


-- 
Dr. Scott Sadowsky
Profesor Asistente de Lingüística
Pontificia Universidad Católica de Chile

ssadowsky gmail com
scsadowsky uc cl
http://sadowsky.cl/