<div dir="ltr">Sure!<div><br></div><div><div><font face="monospace, monospace">The IMS Open Corpus Workbench (CWB)</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">Copyright (C) 1993-2006 by IMS, University of Stuttgart</font></div><div><font face="monospace, monospace">Original developer: Oliver Christ</font></div><div><font face="monospace, monospace"> with contributions by Bruno Maximilian Schulze</font></div><div><font face="monospace, monospace">Version 3.0 developed by: Stefan Evert</font></div><div><font face="monospace, monospace"> with contributions by Arne Fitschen</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">Copyright (C) 2007-today by the CWB open-source community</font></div><div><font face="monospace, monospace"> individual contributors are listed in source file AUTHORS</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">Download and contact: <a href="http://cwb.sourceforge.net/">http://cwb.sourceforge.net/</a></font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">Compiled: Sun 26 Mar 19:37:22 CLST 2017</font></div><div><font face="monospace, monospace">Version: 3.4.11</font></div></div><div><br></div><div>Mind you, I downloaded and compiled the latest development version about a week ago, and that build number isn't shown here. If you need it and can tell me how to get it, I'll be glad to do so.</div><div><br></div><div>Cheers!</div><div>Scott</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 30, 2017 at 3:07 PM, Hardie, Andrew <span dir="ltr"><<a href="mailto:a.hardie@lancaster.ac.uk" target="_blank">a.hardie@lancaster.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-GB" link="blue" vlink="purple">
<div class="m_-6821175733575031055WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">Hi Scott,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">Could you check what version this is with
<b>cqp -v</b> please?<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">thanks<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">best<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">Andrew<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> <a href="mailto:cwb-bounces@sslmit.unibo.it" target="_blank">cwb-bounces@sslmit.unibo.it</a>
<b>On Behalf Of </b>Scott Sadowsky<br>
<b>Sent:</b> 30 March 2017 19:04<br>
<b>To:</b> CWBdev Mailing List<br>
<b>Subject:</b> [CWB] CL: Out of memory. (killed)<u></u><u></u></span></p><div><div class="h5">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">When it rains, it pours, I guess!<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">I have a fairly large corpus (880m words) which I've been using for some time without incident (this is NOT related to the corpus I asked about yesterday, the processing of which topped out at 2^31 tokens). <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Unfortunately, I've just come across a specific word that crashes cqp when I search for it, with the following error:</p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New";color:black">CC-C> "ábaco"
</span><span style="font-family:"Courier New""><br>
CL: Out of memory. (killed)<br>
CL: [cl_realloc(block at 0x7f7e78c99010 to -2147479552 bytes)] <br>
<br>
<span style="color:#1818b2">135515175:</span><span style="color:black"> Ahí aparecen : un retrato iluminado de l mandarín Van-ta-gin ; un junco ; un molino de arroz ; los retratos iluminados de un chino y un hotentote ; diversos caracteres de la escritura china ; la reproducción de una moneda en anverso y reverso ; la reproducción de los signos grabados en una caparazón de tortuga utilizada para la adivinación , con el nombre de " tortue mistique " ; una vista de la parte oriental de Parque de Gé-hol ; el ciclo chino ; un &lt;</span><span style="color:white;background:black">ábaco</span><span style="color:black">&gt; ; el proceso de formación de letras ; reproducción de diversas armas de artillería ; instrumentos musicales como flautas , violines , guitarras , trompetas , liras , gongs , tambores , campanas ; un puente ; una aldea y sus habitantes ; la casa de un mandarín y diversas melodías en llave de sol : Mon-lie-ouha , aires chinos y un aire musical cantado en una chalupa china .</span>
<br>
<b><span style="color:#5454ff">{ ~ }</span><span style="color:#54ff54"> $</span></b></span><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><br clear="all">
<u></u><u></u></p>
<div>
<p class="MsoNormal">By the way, the last prompt above is the Linux shell's, not CQP's command line. The error appears after the processor core has been pegged at 100% for a good 30-45 seconds. Results for simple queries like this are normally returned in milliseconds.</p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Further testing has produced what are to me strange results. "árbol" works
<u>fine</u>, but "ébola" crashes CQP, as seen below:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New";color:black">CC-C> "ébola"
</span><span style="font-family:"Courier New""><br>
CL: Out of memory. (killed)<br>
CL: [cl_realloc(block at 0x7f02d14b7010 to -2147479552 bytes)] <br>
<br>
<span style="color:#1818b2">146356674:</span><span style="color:black"> SIDA y el &lt;</span><span style="color:white;background:black">ébola</span><span style="color:black">&gt; son corresponde y es falso ,</span><br>
<span style="color:#1818b2">147036486:</span><span style="color:black"> pertenece a l mismo grupo de l mortal virus &lt;</span><span style="color:white;background:black">ébola</span><span style="color:black">&gt; .</span><br>
<span style="color:#1818b2">178273950:</span><span style="color:black"> Hay muchas enfermedades , como el caso de l hanta , de l &lt;</span><span style="color:white;background:black">ébola</span><span style="color:black">&gt; , de l lassa , de l dengue , etcétera , para las cuales no existen vacunas , y nuestro Instituto de Salud Pública podría enfrentar las suficientemente .</span><br>
<b><span style="color:#5454ff">{ ~ }</span><span style="color:#54ff54"> $</span></b></span><u></u><u></u></p>
</div>
<p class="MsoNormal"><br>
Other searches with word-initial non-ASCII characters have also produced crashes, such as "ácaro". But, as seen above with "árbol", at least one doesn't.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">The errors are also happening with words which have non-ASCII characters in other places, such as "esdrújula".<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Note that this corpus is UTF-8 encoded.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Any ideas? I've never had this problem before, and I still don't with other corpora of similar size.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Cheers,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Scott<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal"><br>
<br>
--<u></u><u></u></p>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">Dr. Scott Sadowsky<br>
Profesor Asistente de Lingüística<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">Pontificia Universidad Católica de Chile<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">ssadowsky gmail com<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">scsadowsky uc cl<br>
<a href="http://sadowsky.cl/" target="_blank">http://sadowsky.cl/</a><u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt"> <u></u><u></u></span></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div></div></div>
</div>
<br>______________________________<wbr>_________________<br>
CWB mailing list<br>
<a href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a><br>
<a href="http://liste.sslmit.unibo.it/mailman/listinfo/cwb" rel="noreferrer" target="_blank">http://liste.sslmit.unibo.it/<wbr>mailman/listinfo/cwb</a><br>
<br></blockquote></div><br>
</div>