<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
span.E-MailFormatvorlage19
        {mso-style-type:personal-reply;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:70.85pt 70.85pt 2.0cm 70.85pt;}
div.WordSection1
        {page:WordSection1;}
--></style>
</head>
<body lang="DE" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi Andrew,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Thanks for your reply! I updated from SVN and upgraded the database today, then returned to this problem.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Unwanted frequency lists are definitely still being created, and the segmentation fault persists (I guess it is a consequence of the former, because some of my columns are known to cause trouble for CQP).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">In addition, I had to comment out two lines in general_lib.php:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> /* NKD changes more things than ND does, e.g. sharp s to normal s, or fi-lig to fi */<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> </span>#$str = normalizer_normalize($str, Normalizer::FORM_KD);<o:p></o:p></p>
<p class="MsoNormal"> <span lang="EN-US">$str = preg_replace('/\p{M}/u', '', $str);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> </span>#$str = normalizer_normalize($str, Normalizer::FORM_C);<o:p></o:p></p>
<p class="MsoNormal"> <span lang="EN-US">/* i.e. the last thing we do is re-combine anything that wasn't scrubbed */<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> }<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">It seems the normalizer_normalize() function is not available in my PHP installation. It is provided by PHP's intl extension, which may not be loaded here.<o:p></o:p></span></p>
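<p class="MsoNormal"><span lang="EN-US">A minimal sketch of guarding these calls instead of commenting them out, assuming the function is missing because the intl extension is not loaded. fold_accents() is a hypothetical name for illustration, not the actual function in general_lib.php:<o:p></o:p></span></p>

```php
<?php
// Sketch only: fold_accents() is a hypothetical helper, not CQPweb's own code.
function fold_accents(string $str): string
{
    // normalizer_normalize() only exists when the intl extension is loaded.
    $have_intl = function_exists('normalizer_normalize');

    if ($have_intl) {
        // NFKD: split characters into base letters plus combining marks.
        $decomposed = normalizer_normalize($str, Normalizer::FORM_KD);
        if ($decomposed !== false) {
            $str = $decomposed;
        }
    }

    // Remove all combining marks (Unicode category M).
    $stripped = preg_replace('/\p{M}/u', '', $str);
    if ($stripped !== null) {
        $str = $stripped;
    }

    if ($have_intl) {
        // NFC: re-combine anything that was not scrubbed.
        $recomposed = normalizer_normalize($str, Normalizer::FORM_C);
        if ($recomposed !== false) {
            $str = $recomposed;
        }
    }

    return $str;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT; the mark is stripped either way.
echo fold_accents('cafe' . "\u{0301}"), "\n";
```

<p class="MsoNormal"><span lang="EN-US">Without intl only the mark-stripping step runs, so precomposed characters such as a single-codepoint "é" pass through unchanged. Calling extension_loaded('intl') shows which case applies.<o:p></o:p></span></p>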
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I’d be grateful for any pointers or help!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal">Best wishes,<o:p></o:p></p>
<p class="MsoNormal">Peter<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="mso-fareast-language:DE">From:</span></b><span style="mso-fareast-language:DE"> cwb-bounces@sslmit.unibo.it &lt;cwb-bounces@sslmit.unibo.it&gt;
<b>On Behalf Of </b>Hardie, Andrew<br>
<b>Sent:</b> Wednesday, 6 </span><span lang="EN-US" style="mso-fareast-language:DE">April 2022 12:04<br>
<b>To:</b> Open source development of the Corpus WorkBench &lt;cwb@sslmit.unibo.it&gt;<br>
<b>Subject:</b> Re: [CWB] CQPweb offline-freqlists.php problems<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:'Verdana',sans-serif;color:#1F4B7D">Hi Peter,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:'Verdana',sans-serif;color:#1F4B7D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:'Verdana',sans-serif;color:#1F4B7D">Thanks for the bug report. I’ve just updated this script in SVN (it had fallen out of sync with the main code, largely cos I’ve not had cause to
use it in a while …). The update fixes all the bugs you report except the segfault and the creation of freqlists even when not wanted for an annotation. But I
<i>hope</i> those were side-effects of the earlier bugs, so there is a chance they will be fixed too. If not, let me know.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:'Verdana',sans-serif;color:#1F4B7D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:'Verdana',sans-serif;color:#1F4B7D">(btw that instance of cwb-makeall was running after a segfault hit either cwb-encode or cwb-decode – so it can safely be killed)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:'Verdana',sans-serif;color:#1F4B7D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:'Verdana',sans-serif;color:#1F4B7D">best<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:'Verdana',sans-serif;color:#1F4B7D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:'Verdana',sans-serif;color:#1F4B7D">Andrew.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:'Verdana',sans-serif;color:#1F4B7D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="mso-fareast-language:EN-GB">From:</span></b><span lang="EN-US" style="mso-fareast-language:EN-GB">
</span><a href="mailto:cwb-bounces@sslmit.unibo.it"><span lang="EN-US" style="mso-fareast-language:EN-GB">cwb-bounces@sslmit.unibo.it</span></a><span lang="EN-US" style="mso-fareast-language:EN-GB"> &lt;</span><a href="mailto:cwb-bounces@sslmit.unibo.it"><span lang="EN-US" style="mso-fareast-language:EN-GB">cwb-bounces@sslmit.unibo.it</span></a><span lang="EN-US" style="mso-fareast-language:EN-GB">&gt;
<b>On Behalf Of </b>Uhrig, Peter<br>
<b>Sent:</b> 04 April 2022 21:14<br>
<b>To:</b> Open source development of the Corpus WorkBench &lt;</span><a href="mailto:cwb@sslmit.unibo.it"><span lang="EN-US" style="mso-fareast-language:EN-GB">cwb@sslmit.unibo.it</span></a><span lang="EN-US" style="mso-fareast-language:EN-GB">&gt;<br>
<b>Subject:</b> [CWB] CQPweb offline-freqlists.php problems<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span lang="EN-US">Dear all,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The script offline-freqlists.php is causing me some loss of sleep:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">First, it throws the following error in my setup:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">PHP Fatal error: Uncaught TypeError: Argument 1 passed to drop_unneeded_corpus_freqtable_components() must be of the type int, string given, called in /var/www/html/web/bin/offline-freqlists.php
on line 137 and defined in /var/www/html/web/lib/freqtable-lib.php:345<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">Stack trace:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">#0 /var/www/html/web/bin/offline-freqlists.php(137): drop_unneeded_corpus_freqtable_components()<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">#1 {main}<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'"> thrown in /var/www/html/web/lib/freqtable-lib.php on line 345<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">This seems to be a bug in the code because the function drop_unneeded_corpus_freqtable_components really requires a corpus_id, but is called with a corpus_name from offline-freqlists.php:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">drop_unneeded_corpus_freqtable_components($corpus);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I have thus replaced the line with<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">$corpus_id = corpus_name_to_id($corpus);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">drop_unneeded_corpus_freqtable_components($corpus_id);<o:p></o:p></span></p>
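<p class="MsoNormal"><span lang="EN-US">To illustrate why the original call fails, here is a minimal, self-contained sketch. Both demo_ functions and the sample data are hypothetical stand-ins, not CQPweb code; the real corpus_name_to_id() presumably looks the ID up in the database:<o:p></o:p></span></p>

```php
<?php
// Hypothetical stand-in for the lookup done by corpus_name_to_id().
function demo_corpus_name_to_id(string $name, array $ids_by_name): int
{
    if (!isset($ids_by_name[$name])) {
        throw new InvalidArgumentException("Unknown corpus: $name");
    }
    return $ids_by_name[$name];
}

// Hypothetical stand-in for a library function declared with an int parameter.
function demo_needs_corpus_id(int $corpus_id): string
{
    return "working on corpus #$corpus_id";
}

$ids_by_name = ['my_corpus' => 7];   // invented sample data
$corpus = 'my_corpus';

try {
    // A non-numeric string cannot be coerced to int, so PHP throws a TypeError.
    demo_needs_corpus_id($corpus);
} catch (TypeError $e) {
    echo "TypeError, as in the report\n";
}

// Resolving the name to an ID first satisfies the int type declaration.
echo demo_needs_corpus_id(demo_corpus_name_to_id($corpus, $ids_by_name)), "\n";
```

<p class="MsoNormal"><span lang="EN-US">The same pattern applies to any other call in the script that still passes the corpus name where an integer ID is declared.<o:p></o:p></span></p>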
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">With this change the script got past the first error, but I was greeted with a similar one:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">About to run the function populating corpus CQP positions...<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">PHP Fatal error: Uncaught TypeError: Argument 1 passed to populate_corpus_cqp_positions() must be of the type int, string given, called in /var/www/html/web/bin/offline-freqlists.php
on line 150 and defined in /var/www/html/web/lib/corpus-lib.php:1098<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">Stack trace:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">#0 /var/www/html/web/bin/offline-freqlists.php(150): populate_corpus_cqp_positions()<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">#1 {main}<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'"> thrown in /var/www/html/web/lib/corpus-lib.php on line 1098<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">OK, same thing, replace $corpus with $corpus_id and try again. This time it gets further:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">About to run the function populating corpus CQP positions...<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">Done populating corpus CQP positions.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">Function calculating category sizes was not run because there aren't any text classifications.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">According to my corpus metadata table, there ARE text classifications. Why does it say there aren’t?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">And finally: <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">About to run the function making the CWB text-by-text frequency index...<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">Beginning to filter data from decode to encode to build the frequency-by-text CWB index...<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">Segmentation fault<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:'Courier New'">Encoding of the by-text CWB frequency index is now complete.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">That segmentation fault most likely comes from CQP, but unfortunately it does not say what exactly was going on at the time. I noticed that the __freq folder contains indexes even for p-attributes for which I specifically
selected “N” in the “Needs FT” column of the “Manage Annotation” dialogue. Is this expected?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">It is still running, just not saying what it is doing, but I can see that “cwb-makeall -M 1000 -r /data/corpora/cqpweb/registry -V MY_CORPUS_NAME__FREQ” is currently running, so I guess the segmentation fault may not
have been critical. I’ll probably find out soon…<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Any help would be greatly appreciated!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Thanks and all the best!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Peter<o:p></o:p></span></p>
</div>
</div>
</body>
</html>