<div dir="ltr">Hi Andrew,<div><br></div><div>Thank you very much for your quick reply.</div><div><br></div><div>CQPweb v3.2.31<br></div><div><div>CWB v3.4.14</div></div><div><br></div><div>The underlying data should be UTF-8.</div><div><br></div><div>I cannot remember right now whether I got an error at the encoding stage.</div><div><br></div><div>I'll re-encode the corpus and let you know if I get any errors in that regard.</div><div><br></div><div>Would there be a way to run the command that generates the frequency lists from the command line? I think I could leave a script running that incrementally encodes all the texts in my corpus, to find out at least which file is causing the problem.</div><div><br></div><div>Cheers,</div><div><br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div>--</div><div><div>José Manuel Martínez Martínez</div><div><a href="https://chozelinek.github.io" target="_blank">https://chozelinek.github.io</a></div></div></div></div></div></div></div></div>
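<div>A rough sketch of the per-file check described in the reply, in case it helps. It only validates the source texts as UTF-8 and does not run cwb-encode or CQPweb. It assumes the texts sit as plain files under one directory; the directory name <code>corpus_texts</code> and the function names are hypothetical placeholders.</div>
<pre>
```python
import sys
from pathlib import Path


def first_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def scan_directory(root: Path) -> dict:
    """Map each file under root that is not valid UTF-8 to its first bad offset."""
    problems = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            offset = first_invalid_utf8(path.read_bytes())
            if offset is not None:
                problems[str(path)] = offset
    return problems


if __name__ == "__main__":
    # "corpus_texts" is a placeholder; pass the real directory as the first argument.
    root = Path(sys.argv[1] if len(sys.argv) > 1 else "corpus_texts")
    for name, offset in scan_directory(root).items():
        print(f"{name}: invalid UTF-8 at byte offset {offset}")
```
</pre>
<div>Each reported file can then be inspected around the given byte offset. A clean run would suggest the problem is not in the source encoding itself.</div>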
<br><div class="gmail_quote">On Mon, Aug 6, 2018 at 10:10 AM, Hardie, Andrew <span dir="ltr"><<a href="mailto:a.hardie@lancaster.ac.uk" target="_blank">a.hardie@lancaster.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-GB" link="blue" vlink="purple">
<div class="m_2185532496785904272WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d">The key bit of the error message is this:<u></u><u></u></span></p><span class="">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal">Error # 1300: Invalid utf8 character string: ''<u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d">(It is unfortunate that the actual bad string can’t be identified from this.)<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d">This suggests that there is a badly-encoded string in the CWB index, and that it is caught by the MySQL database when the frequency list is set up. Recent versions of CWB (meaning those from the last several years), however, should not permit the indexing of badly-encoded strings. If there was an encoding error in your data, you should have had an error at the encoding stage.<u></u><u></u></span></p>
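<p class="MsoNormal">One way to look for the offending string directly is to check a line-oriented dump of the data one line at a time. The sketch below is only an illustration and is not part of CWB or CQPweb; the function names are hypothetical. It flags lines that are not valid UTF-8. It also flags lines containing characters outside the Basic Multilingual Plane, on the assumption (not confirmed for this case) that the target column uses MySQL's three-byte <code>utf8</code> character set, which cannot store such characters even though they are valid UTF-8.</p>
<pre>
```python
def classify_line(raw: bytes) -> str:
    """Return 'ok', 'invalid-utf8', or 'beyond-bmp' for one line of bytes."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid-utf8"
    # Code points above U+FFFF need four bytes in UTF-8.
    if any(ord(ch) > 0xFFFF for ch in text):
        return "beyond-bmp"
    return "ok"


def suspicious_lines(path: str) -> list:
    """List (line_number, verdict, raw_bytes) for every line that is not 'ok'."""
    found = []
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            verdict = classify_line(raw.rstrip(b"\r\n"))
            if verdict != "ok":
                found.append((number, verdict, raw))
    return found
```
</pre>
<p class="MsoNormal">This could be pointed at the temporary .tbl file named in the LOAD DATA query, if that file still exists after the failure, or at any other tab-delimited export of the word forms.</p>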
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d">What are your CWB and CQPweb versions? Also, is the underlying data UTF-8 or Latin-1?<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d">best<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d">Andrew.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif"> <a href="mailto:cwb-bounces@sslmit.unibo.it" target="_blank">cwb-bounces@sslmit.unibo.it</a> <<a href="mailto:cwb-bounces@sslmit.unibo.it" target="_blank">cwb-bounces@sslmit.unibo.it</a>>
<b>On Behalf Of </b>José Manuel Martínez Martínez<br>
<b>Sent:</b> 06 August 2018 08:18<br>
<b>To:</b> Open source development of the Corpus WorkBench <<a href="mailto:cwb@sslmit.unibo.it" target="_blank">cwb@sslmit.unibo.it</a>><br>
<b>Subject:</b> [CWB] Error #1300 generating word frequency lists<u></u><u></u></span></p><div><div class="h5">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">Good morning!<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">While trying to run collocations on a Spanish corpus, I got an error.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Somehow, the word frequency list wasn't generated.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">I tried to generate it again, but the process fails and I get the traceback pasted below.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Is this a CQPweb issue or should I check some settings of the MySQL database?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Cheers,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">jmm<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">--- TRACEBACK ---<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<div>
<p class="MsoNormal">CQPweb encountered an error and could not continue.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">A MySQL query did not run successfully!<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Original query: LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______<wbr>tempfreq_spanish.tbl' INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY '' /* from User: datamaran | Function: corpus_make_freqtables() | 2018-Aug-03 12:41:27 */<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Error # 1300: Invalid utf8 character string: ''<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">PHP debugging backtrace<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">array(6) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [1]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(4) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["file"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(40) "/var/www/html/cqpweb/lib/<wbr>library.inc.php"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["line"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> int(286)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["function"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(20) "exiterror_mysqlquery"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["args"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(3) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [0]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> int(1300)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [1]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(33) "Invalid utf8 character string: ''"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [2]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(210) "LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______<wbr>tempfreq_spanish.tbl' INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY '' <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> /* from User: datamaran | Function: corpus_make_freqtables() | 2018-Aug-03 12:41:27 */"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [2]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(4) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["file"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(40) "/var/www/html/cqpweb/lib/<wbr>library.inc.php"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["line"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> int(410)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["function"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(14) "do_mysql_query"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["args"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(1) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [0]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> &string(210) "LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______<wbr>tempfreq_spanish.tbl' INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY '' <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> /* from User: datamaran | Function: corpus_make_freqtables() | 2018-Aug-03 12:41:27 */"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [3]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(4) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["file"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(42) "/var/www/html/cqpweb/lib/<wbr>freqtable.inc.php"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["line"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> int(124)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["function"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(21) "do_mysql_infile_query"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["args"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(3) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [0]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(18) "__tempfreq_spanish"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [1]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(43) "/data/cqpweb/tmp/______<wbr>tempfreq_spanish.tbl"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [2]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> bool(true)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [4]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(4) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["file"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(42) "/var/www/html/cqpweb/lib/<wbr>admin-lib.inc.php"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["line"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> int(838)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["function"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(22) "corpus_make_freqtables"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["args"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(1) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [0]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(7) "spanish"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [5]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(4) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["file"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(47) "/var/www/html/cqpweb/lib/<wbr>metadata-admin.inc.php"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["line"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> int(179)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["function"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(40) "create_text_metadata_auto_<wbr>freqlist_calls"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["args"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(1) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [0]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(7) "spanish"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [6]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(4) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["file"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(43) "/var/www/html/cqpweb/exe/<wbr>metadata-admin.php"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["line"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> int(3)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["args"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> array(1) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> [0]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(47) "/var/www/html/cqpweb/lib/<wbr>metadata-admin.inc.php"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> ["function"]=><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> string(7) "require"<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> }<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">}<u></u><u></u></p>
</div>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal">--<u></u><u></u></p>
</div>
<div>
<div>
<p class="MsoNormal">José Manuel Martínez Martínez<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><a href="https://chozelinek.github.io" target="_blank">https://chozelinek.github.io</a><u></u><u></u></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div></div></div>
</div>
<br>______________________________<wbr>_________________<br>
CWB mailing list<br>
<a href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a><br>
<a href="http://liste.sslmit.unibo.it/mailman/listinfo/cwb" rel="noreferrer" target="_blank">http://liste.sslmit.unibo.it/<wbr>mailman/listinfo/cwb</a><br>
<br></blockquote></div><br></div>