<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Dear all,<div class=""><br class=""></div><div class="">I fear I might be lost in encoding hell: I am trying to install a corpus on CQPweb, but get the following error message when creating word and annotation frequency tables (the last step of generating frequency lists):</div><div class=""><br class=""></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;" class=""><div class=""><span style="font-size: 14px;" class="">An SQL query did not run successfully!</span></div><div class=""><span style="font-size: 14px;" class=""><br class=""></span></div><div class=""><span style="font-size: 14px;" class="">Original query: LOAD DATA LOCAL INFILE '/var/cqpdata/temp/______tempfreq_topagrar_v4.tbl' INTO TABLE `__tempfreq_topagrar_v4` FIELDS ESCAPED BY '' /* from User: thilo | Function: corpus_make_freqtables() | 2021-Nov-18 16:05 */</span></div><div class=""><span style="font-size: 14px;" class=""><br class=""></span></div><div class=""><span style="font-size: 14px;" class="">Error # 1300: Invalid utf8 character string: '</span><span style="font-size: 14px;" class="">'</span></div><div class=""><br class=""></div></blockquote>The corpus contains texts parsed from a web blog. I write an XML file using Python lxml and run the result through TreeTagger before installing it on CQPweb. It sounds like an encoding problem, although I am doing my best to remove anything potentially broken in Python (e.g. running all strings through bytes(string, 'utf-8').decode('utf-8', 'ignore')). <div class=""><br class=""><div class="">Checking for invalid UTF-8 characters in the input XML file using grep (grep -axv '.*' file.txt) yields no results. Converting the file with iconv -f utf-8 -t utf-8 -c file.xml > newfile.xml makes no difference.<div class=""><br class=""></div><div class="">Any suggestions on how to solve or narrow down the problem (e.g. 
finding the line or text id causing the issue)?<div class=""><br class=""></div><div class="">Thanks a lot!</div><div class="">Thilo</div></div><div class=""><br class=""></div><div class="">Server Setup:</div><div class="">OS: Ubuntu 18.04</div><div class="">DB: MariaDB 10.1</div><div class="">CQPweb v3.2.43</div><div class="">PHP: 7.2</div><div class=""><br class=""></div><div class="">PHP debugging backtrace:</div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;" class=""><div class=""><div class=""><span style="font-size: 14px;" class="">array(6) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [1]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> array(4) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["file"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(43) "/var/www/html/diskurs/lib/exiterror-lib.php"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["line"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> int(367)</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["function"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(9) "exiterror"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["args"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> array(3) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [0]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> array(3) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [0]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(38) "An 
SQL query did not run successfully!"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [1]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(232) "Original query: </span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""><br class=""></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class="">LOAD DATA LOCAL INFILE '/var/cqpdata/temp/______tempfreq_topagrar_v4.tbl' INTO TABLE `__tempfreq_topagrar_v4` FIELDS ESCAPED BY '' </span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""><span class="Apple-tab-span" style="white-space:pre">        </span>/* from User: thilo | Function: corpus_make_freqtables() | 2021-Nov-18 16:05 */</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""><br class=""></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class="">"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [2]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(48) "Error # 1300: Invalid utf8 character string: '' "</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [1]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> NULL</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [2]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> NULL</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [2]=></span></div></div><div class=""><div 
class=""><span style="font-size: 14px;" class=""> array(4) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["file"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(37) "/var/www/html/diskurs/lib/sql-lib.php"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["line"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> int(216)</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["function"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(18) "exiterror_sqlquery"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["args"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> array(3) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [0]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> int(1300)</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [1]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(33) "Invalid utf8 character string: ''"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [2]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(212) "LOAD DATA LOCAL INFILE '/var/cqpdata/temp/______tempfreq_topagrar_v4.tbl' INTO TABLE `__tempfreq_topagrar_v4` FIELDS ESCAPED BY '' </span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""><span class="Apple-tab-span" style="white-space:pre">        </span>/* from User: thilo | Function: corpus_make_freqtables() | 2021-Nov-18 16:05 */"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div 
class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [3]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> array(4) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["file"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(37) "/var/www/html/diskurs/lib/sql-lib.php"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["line"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> int(350)</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["function"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(12) "do_sql_query"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["args"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> array(1) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [0]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(212) "LOAD DATA LOCAL INFILE '/var/cqpdata/temp/______tempfreq_topagrar_v4.tbl' INTO TABLE `__tempfreq_topagrar_v4` FIELDS ESCAPED BY '' </span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""><span class="Apple-tab-span" style="white-space:pre">        </span>/* from User: thilo | Function: corpus_make_freqtables() | 2021-Nov-18 16:05 */"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [4]=></span></div></div><div class=""><div class=""><span style="font-size: 
14px;" class=""> array(4) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["file"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(43) "/var/www/html/diskurs/lib/freqtable-lib.php"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["line"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> int(127)</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["function"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(19) "do_sql_infile_query"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["args"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> array(3) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [0]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(22) "__tempfreq_topagrar_v4"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [1]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(48) "/var/cqpdata/temp/______tempfreq_topagrar_v4.tbl"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [2]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> bool(true)</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [5]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> array(4) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> 
["file"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(37) "/var/www/html/diskurs/lib/execute.php"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["line"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> int(196)</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["function"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(22) "corpus_make_freqtables"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["args"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> array(1) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [0]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(11) "topagrar_v4"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> [6]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> array(4) {</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["file"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(37) "/var/www/html/diskurs/exe/execute.php"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["line"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> int(1)</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["args"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> array(1) {</span></div></div><div class=""><div 
class=""><span style="font-size: 14px;" class=""> [0]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(37) "/var/www/html/diskurs/lib/execute.php"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> ["function"]=></span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> string(7) "require"</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class=""> }</span></div></div><div class=""><div class=""><span style="font-size: 14px;" class="">}</span></div></div></blockquote></div></div></body></html>