[CWB] Error #1300 generating word frequency lists

José Manuel Martínez Martínez chozelinek at gmail.com
Mon Aug 6 11:43:48 CEST 2018


Hi Andrew,

Thank you very much for your quick reply.

CQPweb v3.2.31
CWB v3.4.14

The underlying data should be UTF-8.

I can't remember right now whether I got an encoding error at the encoding
stage.

I'll re-encode the corpus and let you know if I get any errors in that
regard.
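Before re-encoding, a quicker check might be to scan the input files directly for byte sequences that are not valid UTF-8. This is a minimal sketch. It assumes the source texts sit in one directory tree as `.vrt` files, and the directory layout, the extension and the script name `check_utf8.py` are all hypothetical:

```python
import sys
from pathlib import Path


def first_invalid_utf8(data: bytes):
    """Return (byte_offset, bad_bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad sequence
        return None
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.end]


def scan_corpus(directory: str, pattern: str = "*.vrt"):
    """Check every file matching `pattern` under `directory` (layout is an assumption)."""
    problems = []
    for path in sorted(Path(directory).rglob(pattern)):
        result = first_invalid_utf8(path.read_bytes())
        if result is not None:
            offset, bad = result
            problems.append((str(path), offset, bad))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_utf8.py /path/to/corpus
    for path, offset, bad in scan_corpus(sys.argv[1]):
        print(f"{path}: invalid UTF-8 at byte {offset}: {bad!r}")
```

A stray Latin-1 `ñ` (the single byte `0xF1`) inside otherwise UTF-8 text would be reported together with its file name and byte offset.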

Is there a way to run the frequency-list generation from the command line?
That way I could leave a script running that encodes the texts in my corpus
incrementally, to find out at least which file is causing the problem.
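Another route, instead of re-encoding incrementally, might be to inspect the temporary file named in the failing `LOAD DATA` query, `/data/cqpweb/tmp/______tempfreq_spanish.tbl`. This assumes the file is still on disk after the failure. The sketch below is not a CQPweb feature. It flags lines that are not valid UTF-8 and, as a hedged guess, lines containing characters above U+FFFF, which MySQL's legacy 3-byte `utf8` character set cannot store. Whether such characters are what triggers error 1300 here is an unconfirmed assumption:

```python
import sys


def suspicious_lines(path: str):
    """Yield (line_number, reason, raw_line) for lines MySQL's 3-byte utf8 may reject."""
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError:
                yield lineno, "invalid UTF-8", raw
                continue
            # MySQL's legacy `utf8` charset (utf8mb3) stores at most 3 bytes per
            # character, i.e. nothing above U+FFFF (emoji, rare CJK, etc.).
            if any(ord(ch) > 0xFFFF for ch in text):
                yield lineno, "character above U+FFFF", raw


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_tbl.py /data/cqpweb/tmp/______tempfreq_spanish.tbl
    for lineno, reason, raw in suspicious_lines(sys.argv[1]):
        print(f"line {lineno}: {reason}: {raw!r}")
```

Scanning the file that MySQL actually rejected points straight at the offending entries, which may be faster than bisecting the corpus file by file.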

Cheers,


--
José Manuel Martínez Martínez
https://chozelinek.github.io

On Mon, Aug 6, 2018 at 10:10 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
wrote:

> The key bit of the error message is this:
>
> Error # 1300: Invalid utf8 character string: ''
>
> (Unfortunately, the actual bad string can't be identified from this.)
>
> This suggests that there is a bad string in the CWB index, and that it is
> caught by the MySQL database when the frequency list is set up. However,
> recent versions of CWB (recent meaning within the last several years)
> should not permit badly encoded strings to be indexed. You should have had
> an error at the encoding stage if there was an encoding error in your data.
>
> What are your CWB and CQPweb versions? Also, is the underlying data UTF-8
> or Latin-1?
>
> best
>
> Andrew.
>
> *From:* cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> *On Behalf Of* José Manuel Martínez Martínez
> *Sent:* 06 August 2018 08:18
> *To:* Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
> *Subject:* [CWB] Error #1300 generating word frequency lists
>
> Good morning!
>
> While trying to run collocations on a Spanish corpus, I got an error.
>
> Somehow, the word frequency list wasn't generated.
>
> I tried to generate it again, but the process fails and I get the
> traceback pasted below.
>
> Is this a CQPweb issue, or should I check some settings of the MySQL
> database?
>
> Cheers,
>
> jmm
>
> --- TRACEBACK ---
>
> CQPweb encountered an error and could not continue.
>
> A MySQL query did not run successfully!
>
> Original query: LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______tempfreq_spanish.tbl'
> INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY '' /* from User:
> datamaran | Function: corpus_make_freqtables() | 2018-Aug-03 12:41:27 */
>
> Error # 1300: Invalid utf8 character string: ''
>
> PHP debugging backtrace
> array(6) {
>   [1]=>
>   array(4) {
>     ["file"]=>
>     string(40) "/var/www/html/cqpweb/lib/library.inc.php"
>     ["line"]=>
>     int(286)
>     ["function"]=>
>     string(20) "exiterror_mysqlquery"
>     ["args"]=>
>     array(3) {
>       [0]=>
>       int(1300)
>       [1]=>
>       string(33) "Invalid utf8 character string: ''"
>       [2]=>
>       string(210) "LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______tempfreq_spanish.tbl'
> INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY ''
>
>             /* from User: datamaran | Function: corpus_make_freqtables() |
> 2018-Aug-03 12:41:27 */"
>     }
>   }
>   [2]=>
>   array(4) {
>     ["file"]=>
>     string(40) "/var/www/html/cqpweb/lib/library.inc.php"
>     ["line"]=>
>     int(410)
>     ["function"]=>
>     string(14) "do_mysql_query"
>     ["args"]=>
>     array(1) {
>       [0]=>
>       &string(210) "LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______tempfreq_spanish.tbl'
> INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY ''
>
>             /* from User: datamaran | Function: corpus_make_freqtables() |
> 2018-Aug-03 12:41:27 */"
>     }
>   }
>   [3]=>
>   array(4) {
>     ["file"]=>
>     string(42) "/var/www/html/cqpweb/lib/freqtable.inc.php"
>     ["line"]=>
>     int(124)
>     ["function"]=>
>     string(21) "do_mysql_infile_query"
>     ["args"]=>
>     array(3) {
>       [0]=>
>       string(18) "__tempfreq_spanish"
>       [1]=>
>       string(43) "/data/cqpweb/tmp/______tempfreq_spanish.tbl"
>       [2]=>
>       bool(true)
>     }
>   }
>   [4]=>
>   array(4) {
>     ["file"]=>
>     string(42) "/var/www/html/cqpweb/lib/admin-lib.inc.php"
>     ["line"]=>
>     int(838)
>     ["function"]=>
>     string(22) "corpus_make_freqtables"
>     ["args"]=>
>     array(1) {
>       [0]=>
>       string(7) "spanish"
>     }
>   }
>   [5]=>
>   array(4) {
>     ["file"]=>
>     string(47) "/var/www/html/cqpweb/lib/metadata-admin.inc.php"
>     ["line"]=>
>     int(179)
>     ["function"]=>
>     string(40) "create_text_metadata_auto_freqlist_calls"
>     ["args"]=>
>     array(1) {
>       [0]=>
>       string(7) "spanish"
>     }
>   }
>   [6]=>
>   array(4) {
>     ["file"]=>
>     string(43) "/var/www/html/cqpweb/exe/metadata-admin.php"
>     ["line"]=>
>     int(3)
>     ["args"]=>
>     array(1) {
>       [0]=>
>       string(47) "/var/www/html/cqpweb/lib/metadata-admin.inc.php"
>     }
>     ["function"]=>
>     string(7) "require"
>   }
> }
>
> --
> José Manuel Martínez Martínez
> https://chozelinek.github.io
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb