[CWB] Error #1300 generating word frequency lists

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon Aug 6 10:10:05 CEST 2018


The key bit of the error message is this:

Error # 1300: Invalid utf8 character string: ''

(Unfortunately, the actual bad string can’t be identified from this.)

This suggests that there is a badly-encoded string in the CWB index, which MySQL catches when the frequency list is set up. However, recent versions of CWB (recent meaning the last several years) should not permit badly-encoded strings to be indexed. If there was an encoding error in your data, you should have had an error at the encoding stage.
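Since the error message does not show the offending string, one way to look for it is to scan the temporary frequency table for lines that are not valid UTF-8. The following is only a minimal sketch, not part of CWB or CQPweb: the function name find_invalid_utf8_lines is made up for illustration, and it assumes the file named in the failing query (/data/cqpweb/tmp/______tempfreq_spanish.tbl) is still on disk and is plain line-oriented text.

```python
from pathlib import Path
from typing import Iterator, Tuple, Union


def find_invalid_utf8_lines(path: Union[str, Path]) -> Iterator[Tuple[int, bytes]]:
    """Yield (line_number, raw_bytes) for every line that is not valid UTF-8.

    Hypothetical helper for illustration only. The file is read in binary
    mode so that a bad byte sequence cannot abort the scan part-way through.
    """
    with open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8", errors="strict")
            except UnicodeDecodeError:
                # Strip only the line terminator; keep the bad bytes intact.
                yield line_number, raw.rstrip(b"\r\n")
```

For example, `for n, raw in find_invalid_utf8_lines("/data/cqpweb/tmp/______tempfreq_spanish.tbl"): print(n, raw)` would print each suspect line as a bytes literal, so a stray Latin-1 byte such as \xf1 (ñ) is visible directly. If the scan finds nothing, the file is well-formed UTF-8 and the cause lies elsewhere.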

What’s your CWB version, and your CQPweb version? Also, is the underlying data UTF-8 or Latin-1?

best

Andrew.



From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of José Manuel Martínez Martínez
Sent: 06 August 2018 08:18
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: [CWB] Error #1300 generating word frequency lists

Good morning!

Trying to run collocations on a corpus in Spanish, I've got an error.

Somehow, the word frequency list wasn't generated.

I tried to generate it again, but the process fails and I get the traceback pasted below.

Is this a CQPweb issue or should I check some settings of the MySQL database?

Cheers,

jmm

--- TRACEBACK ---

CQPweb encountered an error and could not continue.

A MySQL query did not run successfully!

Original query: LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______tempfreq_spanish.tbl' INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY '' /* from User: datamaran | Function: corpus_make_freqtables() | 2018-Aug-03 12:41:27 */

Error # 1300: Invalid utf8 character string: ''



PHP debugging backtrace
array(6) {
  [1]=>
  array(4) {
    ["file"]=>
    string(40) "/var/www/html/cqpweb/lib/library.inc.php"
    ["line"]=>
    int(286)
    ["function"]=>
    string(20) "exiterror_mysqlquery"
    ["args"]=>
    array(3) {
      [0]=>
      int(1300)
      [1]=>
      string(33) "Invalid utf8 character string: ''"
      [2]=>
      string(210) "LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______tempfreq_spanish.tbl' INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY ''
            /* from User: datamaran | Function: corpus_make_freqtables() | 2018-Aug-03 12:41:27 */"
    }
  }
  [2]=>
  array(4) {
    ["file"]=>
    string(40) "/var/www/html/cqpweb/lib/library.inc.php"
    ["line"]=>
    int(410)
    ["function"]=>
    string(14) "do_mysql_query"
    ["args"]=>
    array(1) {
      [0]=>
      &string(210) "LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______tempfreq_spanish.tbl' INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY ''
            /* from User: datamaran | Function: corpus_make_freqtables() | 2018-Aug-03 12:41:27 */"
    }
  }
  [3]=>
  array(4) {
    ["file"]=>
    string(42) "/var/www/html/cqpweb/lib/freqtable.inc.php"
    ["line"]=>
    int(124)
    ["function"]=>
    string(21) "do_mysql_infile_query"
    ["args"]=>
    array(3) {
      [0]=>
      string(18) "__tempfreq_spanish"
      [1]=>
      string(43) "/data/cqpweb/tmp/______tempfreq_spanish.tbl"
      [2]=>
      bool(true)
    }
  }
  [4]=>
  array(4) {
    ["file"]=>
    string(42) "/var/www/html/cqpweb/lib/admin-lib.inc.php"
    ["line"]=>
    int(838)
    ["function"]=>
    string(22) "corpus_make_freqtables"
    ["args"]=>
    array(1) {
      [0]=>
      string(7) "spanish"
    }
  }
  [5]=>
  array(4) {
    ["file"]=>
    string(47) "/var/www/html/cqpweb/lib/metadata-admin.inc.php"
    ["line"]=>
    int(179)
    ["function"]=>
    string(40) "create_text_metadata_auto_freqlist_calls"
    ["args"]=>
    array(1) {
      [0]=>
      string(7) "spanish"
    }
  }
  [6]=>
  array(4) {
    ["file"]=>
    string(43) "/var/www/html/cqpweb/exe/metadata-admin.php"
    ["line"]=>
    int(3)
    ["args"]=>
    array(1) {
      [0]=>
      string(47) "/var/www/html/cqpweb/lib/metadata-admin.inc.php"
    }
    ["function"]=>
    string(7) "require"
  }
}

--
José Manuel Martínez Martínez
https://chozelinek.github.io