[CWB] Error #1300 generating word frequency lists
José Manuel Martínez Martínez
chozelinek at gmail.com
Mon Aug 6 11:43:48 CEST 2018
Hi Andrew,
thank you very much for your quick reply.
CQPweb v3.2.31
CWB v3.4.14
The underlying data should be UTF-8.
I cannot remember right now whether I got an error at the encoding stage.
I'll re-encode the corpus and let you know if I get any error in that
regard.
Is there a way to run the command that generates the frequency lists from
the command line? I think I could leave a script running that encodes the
texts in my corpus incrementally, to find out at least which file is
causing the problem.
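For the file-level check, something like the following might be a starting point. It is only a minimal sketch: it assumes the source texts sit under one directory as plain files, it looks at the input files rather than at the CWB index itself, and the function names (find_bad_lines, scan) are made up for illustration. It also flags characters above U+FFFF, on the assumption that the MySQL table uses the 3-byte utf8 character set; that may not apply to every installation.

```python
import sys
from pathlib import Path


def find_bad_lines(path):
    """Yield (line_number, reason) for each suspicious line in one file."""
    # Read bytes, not text, so that invalid sequences are not silently replaced.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError as err:
                yield number, f"invalid UTF-8 at byte {err.start}"
                continue
            # Assumption: the database column is MySQL "utf8" (3 bytes max),
            # which cannot store characters outside the Basic Multilingual Plane.
            if any(ord(ch) > 0xFFFF for ch in text):
                yield number, "character outside the BMP (4-byte UTF-8)"


def scan(root):
    """Return a list of (path, line_number, reason) for all files under root."""
    problems = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            for number, reason in find_bad_lines(path):
                problems.append((str(path), number, reason))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_utf8.py /path/to/corpus/texts
    for path, number, reason in scan(sys.argv[1]):
        print(f"{path}:{number}: {reason}")
```

Each reported line gives the file and line number, which should be enough to open the file and inspect the offending token by hand.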
Cheers,
--
José Manuel Martínez Martínez
https://chozelinek.github.io
On Mon, Aug 6, 2018 at 10:10 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
wrote:
> The key bit of the error message is this:
>
> Error # 1300: Invalid utf8 character string: ''
>
> (unfortunate that the actual bad string can’t be identified from this.)
>
> This suggests that there is a bad string in the CWB index, and it is
> caught by the MySql db on freq list setup. Recent versions of CWB however
> should not permit the indexing of badly-encoded strings (recent meaning,
> last several years). You should have had an error at the encoding stage if
> there was an encoding error in your data.
>
> What’s your CWB version? (also your CQPweb version) Also, is the
> underlying data UTF-8 or Latin-1?
>
> best
>
> Andrew.
>
> *From:* cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> *On
> Behalf Of *José Manuel Martínez Martínez
> *Sent:* 06 August 2018 08:18
> *To:* Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it
> >
> *Subject:* [CWB] Error #1300 generating word frequency lists
>
> Good morning!
>
> Trying to run collocations on a corpus in Spanish, I've got an error.
>
> Somehow, the word frequency list wasn't generated.
>
> I tried to generate it again but the process fails and I get the traceback
> that I copy/paste below.
>
> Is this a CQPweb issue or should I check some settings of the MySQL
> database?
>
> Cheers,
>
> jmm
>
> --- TRACEBACK ---
>
> CQPweb encountered an error and could not continue.
>
> A MySQL query did not run successfully!
>
> Original query: LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______tempfreq_spanish.tbl'
> INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY '' /* from User:
> datamaran | Function: corpus_make_freqtables() | 2018-Aug-03 12:41:27 */
>
> Error # 1300: Invalid utf8 character string: ''
>
> PHP debugging backtrace
>
> array(6) {
>   [1]=>
>   array(4) {
>     ["file"]=>
>     string(40) "/var/www/html/cqpweb/lib/library.inc.php"
>     ["line"]=>
>     int(286)
>     ["function"]=>
>     string(20) "exiterror_mysqlquery"
>     ["args"]=>
>     array(3) {
>       [0]=>
>       int(1300)
>       [1]=>
>       string(33) "Invalid utf8 character string: ''"
>       [2]=>
>       string(210) "LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______tempfreq_spanish.tbl'
> INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY ''
>
> /* from User: datamaran | Function: corpus_make_freqtables() |
> 2018-Aug-03 12:41:27 */"
>     }
>   }
>   [2]=>
>   array(4) {
>     ["file"]=>
>     string(40) "/var/www/html/cqpweb/lib/library.inc.php"
>     ["line"]=>
>     int(410)
>     ["function"]=>
>     string(14) "do_mysql_query"
>     ["args"]=>
>     array(1) {
>       [0]=>
>       &string(210) "LOAD DATA LOCAL INFILE '/data/cqpweb/tmp/______tempfreq_spanish.tbl'
> INTO TABLE `__tempfreq_spanish` FIELDS ESCAPED BY ''
>
> /* from User: datamaran | Function: corpus_make_freqtables() |
> 2018-Aug-03 12:41:27 */"
>     }
>   }
>   [3]=>
>   array(4) {
>     ["file"]=>
>     string(42) "/var/www/html/cqpweb/lib/freqtable.inc.php"
>     ["line"]=>
>     int(124)
>     ["function"]=>
>     string(21) "do_mysql_infile_query"
>     ["args"]=>
>     array(3) {
>       [0]=>
>       string(18) "__tempfreq_spanish"
>       [1]=>
>       string(43) "/data/cqpweb/tmp/______tempfreq_spanish.tbl"
>       [2]=>
>       bool(true)
>     }
>   }
>   [4]=>
>   array(4) {
>     ["file"]=>
>     string(42) "/var/www/html/cqpweb/lib/admin-lib.inc.php"
>     ["line"]=>
>     int(838)
>     ["function"]=>
>     string(22) "corpus_make_freqtables"
>     ["args"]=>
>     array(1) {
>       [0]=>
>       string(7) "spanish"
>     }
>   }
>   [5]=>
>   array(4) {
>     ["file"]=>
>     string(47) "/var/www/html/cqpweb/lib/metadata-admin.inc.php"
>     ["line"]=>
>     int(179)
>     ["function"]=>
>     string(40) "create_text_metadata_auto_freqlist_calls"
>     ["args"]=>
>     array(1) {
>       [0]=>
>       string(7) "spanish"
>     }
>   }
>   [6]=>
>   array(4) {
>     ["file"]=>
>     string(43) "/var/www/html/cqpweb/exe/metadata-admin.php"
>     ["line"]=>
>     int(3)
>     ["args"]=>
>     array(1) {
>       [0]=>
>       string(47) "/var/www/html/cqpweb/lib/metadata-admin.inc.php"
>     }
>     ["function"]=>
>     string(7) "require"
>   }
> }
>
> --
>
> José Manuel Martínez Martínez
>
> https://chozelinek.github.io
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>
>