[CWB] Error #1300 generating word frequency lists

José Manuel Martínez Martínez chozelinek at gmail.com
Wed Aug 29 09:04:04 CEST 2018


If I have time to tackle the issue, I'll try and share the changes.
Cheers,

jmm
--
José Manuel Martínez Martínez
https://chozelinek.github.io


On Wed, Aug 29, 2018 at 8:54 AM José Manuel Martínez Martínez <
chozelinek at gmail.com> wrote:

> Hi Andrew,
>
> Thanks for your answer. I've modified `db.inc.php` and two other files.
> This is the diff of my modifications:
>
> Index: lib/db.inc.php
> ===================================================================
> --- lib/db.inc.php (revision 1057)
> +++ lib/db.inc.php (working copy)
> @@ -401,7 +401,7 @@
>   primary key(refnumber),
>   key(text_id) $extra_sql_keys
>
> - ) CHARACTER SET utf8 COLLATE utf8_bin";
> + ) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci";
>   /*
>   * note the use of a binary collation for distribution DBs, since
>   * they always contain handle IDs, not word or tag material.
> @@ -478,7 +478,7 @@
>   `$att` varchar(40) NOT NULL";
>   $create_statement .= ",
>   key (refnumber)
> - ) CHARACTER SET utf8 COLLATE {$Corpus->sql_collation}";
> + ) CHARACTER SET utf8mb4 COLLATE {$Corpus->sql_collation}";
>
>   break;
>
> @@ -533,7 +533,7 @@
>   endPosition int unsigned NOT NULL,
>   refnumber mediumint unsigned NOT NULL AUTO_INCREMENT,
>   primary key(refnumber)
> - ) CHARACTER SET utf8 COLLATE {$Corpus->sql_collation}";
> + ) CHARACTER SET utf8mb4 COLLATE {$Corpus->sql_collation}";
>
>   break;
>
> @@ -551,7 +551,7 @@
>   category varchar(40),
>   primary key(refnumber),
>   key(category)
> - ) CHARACTER SET utf8 COLLATE {$Corpus->sql_collation}";
> + ) CHARACTER SET utf8mb4 COLLATE {$Corpus->sql_collation}";
>
>   break;
>
> Index: lib/freqtable.inc.php
> ===================================================================
> --- lib/freqtable.inc.php (revision 1057)
> +++ lib/freqtable.inc.php (working copy)
> @@ -192,7 +192,7 @@
>   global $Config;
>   global $Corpus;
>   global $User;
> -
> + php_execute_time_unlimit();
>   global $cqp;
>
>   if (empty($cqp))
> Index: lib/indexforms-queries.inc.php
> ===================================================================
> --- lib/indexforms-queries.inc.php (revision 1057)
> +++ lib/indexforms-queries.inc.php (working copy)
> @@ -74,7 +74,7 @@
>   'sq_case'   => 'Simple query (case-sensitive)',
>   );
>   if (! array_key_exists($qmode, $modemap) )
> - $qmode = ($Corpus->uses_case_sensitivity ? 'sq_case' : 'sq_nocase');
> + $qmode = ($Corpus->uses_case_sensitivity ? 'sq_case' : 'cqp');
>   /* includes NULL, empty */
>
>
> But I'm getting this error when I try to run collocations:
>
> A MySQL query did not run successfully!
>
> Error # 1253: COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'utf8mb4'
>
> I guess I need to touch the code in more places, and probably do that
> in a test environment (I put the change in production ;-).
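
Error 1253 is what MySQL reports when a COLLATE clause names a collation that belongs to a different character set. After the diff above, tables are created with CHARACTER SET utf8mb4, while collation strings elsewhere (for example the value returned at lib/library.inc.php:549 in the grep listing) still say utf8_bin or utf8_general_ci. A minimal sketch of the name mapping that would probably be needed, assuming only these two collations are in use. `to_mb4` is a hypothetical helper for illustration, not part of CQPweb:

```shell
# Hypothetical helper: rewrite utf8 collation names to their utf8mb4
# counterparts. Only the two collation names are matched, so other
# identifiers containing "utf8_" (e.g. $utf8_filename) are left untouched.
to_mb4() {
    printf '%s\n' "$1" | sed -E 's/utf8_(bin|general_ci)/utf8mb4_\1/g'
}

to_mb4 "ORDER BY before1 COLLATE utf8_general_ci"
to_mb4 'rename($utf8_filename, $tabfile);'
```

This covers collation names only. Any remaining `CHARACTER SET utf8` clauses would need the same treatment, and existing tables would still have to be converted separately.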
>
> With the command `grep -rn . -e 'utf8_'` I get this list of lines that
> seem to mention UTF-8:
>
> ./bin/upgrade-database.php:104:$Config->mysql_utf8_set_required = (isset($mysql_utf8_set_required) && $mysql_utf8_set_required);
> ./bin/upgrade-database.php:225: ) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_bin',
> ./bin/upgrade-database.php:231: ) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_bin',
> ./bin/upgrade-database.php:265: (`corpus` varchar(20) NOT NULL,`target` varchar(20) NOT NULL) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_bin',
> ./bin/upgrade-database.php:313:                    ) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_bin',
> ./bin/upgrade-database.php:382: ) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_bin"
> ./bin/upgrade-database.php:401:         ) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_bin',
> ./bin/upgrade-database.php:411:         ) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_bin'
> ./bin/upgrade-database.php:665: ) CHARACTER SET utf8 COLLATE utf8_bin'
> ./bin/upgrade-database.php:690: ) CHARACTER SET utf8 COLLATE utf8_bin",
> ./bin/upgrade-database.php:698: ) CHARACTER SET utf8 COLLATE utf8_bin",
> ./bin/upgrade-database.php:739: 'alter table `annotation_metadata` collate utf8_bin',
> ./bin/upgrade-database.php:740: 'alter table `annotation_template_info` collate utf8_bin',
> ./bin/upgrade-database.php:741: 'alter table `annotation_template_content` collate utf8_bin',
> ./bin/upgrade-database.php:742: 'alter table `corpus_metadata_variable` collate utf8_bin',
> ./bin/upgrade-database.php:743: 'alter table `saved_dbs` collate utf8_bin',
> ./bin/upgrade-database.php:744: 'alter table `saved_freqtables` collate utf8_bin',
> ./bin/upgrade-database.php:745: 'alter table `saved_subcorpora` collate utf8_bin',
> ./bin/upgrade-database.php:746: 'alter table `user_memberships` collate utf8_bin',
> ./bin/upgrade-database.php:747: 'alter table `user_privilege_info` collate utf8_bin',
> ./bin/upgrade-database.php:748: 'alter table `query_history` collate utf8_bin',
> ./bin/upgrade-database.php:749: 'alter table `system_processes` collate utf8_bin',
> ./bin/upgrade-database.php:750: 'alter table `text_metadata_fields` collate utf8_bin',
> ./bin/upgrade-database.php:751: 'alter table `text_metadata_values` collate utf8_bin',
> ./bin/upgrade-database.php:752: 'alter table `user_info` collate utf8_bin',
> ./bin/upgrade-database.php:753: /* using utf8_bin for user_info implies the following for specific columnss: */
> ./bin/upgrade-database.php:754: 'alter table `user_info` modify column `affiliation` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci default NULL',
> ./bin/upgrade-database.php:755: 'alter table `user_info` modify column `email` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci default NULL',
> ./bin/upgrade-database.php:756: 'alter table `user_info` modify column `realname` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci default NULL',
> ./bin/upgrade-database.php:780: ) CHARACTER SET utf8 COLLATE utf8_bin",
> ./bin/upgrade-database.php:789: ) CHARACTER SET utf8 COLLATE utf8_bin",
> ./bin/upgrade-database.php:794: ) CHARACTER SET utf8 COLLATE utf8_bin",
> ./bin/upgrade-database.php:802: ) CHARACTER SET utf8 COLLATE utf8_bin",
> ./bin/upgrade-database.php:884:         ) CHARACTER SET utf8 COLLATE utf8_bin",
> ./bin/upgrade-database.php:891:          ) CHARACTER SET utf8 COLLATE utf8_bin"
> ./bin/upgrade-database.php:953: 'alter table user_info modify column `username` varchar(30) charset utf8 collate utf8_bin NOT NULL',
> ./bin/upgrade-database.php:959: ) CHARACTER SET utf8 COLLATE utf8_bin',
> ./bin/upgrade-database.php:968: ) CHARACTER SET utf8 COLLATE utf8_general_ci',
> ./bin/upgrade-database.php:973: ) CHARACTER SET utf8 COLLATE utf8_general_ci'
> ./bin/upgrade-database.php:1085:   setting_name varchar(20) NOT NULL collate utf8_bin,
> ./bin/upgrade-database.php:1088: ) CHARACTER SET utf8 COLLATE utf8_general_ci',
> ./bin/upgrade-database.php:1100:   `group_name` varchar(20) NOT NULL UNIQUE COLLATE utf8_bin,
> ./bin/upgrade-database.php:1104: ) CHARACTER SET utf8 COLLATE utf8_general_ci',
> ./bin/upgrade-database.php:1107: CHARACTER SET utf8 COLLATE utf8_general_ci',
> ./bin/upgrade-database.php:1170: ) CHARACTER SET utf8 COLLATE utf8_bin",
> ./bin/upgrade-database.php:1181: ) CHARACTER SET utf8 COLLATE utf8_general_ci",
> ./bin/upgrade-database.php:1184: CHARACTER SET utf8 COLLATE utf8_general_ci",
> ./bin/upgrade-database.php:1187: CHARACTER SET utf8 COLLATE utf8_general_ci",
>
> ./bin/autosetup.php:69:$Config->mysql_utf8_set_required = $mysql_utf8_set_required;
> ./lib/admin-lib.inc.php:478: ) CHARSET utf8 COLLATE utf8_bin ");
> ./lib/concordance-post.inc.php:601: $extra_sort_pos_sql .= ", before$i COLLATE utf8_general_ci ";
> ./lib/concordance-post.inc.php:607: $extra_sort_pos_sql = ", after1 COLLATE utf8_general_ci"
> ./lib/concordance-post.inc.php:608: . ", after2 COLLATE utf8_general_ci"
> ./lib/concordance-post.inc.php:609: . ", after3 COLLATE utf8_general_ci"
> ./lib/concordance-post.inc.php:610: . ", after4 COLLATE utf8_general_ci"
> ./lib/concordance-post.inc.php:611: . ", after5 COLLATE utf8_general_ci";
> ./lib/concordance-post.inc.php:617: $extra_sort_pos_sql .= ", after$i COLLATE utf8_general_ci";
> ./lib/concordance-post.inc.php:649: ORDER BY $sort_position_sql COLLATE utf8_general_ci  $extra_sort_pos_sql ";
> ./lib/concordance-post.inc.php:651: * we always use utf8_general_ci for the actual sorting,
> ./lib/concordance-post.inc.php:652: * even if the collation of the sort DB is actually utf8_bin
> ./lib/db.inc.php:278: $utf8_filename = $tabfile .'.utf8.tmp';
> ./lib/db.inc.php:281:                     $utf8_filename,
> ./lib/db.inc.php:286: rename($utf8_filename, $tabfile);
> ./lib/library.inc.php:180: if ($Config->mysql_utf8_set_required)
> ./lib/library.inc.php:549: return $corpus_info->uses_case_sensitivity ? 'utf8_bin' : 'utf8_general_ci' ;
> ./lib/defaults.inc.php:80: METADATA_TYPE_CLASSIFICATION => 'varchar(255) default NULL COLLATE utf8_bin',
> ./lib/defaults.inc.php:81: METADATA_TYPE_FREETEXT       => 'text default NULL COLLATE utf8_general_ci',
> ./lib/defaults.inc.php:82: METADATA_TYPE_IDLINK         => 'varchar(255) default NULL COLLATE utf8_bin',
> ./lib/defaults.inc.php:83: METADATA_TYPE_UNIQUE_ID      => 'varchar(255) default NULL COLLATE utf8_bin',
> ./lib/defaults.inc.php:84: METADATA_TYPE_DATE           => 'varchar(255) default NULL COLLATE utf8_bin',
> ./lib/defaults.inc.php:224:if (!isset($mysql_utf8_set_required))
> ./lib/defaults.inc.php:225: $mysql_utf8_set_required = true;
>
> ./lib/sql-definitions.inc.php:154: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:166: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:175: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:187: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:200: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:209: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:216: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:225: ) $engine CHARACTER SET utf8 COLLATE utf8_general_ci";
> ./lib/sql-definitions.inc.php:307: ) $engine CHARACTER SET utf8 COLLATE utf8_general_ci";
> ./lib/sql-definitions.inc.php:316: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:327: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:340: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:349: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:359: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:376: ) $engine CHARACTER SET utf8 collate utf8_bin";
> ./lib/sql-definitions.inc.php:389: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:410: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:424: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:437: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:447: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:474: ) $engine_if_fulltext_key_needed CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:489: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:503: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:508: setting_name varchar(20) NOT NULL collate utf8_bin,
> ./lib/sql-definitions.inc.php:511: ) CHARACTER SET utf8 COLLATE utf8_general_ci"; /* note that for this one we don't care about the engine */
> ./lib/sql-definitions.inc.php:520: ) $engine CHARACTER SET utf8 COLLATE utf8_bin";
> ./lib/sql-definitions.inc.php:528: `content` text character set utf8 collate utf8_bin,
>
> ./lib/sql-definitions.inc.php:
>
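
Several of the hits in the listing above have nothing to do with collations (the `utf8_filename` variable in lib/db.inc.php, the `mysql_utf8_set_required` setting). A narrower search could match only the two collation names and only PHP files. This is a sketch assuming GNU grep (for `--include`), and `find_collations` is a made-up name for illustration:

```shell
# Hypothetical helper: list only the lines that name a utf8 collation,
# restricted to PHP sources under the given directory (default: current dir).
find_collations() {
    grep -rnE --include='*.php' 'utf8_(bin|general_ci)' "${1:-.}"
}

# Usage, from the top of the CQPweb checkout:
#   find_collations .
```

The output has the same `file:line:text` shape as the original command, so it can be compared against the listing directly.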