<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;
        color:black;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
        {mso-style-name:msonormal;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;
        color:black;}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:"Verdana",sans-serif;
        color:#1F497D;
        font-weight:normal;
        font-style:normal;
        text-decoration:none none;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body bgcolor="white" lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">It took me a while to find the thread in question, since the list archive is not indexed by digest issue number. It is the first thread on this page:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><a href="http://liste.sslmit.unibo.it/pipermail/cwb/2018-August/thread.html">http://liste.sslmit.unibo.it/pipermail/cwb/2018-August/thread.html</a>
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">I am planning the change to utf8mb4 for v3.3.0. I
<i>hope</i> this will follow v3.2.32, which is the next upcoming version. Version 3.2.32 will be a feature upgrade (the time needed to write the new features is why there has not been a release for so very long) that also partially implements some of the big restructuring
 that I’ve known for ages is needed; it can be considered the release candidate for 3.3.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">My hope is that no more than one or two bug-fix versions will be needed before I can branch 3.2 off and move on to 3.3, which will do
<i>nothing</i> except the utf8mb4 changeover.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">In the meantime, Gerhard, the PHP CLI script below will scrub 4-byte UTF-8 characters out of a file, replacing each of them with U&#43;FFFD (the replacement character, shown as a question mark in a little box). Call it with the input file as its first argument; the cleaned copy is written to a new file with .mod appended to the original name.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">best<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Andrew.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;;color:#1F497D;mso-fareast-language:EN-US">&lt;?php<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;;color:#1F497D;mso-fareast-language:EN-US">// Writes a cleaned copy of the input file to the same file name plus .mod<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;;color:#1F497D;mso-fareast-language:EN-US">if (empty($argv[1])) exit(&quot;Please specify an input file.\n&quot;);<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;;color:#1F497D;mso-fareast-language:EN-US">$src = fopen($argv[1], 'r') or exit(&quot;Cannot open {$argv[1]} for reading.\n&quot;);<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;;color:#1F497D;mso-fareast-language:EN-US">$dst = fopen(&quot;{$argv[1]}.mod&quot;, 'w') or exit(&quot;Cannot create {$argv[1]}.mod.\n&quot;);<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;;color:#1F497D;mso-fareast-language:EN-US">// A 4-byte UTF-8 character is a lead byte F0-F4 followed by three continuation<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;;color:#1F497D;mso-fareast-language:EN-US">// bytes 80-BF; replace each one with U+FFFD, encoded in UTF-8 as EF BF BD.<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;;color:#1F497D;mso-fareast-language:EN-US">while (false !== ($line = fgets($src)))<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;;color:#1F497D;mso-fareast-language:EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; fputs($dst, preg_replace(&quot;/[\xf0-\xf4][\x80-\xbf]{3}/&quot;, &quot;\xef\xbf\xbd&quot;, $line));<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;;color:#1F497D;mso-fareast-language:EN-US">fclose($src);<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;;color:#1F497D;mso-fareast-language:EN-US">fclose($dst);</span></b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p></o:p></span></p>
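<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">For the <i>identify</i> part of the question, a companion sketch along the following lines could list each 4-byte character before anything is replaced. It is not part of CQPweb or CWB: the function names find_4byte_chars and decode_4byte are hypothetical, and the sketch assumes the input file is well-formed UTF-8. It uses the same byte pattern as the scrub script and decodes each match by hand, so it does not need the mbstring extension.<o:p></o:p></span></p>

```php
<?php
// Hypothetical helper (not part of CQPweb/CWB): list every 4-byte UTF-8
// sequence (code points U+10000 and above, e.g. emoji) in a file,
// together with the line number it occurs on.
// Assumption: the input file is well-formed UTF-8.
function find_4byte_chars(string $path): array
{
    $src = fopen($path, 'r');
    if ($src === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $hits = [];
    $lineno = 0;
    while (false !== ($line = fgets($src))) {
        $lineno++;
        // Same byte pattern as the scrub script: lead byte F0-F4
        // followed by three continuation bytes 80-BF.
        if (preg_match_all('/[\xf0-\xf4][\x80-\xbf]{3}/', $line, $m)) {
            foreach ($m[0] as $seq) {
                $hits[] = [$lineno, sprintf('U+%X', decode_4byte($seq))];
            }
        }
    }
    fclose($src);
    return $hits;
}

// Hypothetical helper: decode one 4-byte UTF-8 sequence to its code point
// (3 payload bits from the lead byte, 6 from each continuation byte).
function decode_4byte(string $seq): int
{
    return ((ord($seq[0]) & 0x07) << 18)
         | ((ord($seq[1]) & 0x3F) << 12)
         | ((ord($seq[2]) & 0x3F) << 6)
         |  (ord($seq[3]) & 0x3F);
}

// Command-line entry point: input file as first argument.
if (PHP_SAPI === 'cli' && !empty($argv[1])) {
    foreach (find_4byte_chars($argv[1]) as [$lineno, $cp]) {
        echo "line $lineno: $cp\n";
    }
}
```

<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Called with the input file as its first argument, like the scrub script, it prints one line per hit in the form line N: U&#43;XXXXX. Code points from U&#43;10000 upward (emoji among them) need 4 bytes in UTF-8, which is more than MySQL's 3-byte utf8 character set can store, so these are the likely culprits for error #1300.<o:p></o:p></span></p>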
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:windowtext">From:</span></b><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:windowtext"> cwb-bounces@sslmit.unibo.it
 &lt;cwb-bounces@sslmit.unibo.it&gt; <b>On Behalf Of </b>Gerhard Rampl<br>
<b>Sent:</b> 17 December 2018 11:14<br>
<b>To:</b> cwb@sslmit.unibo.it<br>
<b>Subject:</b> [CWB] Follow up to CWB Digest, Vol 139, Issue 14: Error #1300 generating word frequency lists<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p>Hi Andrew and everybody, <br>
this is a follow-up question to CWB Digest, Vol 139, Issue 14. I am running into the same Error #1300 when trying to build the frequency list of a rather large corpus of tweets in CQPweb (corpus previously indexed with CWB; using CQPweb v3.2.31). The problem
 also seems to be characters that don't fit MySQL's UTF-8 encoding, which seems to cover only a subset of full UTF-8.<br>
Since I am not a programmer, I'd rather not try the solution proposed in that CWB Digest (it seems rather delicate, and Andrew wrote that he would fix the problem anyway in one of the next releases). So my question is: in the meantime, is there a way to identify
 (and replace) the characters responsible for error #1300 in the .vrt files?<br>
Thanks for any help, <br>
gerhard<o:p></o:p></p>
<div>
<p class="MsoNormal">-- <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><b>University of Innsbruck</b><o:p></o:p></p>
<div>
<p><span style="font-size:9.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#4B4B4B">Institute for Languages and Literatures: Linguistics
<o:p></o:p></span></p>
<p><b><span style="font-size:9.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#4B4B4B">Dr. Gerhard Rampl</span></b><span style="font-size:9.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#4B4B4B"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#4B4B4B"><o:p>&nbsp;</o:p></span></p>
</div>
</div>
</div>
</body>
</html>