<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial">

<p class="MsoNormal"><span lang="EN-US">Hi all,<br>
I want to make CQPweb process Chinese (my native language) on my Ubuntu 8.04 machine,
so I updated to CWB 3.4.3. Compilation succeeded, and I could query a tiny
Chinese text via cqp from the terminal.<br>
<br>
I could also load the Chinese text into CQPweb and complete part of the
metadata page. But when I chose "Manage metadata-&gt;Create frequency
tables", CQPweb reported that it had encountered an error and
could not continue. Here is the error message:<br>
cwb-scan-corpus error! Segmentation fault<br>
... in file /usr/local/apache2/htdocs/cqp/lib/freqtable.inc.php line 100. <br>
<br>
This seems strange to me: I have browsed the entire mailing list archive
and learned that this error is most likely to occur when a token is
too long. But my toy corpus is just a few lines long. I also tried a small
English text and the same error occurs.<br>
<br>
To make the picture clearer, here is a list of what I have done.<br>
<br>
My build environment: CWB 3.4.3 (from svn); PCRE 7.4; glib-2.0;
gcc 4.2.4-1ubuntu3 (Ubuntu).<br>
<br>
Compilation seemed normal, and I could build a tiny Chinese corpus
from the text given at the end of this post (hopefully it survives the trip
across the net and is still intelligible when it reaches you).<br>
<br>
I then ran the following to index it:</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-encode -c utf8 -d /home/ray/cqputf8 -f
cqpweb_chinese_test_utf8.txt -R /usr/local/share/cwb/registry/test -P pos -S
text -S s -S text_id</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Annotations of s-attribute &lt;text&gt; not stored (file
cqpweb_chinese_test_utf8.txt, line #1, warning issued only once).</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-makeall -V TEST (everything reports OK)</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-huffcode -A TEST (fine, nothing wrong)</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-compress-rdx<span style="mso-spacerun:yes">&nbsp;
</span>-A TEST (fine again)</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">I queried the new corpus and nothing was broken:</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cqp -eC</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">[no corpus]&gt; TEST</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="mso-ansi-language:DE" lang="DE">TEST&gt; "</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;mso-hansi-font-family:
&quot;Times New Roman&quot;">了</span><span style="mso-ansi-language:DE" lang="DE">";</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="mso-ansi-language:DE" lang="DE"><span style="mso-spacerun:yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span>7: </span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">们</span><span style="mso-ansi-language:
DE"> </span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">的</span><span style="mso-ansi-language:
DE"> </span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">行为</span><span style="mso-ansi-language:
DE"> </span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">也</span><span style="mso-ansi-language:
DE"> </span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">引来</span><span style="mso-ansi-language:DE" lang="DE"> &lt;</span><span style="font-family:宋体;
mso-ascii-font-family:&quot;Times New Roman&quot;;mso-hansi-font-family:&quot;Times New Roman&quot;">了</span><span style="mso-ansi-language:DE" lang="DE">&gt; </span><span style="font-family:宋体;
mso-ascii-font-family:&quot;Times New Roman&quot;;mso-hansi-font-family:&quot;Times New Roman&quot;">不少</span><span style="mso-ansi-language:DE"> </span><span style="font-family:宋体;mso-ascii-font-family:
&quot;Times New Roman&quot;;mso-hansi-font-family:&quot;Times New Roman&quot;">公园</span><span style="mso-ansi-language:DE"> </span><span style="font-family:宋体;mso-ascii-font-family:
&quot;Times New Roman&quot;;mso-hansi-font-family:&quot;Times New Roman&quot;">游客</span><span style="mso-ansi-language:DE"> </span><span style="font-family:宋体;mso-ascii-font-family:
&quot;Times New Roman&quot;;mso-hansi-font-family:&quot;Times New Roman&quot;">的</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="mso-ansi-language:DE" lang="DE"><span style="mso-spacerun:yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span>29:<span style="mso-spacerun:yes">&nbsp; </span></span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;mso-hansi-font-family:
&quot;Times New Roman&quot;;mso-ansi-language:DE">,</span><span style="mso-ansi-language:
DE"> </span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">他们</span><span style="mso-ansi-language:
DE"> </span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">早已</span><span style="mso-ansi-language:
DE"> </span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">习惯</span><span style="mso-ansi-language:DE" lang="DE"> &lt;</span><span style="font-family:宋体;
mso-ascii-font-family:&quot;Times New Roman&quot;;mso-hansi-font-family:&quot;Times New Roman&quot;">了</span><span style="mso-ansi-language:DE" lang="DE">&gt; </span><span style="font-family:宋体;
mso-ascii-font-family:&quot;Times New Roman&quot;;mso-hansi-font-family:&quot;Times New Roman&quot;">。</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">TEST&gt; &lt;s&gt; []* "</span><span style="font-family:宋体;
mso-ascii-font-family:&quot;Times New Roman&quot;;mso-hansi-font-family:&quot;Times New Roman&quot;">了</span><span lang="EN-US">" []* &lt;/s&gt;;<span style="mso-spacerun:yes">&nbsp; </span>(query
is OK) </span></p>

<p class="MsoNormal"><span lang="EN-US"><br>
Finally, I resorted to running cwb-scan-corpus manually and did
find something unusual:</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-scan-corpus -C TEST pos (fully OK)</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-scan-corpus TEST pos+0 pos+1 (segmentation
fault)</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-scan-corpus TEST pos+0 pos+1 pos+2</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Scanning corpus TEST for 3-tuples ... </span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Scan complete.<span style="mso-spacerun:yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Printing frequency table on stdout ... </span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="mso-ansi-language:PL" lang="PL">...</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;mso-hansi-font-family:
&quot;Times New Roman&quot;">段错误</span><span style="mso-ansi-language:PL" lang="PL">
("segmentation fault" in English)</span></p>
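<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">To make the three runs above easier to repeat, something like the following Python wrapper could report whether each cwb-scan-corpus call exits normally or is killed by a signal. This is only a sketch: the helper names describe_exit and run_scan are made up for illustration, and it assumes Python 3 with cwb-scan-corpus on the PATH and the corpus registered as TEST.</span></p>

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means "killed by signal -returncode".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exit status %d" % returncode


def run_scan(args):
    """Run cwb-scan-corpus with the given arguments and describe how it ended.

    Hypothetical helper: output is discarded, only the exit state is reported.
    """
    try:
        proc = subprocess.run(
            ["cwb-scan-corpus"] + list(args),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return "cwb-scan-corpus not found"
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # The same three invocations as in the transcript above.
    for args in (
        ["-C", "TEST", "pos"],
        ["TEST", "pos+0", "pos+1"],
        ["TEST", "pos+0", "pos+1", "pos+2"],
    ):
        print(" ".join(args), "->", run_scan(args))
```

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">A run that crashes with a segmentation fault would show up as "killed by SIGSEGV", which makes it easy to compare attribute combinations side by side.</span></p>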

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">I know very little C, so I cannot investigate
any further. </span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Does anyone know where the problem is? Thanks for any input.</span></p>

<p class="MsoNormal" style=""><span lang="EN-US">Best, <br>
Ray</span></p><p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Hunan University of Commerce, China<br></span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">PS: My computer parameters:</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">System: Ubuntu 8.04<br>
Apache: 2.0.63<br>
MySQL: 5.0.88<br>
PHP: 5.2.12 (lower than the expected 5.3.0)<br>
Perl: 5.8.8<br>
CWB: 3.4.3 (compiled from svn source)<br>
Linux utilities: awk, tar, gzip, iconv</span></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">LANG=zh_CN.UTF-8</span></p>

<div style="mso-element:para-border-div;border:none;border-bottom:double windowtext 2.25pt;
padding:0cm 0cm 1.0pt 0cm">

<p class="MsoNormal" style=""><span lang="EN-US">GDM_LANG=zh_CN.UTF-8</span></p><p class="MsoNormal" style=""><span lang="EN-US">Inside cqpweb_chinese_test_utf8.txt: <br></span></p><p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">&lt;text id="test"&gt;<br>
&lt;s&gt;<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">这些</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
r<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">网友</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
n<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">们</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
k<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">的</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
u<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">行为</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
n<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">也</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
d<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">引来</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
v<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">了</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
u<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">不少</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
m<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">公园</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
n<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">游客</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
n<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">的</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
u<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">围观</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
v<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">。</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
w<br>
&lt;/s&gt;<br>
&lt;s&gt;<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">而</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
c<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">对于</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
p<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">人们</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
n<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">的</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
u<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">议论</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
v<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">,</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
w<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">这些</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
r<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">汉</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
t<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">服</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
v<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">爱好者</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
n<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">表示</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
v<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">,</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
w<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">他们</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
r<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">早已</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
d<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">习惯</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
v<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">了</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
y<br>
</span><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">。</span><span lang="EN-US">&nbsp;&nbsp;&nbsp;
w<br>
&lt;/s&gt;<br>
&lt;/text&gt;</span><span style="mso-bidi-font-size:24.0pt" lang="EN-US"></span></p></div>
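<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Since the archived posts point to over-long tokens as the usual cause of this error, a quick check of the input file may help rule that out. The sketch below (Python 3; the function name longest_token is made up for illustration) reports the longest token in bytes in a tab-separated file like the one above. It assumes the token is the first column and, as a simplification, treats every line starting with "&lt;" as an XML tag.</span></p>

```python
import sys


def longest_token(lines):
    """Return (byte_length, token, line_number) for the longest token.

    Assumes one token per line with tab-separated columns, token first.
    Blank lines and lines starting with '<' (assumed to be XML tags) are skipped.
    """
    best = (0, "", 0)
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("<"):
            continue
        token = line.split("\t")[0]
        size = len(token.encode("utf-8"))  # length in bytes, not characters
        if size > best[0]:
            best = (size, token, lineno)
    return best


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 longest_token.py cqpweb_chinese_test_utf8.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        size, token, lineno = longest_token(handle)
    print("longest token: %r (%d bytes) on line %d" % (token, size, lineno))
```

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">For the toy corpus above the longest token is only a few bytes, so token length looks unlikely to be the cause here.</span></p>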

</div>