<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial">
<p class="MsoNormal"><span lang="EN-US">Hi all,<br>
I want to get CQPweb to process Chinese (my native language) on my Ubuntu 8.04
machine, so I updated to CWB 3.4.3. Compilation
was successful, and I could query a tiny Chinese text via cqp
from the terminal.<br>
<br>
I could also load the Chinese text into CQPweb and complete part of the
metadata page. But when I went to "Manage metadata->Create frequency
tables", CQPweb reported that it had encountered an error and
could not continue. Here is the error message:<br>
cwb-scan-corpus error! Segmentation fault<br>
... in file /usr/local/apache2/htdocs/cqp/lib/freqtable.inc.php line 100. <br>
<br>
This seems strange to me: I have browsed the entire mailing list archive
and learned that this error message is most likely to occur when a token is
too long. But my toy corpus is just a few lines long. I also tried a small
English text, and the same error occurs.<br>
<br>
To make the picture clearer, here is a list of
what I have done.<br>
<br>
My build environment for CWB 3.4.3: CWB from svn: 3.4.3; PCRE: 7.4; glib-2.0;
gcc: Ubuntu 4.2.4-1ubuntu3. <br>
<br>
Compilation seemed normal, and I could build a tiny Chinese corpus
from the text given at the end of this post (hopefully it will make it
through the wild net to your computer and remain intelligible).<br>
<br>
I then ran the following to index it:</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-encode -c utf8 -d /home/ray/cqputf8 -f
cqpweb_chinese_test_utf8.txt -R /usr/local/share/cwb/registry/test -P pos -S
text -S s -S text_id</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Annotations of s-attribute &lt;text&gt; not stored (file
cqpweb_chinese_test_utf8.txt, line #1, warning issued only once).</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-makeall -V TEST (everything says OK)</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-huffcode -A TEST (fine, nothing wrong)</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-compress-rdx -A TEST (fine again)</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">I queried the new corpus and nothing was broken:</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cqp -eC</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">[no corpus]> TEST</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">TEST> "了";</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">7: 们 的 行为 也 引来 &lt;了&gt; 不少 公园 游客 的</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">29: , 他们 早已 习惯 &lt;了&gt; 。</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">TEST> &lt;s&gt; []* "了" []* &lt;/s&gt;; (query
is OK) </span></p>
<p class="MsoNormal"><span lang="EN-US"><br>
Finally, I resorted to running cwb-scan-corpus manually and did
find something unusual:</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-scan-corpus -C TEST pos (fully OK)</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-scan-corpus TEST pos+0 pos+1 (segmentation
fault)</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">ray@ray-laptop:~$ cwb-scan-corpus TEST pos+0 pos+1 pos+2</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Scanning corpus TEST for 3-tuples ... </span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Scan complete.<span style="mso-spacerun:yes"> </span></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Printing frequency table on stdout ... </span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="mso-ansi-language:PL" lang="PL">...</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">段错误
("segmentation fault" in English)</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">I have very little knowledge of C, so I cannot
investigate any further. </span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Does anyone know where the problem is? Thanks for any input.</span></p>
<p class="MsoNormal" style=""><span lang="EN-US">Best, <br>
Ray</span></p><p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Hunan University of Commerce, China<br></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">PS: My computer parameters:</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">System: Ubuntu 8.04<br>
Apache: 2.0.63<br>
MySQL: 5.0.88<br>
PHP: 5.2.12 (lower than the expected 5.3.0)<br>
Perl: 5.8.8<br>
CWB: 3.4.3 (compiled from svn source)<br>
Linux utilities: awk, tar, gzip, iconv</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">LANG=zh_CN.UTF-8</span></p>
<div style="mso-element:para-border-div;border:none;border-bottom:double windowtext 2.25pt;
padding:0cm 0cm 1.0pt 0cm">
<p class="MsoNormal" style=""><span lang="EN-US">GDM_LANG=zh_CN.UTF-8</span></p><p class="MsoNormal" style=""><span lang="EN-US">Inside cqpweb_chinese_test_utf8.txt: <br></span></p><p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">&lt;text id="test"&gt;<br>
&lt;s&gt;<br>
这些 r<br>
网友 n<br>
们 k<br>
的 u<br>
行为 n<br>
也 d<br>
引来 v<br>
了 u<br>
不少 m<br>
公园 n<br>
游客 n<br>
的 u<br>
围观 v<br>
。 w<br>
&lt;/s&gt;<br>
&lt;s&gt;<br>
而 c<br>
对于 p<br>
人们 n<br>
的 u<br>
议论 v<br>
, w<br>
这些 r<br>
汉 t<br>
服 v<br>
爱好者 n<br>
表示 v<br>
, w<br>
他们 r<br>
早已 d<br>
习惯 v<br>
了 y<br>
。 w<br>
&lt;/s&gt;<br>
&lt;/text&gt;</span></p></div>
</div>