[CWB] CQPweb 3.0.7 on CWB 3.4.3 cwb-scan-corpus error! Segmentation fault
Ray Wu
liangpingwu at 126.com
Sat May 26 14:49:04 CEST 2012
Hi all,
I want to get CQPweb to process Chinese (my native language) on my Ubuntu 8.04 machine, so I updated to CWB 3.4.3. Compilation was successful, and I could query a tiny Chinese text via cqp from the terminal.
I could also load the Chinese text into CQPweb and complete part of the metadata page. But when I chose "Manage metadata->Create frequency tables", CQPweb reported that it had encountered an error and could not continue. Here is the error message:
cwb-scan-corpus error! Segmentation fault
... in file /usr/local/apache2/htdocs/cqp/lib/freqtable.inc.php line 100.
This seems strange to me: I have browsed the entire mailing list archive and learned that this error message is most likely to occur when a token is too long. But my toy corpus is just a few lines long. I also tried a small English text and the same error occurs.
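To rule out the over-long-token explanation directly, the longest token in the input file can be measured. The following is only a minimal sketch in Python 3: the function name `longest_token` is hypothetical, it assumes the file is in the one-token-per-line format shown at the end of this post, and it treats every line starting with `<` as XML structure (so a token that itself begins with `<` would be skipped).

```python
def longest_token(lines):
    """Return (byte_length, token) for the longest token in vertical-format lines."""
    best = (0, "")
    for line in lines:
        line = line.strip()
        # Skip blank lines and XML structure lines such as <s> or </text>.
        if not line or line.startswith("<"):
            continue
        token = line.split()[0]  # first column is the word form
        size = len(token.encode("utf-8"))  # length in bytes, not characters
        if size > best[0]:
            best = (size, token)
    return best
```

It could be called as `longest_token(open("cqpweb_chinese_test_utf8.txt", encoding="utf-8"))`. For a toy corpus like this one the result should be a handful of bytes, which would make token length an unlikely cause.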
To make the picture clearer, I will list what I have done.
My build environment for CWB 3.4.3: CWB from svn: 3.4.3; PCRE: 7.4; glib-2.0; gcc: Ubuntu 4.2.4-1ubuntu3.
Compilation seemed normal, and I could build a tiny Chinese corpus from the text given at the end of this post (hopefully it survives the trip across the net and remains intelligible).
I then ran the following to index it:
ray at ray-laptop:~$ cwb-encode -c utf8 -d /home/ray/cqputf8 -f cqpweb_chinese_test_utf8.txt -R /usr/local/share/cwb/registry/test -P pos -S text -S s -S text_id
Annotations of s-attribute <text> not stored (file cqpweb_chinese_test_utf8.txt, line #1, warning issued only once).
ray at ray-laptop:~$ cwb-makeall -V TEST (everything says OK)
ray at ray-laptop:~$ cwb-huffcode -A TEST (fine, nothing wrong)
ray at ray-laptop:~$ cwb-compress-rdx -A TEST (fine again)
I queried the new corpus and nothing was broken:
ray at ray-laptop:~$ cqp -eC
[no corpus]> TEST
TEST> "了";
7: 们的行为也引来 <了> 不少公园游客的
29: ,他们早已习惯 <了> 。
TEST> <s> []* "了" []* </s>; (query is OK)
Finally, I resorted to running cwb-scan-corpus manually and did find something unusual:
ray at ray-laptop:~$ cwb-scan-corpus -C TEST pos (fully OK)
ray at ray-laptop:~$ cwb-scan-corpus TEST pos+0 pos+1 (segmentation fault)
ray at ray-laptop:~$ cwb-scan-corpus TEST pos+0 pos+1 pos+2
Scanning corpus TEST for 3-tuples ...
Scan complete.
Printing frequency table on stdout ...
...
段错误 ("segmentation fault" in English)
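To tell a genuine crash apart from an ordinary non-zero exit when repeating these runs, a small wrapper can report the signal that ended the process. This is a sketch only, in Python 3 on a POSIX system: `describe_exit` and `run_scan` are hypothetical helper names, and the only cwb-scan-corpus arguments assumed are the ones already used above.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-returncode} ({name})"
    return f"exited with status {returncode}"


def run_scan(args):
    """Run cwb-scan-corpus with the given arguments and describe how it ended."""
    result = subprocess.run(
        ["cwb-scan-corpus", *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return describe_exit(result.returncode), result.stdout
```

For example, `run_scan(["TEST", "pos+0", "pos+1"])` would return the exit description together with whatever output was produced before the process ended, which shows how far the frequency table got.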
I have very little knowledge of C, so I cannot investigate further.
Does anyone know where the problem is? Thanks for any input.
Best,
Ray
Hunan University of Commerce, China
PS: My computer parameters:
System: Ubuntu 8.04
Apache: 2.0.63
MySQL: 5.0.88
PHP: 5.2.12 (lower than expected 5.3.0)
Perl: 5.8.8
CWB: 3.4.3 (compiled from svn source)
Linux utilities: awk, tar, gzip, iconv
LANG=zh_CN.UTF-8
GDM_LANG=zh_CN.UTF-8
Inside cqpweb_chinese_test_utf8.txt:
<text id="test">
<s>
这些 r
网友 n
们 k
的 u
行为 n
也 d
引来 v
了 u
不少 m
公园 n
游客 n
的 u
围观 v
。 w
</s>
<s>
而 c
对于 p
人们 n
的 u
议论 v
, w
这些 r
汉 t
服 v
爱好者 n
表示 v
, w
他们 r
早已 d
习惯 v
了 y
。 w
</s>
</text>
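As a cross-check independent of CWB, the tag pairs that a `pos+0 pos+1` scan is asked to count can be tallied straight from the file above. This is a rough sketch in Python 3, not a reimplementation of cwb-scan-corpus: `pos_bigrams` is a hypothetical name, the second column is assumed to be the pos attribute, pairs are counted across sentence boundaries because XML lines are not tokens, and whatever the real tool does at the last corpus position is not modelled.

```python
from collections import Counter


def pos_bigrams(lines):
    """Count adjacent POS-tag pairs over all tokens, ignoring XML lines."""
    tags = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and XML structure lines such as <s> or </text>.
        if not line or line.startswith("<"):
            continue
        columns = line.split()
        tags.append(columns[1])  # second column is the POS tag
    # Pair every tag with its right-hand neighbour.
    return Counter(zip(tags, tags[1:]))
```

Calling `pos_bigrams(open("cqpweb_chinese_test_utf8.txt", encoding="utf-8"))` gives a table that any partial output from the failing command could be compared against.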