[CWB] CQPweb 3.0.7 on CWB 3.4.3 cwb-scan-corpus error! Segmentation fault

Ray Wu liangpingwu at 126.com
Sat May 26 14:49:04 CEST 2012


Hi all,
I want to get CQPweb to process Chinese (my native language) on my Ubuntu 8.04 machine, so I updated to CWB 3.4.3. Compilation succeeded, and I could query a tiny Chinese text with cqp from the terminal.

I could also load the Chinese text into CQPweb and complete part of the metadata setup. But when I chose "Manage metadata -> Create frequency tables", CQPweb reported that it had encountered an error and could not continue. Here is the error message:
cwb-scan-corpus error! Segmentation fault
... in file /usr/local/apache2/htdocs/cqp/lib/freqtable.inc.php line 100.

This seems strange to me. I have browsed the whole mailing-list archive and learned that this error most often occurs when a token is too long, but my toy corpus is only a few lines long. I tried a small English text as well, and the same thing happened.

To make the picture clearer, here is a list of what I have done.

My build environment for CWB 3.4.3: CWB 3.4.3 (from svn); PCRE 7.4; glib-2.0; gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu3).

Compilation appeared normal, and I built a tiny Chinese corpus from the text listed at the end of this post (hopefully it makes it across the net and is still readable on your computer).

I then ran the following to index it:

ray@ray-laptop:~$ cwb-encode -c utf8 -d /home/ray/cqputf8 -f cqpweb_chinese_test_utf8.txt -R /usr/local/share/cwb/registry/test -P pos -S text -S s -S text_id

Annotations of s-attribute <text> not stored (file cqpweb_chinese_test_utf8.txt, line #1, warning issued only once).

ray@ray-laptop:~$ cwb-makeall -V TEST (everything reports OK)

ray@ray-laptop:~$ cwb-huffcode -A TEST (fine, nothing wrong)

ray@ray-laptop:~$ cwb-compress-rdx -A TEST (fine again)

I queried the new corpus and nothing was broken:

ray@ray-laptop:~$ cqp -eC

[no corpus]> TEST

TEST> "了";

        7: 们的行为也引来 <了> 不少公园游客的

       29:  ,他们早已习惯 <了> 。

TEST> <s> []* "了" []* </s>;  (this query works too)
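The two hits for "了" are reported at corpus positions 7 and 29, which can be checked against the input file by hand. The following Python 3 sketch does that count. `token_positions` is a hypothetical helper, not part of CWB, and it assumes that tag lines (`<text ...>`, `<s>`, `</s>`) occupy no corpus position and that every other non-blank line is one TAB-separated token, numbered from 0.

```python
def token_positions(lines, form):
    """Return the 0-based corpus positions of tokens whose word column equals form.

    Assumptions (not taken from the CWB sources): lines starting with '<' are
    structural tags that occupy no corpus position, blank lines are skipped,
    and every other line is one token whose first TAB-separated field is the
    word form.
    """
    positions = []
    cpos = 0  # corpus position that the next token line will receive
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("<"):
            continue  # tag or blank line: no corpus position
        if line.split("\t")[0] == form:
            positions.append(cpos)
        cpos += 1
    return positions
```

Usage, e.g.: `with open("cqpweb_chinese_test_utf8.txt", encoding="utf-8") as fh: print(token_positions(fh, "了"))`. On the file as listed at the end of this post (with TABs between the columns), this should give positions 7 and 29, matching the cqp output, so the encoded token stream itself looks consistent with the input.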


Finally, I ran cwb-scan-corpus manually and did find something unusual:

ray@ray-laptop:~$ cwb-scan-corpus -C TEST pos (works fine)

ray@ray-laptop:~$ cwb-scan-corpus TEST pos+0 pos+1 (segmentation fault)

ray@ray-laptop:~$ cwb-scan-corpus TEST pos+0 pos+1 pos+2

Scanning corpus TEST for 3-tuples ...

Scan complete.

Printing frequency table on stdout ...

...

段错误 ("segmentation fault" in English)
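For comparison, the 2-tuple table that `cwb-scan-corpus TEST pos+0 pos+1` is expected to print can be approximated outside CWB. This Python 3 sketch (the helpers `read_pos_column`, `pos_bigrams` and `format_table` are hypothetical) counts adjacent POS pairs at every corpus position that has a successor, including the pair that spans the sentence boundary, and prints the frequency followed by the values, TAB-separated. Both that boundary handling and the output layout are assumptions about the tool's default behaviour, not something taken from its source.

```python
from collections import Counter


def read_pos_column(lines):
    """Collect the second TAB-separated field (pos) of every token line.

    Assumption: lines starting with '<' are structural tags and blank lines
    are skipped; all other lines are tokens of the form word<TAB>pos.
    A token line without a second field contributes an empty pos value.
    """
    tags = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("<"):
            continue
        fields = line.split("\t")
        tags.append(fields[1] if len(fields) > 1 else "")
    return tags


def pos_bigrams(tags):
    """Count (pos+0, pos+1) pairs at every position that has a successor."""
    return Counter(zip(tags, tags[1:]))


def format_table(counts):
    """Render 'frequency<TAB>pos+0<TAB>pos+1' lines, most frequent first."""
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{n}\t{a}\t{b}" for (a, b), n in rows)
```

Usage, e.g.: `with open("cqpweb_chinese_test_utf8.txt", encoding="utf-8") as fh: print(format_table(pos_bigrams(read_pos_column(fh))))`. The sample file has 31 tokens, so this yields 30 pairs in total; if that table comes out without trouble, the input data are at least well-formed enough to tabulate.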

I know very little C, so I cannot investigate any further myself.

Does anyone know where the problem is? Thanks for any input.

Best,
Ray

Hunan University of Commerce, China


PS: My computer parameters:

System: Ubuntu 8.04
Apache: 2.0.63
MySQL: 5.0.88
PHP: 5.2.12 (older than the expected 5.3.0)
Perl: 5.8.8
CWB: 3.4.3 (compiled from svn source)
Linux utilities: awk, tar, gzip, iconv

LANG=zh_CN.UTF-8

GDM_LANG=zh_CN.UTF-8

Inside cqpweb_chinese_test_utf8.txt:


<text id="test">
<s>
这些    r
网友    n
们    k
的    u
行为    n
也    d
引来    v
了    u
不少    m
公园    n
游客    n
的    u
围观    v
。    w
</s>
<s>
而    c
对于    p
人们    n
的    u
议论    v
,    w
这些    r
汉    t
服    v
爱好者    n
表示    v
,    w
他们    r
早已    d
习惯    v
了    y
。    w
</s>
</text>
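cwb-encode expects one token per line, with the columns separated by TAB characters. In the archived copy above the columns appear as runs of spaces, presumably a side effect of the HTML scrubbing (the cqp hits show bare word forms, which suggests the original file used TABs). Since the crash only shows up when pos is scanned, a quick check that every token line really carries a non-empty pos value may help rule out the input file. The function `find_malformed_lines` below is a hypothetical Python 3 helper that assumes the two-column word/pos layout used here.

```python
def find_malformed_lines(lines, expected_columns=2):
    """Return (line_number, text) for token lines with the wrong shape.

    A token line is expected to hold exactly `expected_columns` non-empty
    fields separated by TAB characters (here: word and pos). Lines starting
    with '<' are treated as structural tags and skipped; blank lines are
    reported, because they are not tokens of the expected shape.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("<"):
            continue  # <text ...>, <s>, </s>, </text>
        fields = line.split("\t")
        if len(fields) != expected_columns or "" in fields:
            problems.append((lineno, line))
    return problems
```

Usage, e.g.: `with open("cqpweb_chinese_test_utf8.txt", encoding="utf-8") as fh: print(find_malformed_lines(fh))`. An empty list means every token line has exactly a word and a pos value.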