[CWB] CQP bug report?

Stefan Evert stefan.evert at uos.de
Thu Feb 26 15:35:49 CET 2009


Hi Eros and everybody,

that bug report doesn't sound too good. The one error message you got
sounds very much like a memory management bug (probably memory getting
free()d too early, or free()d twice) or possibly a buffer overflow (since
CQP uses fixed-size internal buffers in many places).

Is there any way for me to get access to the corpus (or, even better,  
the server on which it is installed), so I can test it with a debug  
build of CQP?  I still have a slight hope I can pinpoint the problem  
and fix it; otherwise we may have to use valgrind (which I haven't  
ever used before ...) to check for possible memory allocation  
problems. Dang!

It's quite likely that the error happens while scanning the lexicon  
file, i.e. before the query is actually executed on the corpus, and  
that it has to do with the implementation of %c/%d flags (which store  
the normalised string in a fixed-size buffer).  Could you quickly
check the following two things on ITWAC or the problematic subset
ITWAC_20, please?

	cwb-lexdecode -p cane -c -d ITWAC_20

and

	cwb-lexdecode ITWAC_20 | perl -nle '$l = length($_); $max = $l if $l > $max; END{print $max}'

to check whether the problem may be due to an oversized string that  
isn't caught and trimmed by cwb-encode (or perhaps different hard  
limits in cwb-encode and CQP).
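
If the one-liner reports a suspiciously large number, it may help to see which entries are responsible. Here is a rough sketch of the same length check in Python; it assumes the lexicon arrives one entry per line on standard input (as in the pipe above) and measures length in bytes. The names `longest`, `oversized` and `ASSUMED_LIMIT` are made up for illustration, and the limit value is a placeholder, not the actual buffer size used by CQP or cwb-encode.

```python
from typing import Iterable, List, Tuple

# Placeholder value for illustration only; the real buffer size in CQP or
# cwb-encode is not stated here and has to be looked up in the source.
ASSUMED_LIMIT = 1024


def entry_lengths(lines: Iterable[bytes]) -> List[Tuple[int, bytes]]:
    """Return (byte length, entry) pairs with the line terminator removed."""
    return [(len(e), e) for e in (line.rstrip(b"\r\n") for line in lines)]


def longest(lines: Iterable[bytes]) -> Tuple[int, bytes]:
    """Return the longest entry and its byte length, or (0, b"") if empty."""
    pairs = entry_lengths(lines)
    return max(pairs, key=lambda p: p[0]) if pairs else (0, b"")


def oversized(lines: Iterable[bytes], limit: int = ASSUMED_LIMIT) -> List[bytes]:
    """Return entries that would not fit into a buffer of `limit` bytes.

    An entry of exactly `limit` bytes is also reported, on the assumption
    that a C string buffer needs one extra byte for the terminating NUL.
    """
    return [e for n, e in entry_lengths(lines) if n >= limit]


def report(stream) -> None:
    """Print the maximum entry length and any entries at or over the limit."""
    data = stream.readlines()
    n, entry = longest(data)
    print(f"longest entry: {n} bytes")
    for e in oversized(data):
        print(f"oversized ({len(e)} bytes): {e[:60]!r}...")
```

To use it, add `import sys` and a call to `report(sys.stdin.buffer)` at the bottom of the file and pipe the output of cwb-lexdecode into the script. Reading bytes rather than decoded text keeps the count comparable to what a fixed-size C buffer would have to hold.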

Thanks & best wishes,
Stefan

On 26 Feb 2009, at 12:16, Eros Zanchetta wrote:

> Hi everybody,
>
> I have a strange problem with cqp: whenever I try a %c or %d query on
> ITWAC (the full corpus) I get a segmentation fault.
>
> The behaviour is quite consistent, I tried the following:
>
> - I issued the query [word="cane" %c] on ITWAC, cqp immediately
> terminated with a "Segmentation fault" message
>
> - I issued the very same query [word="cane" %c] on UKWAC and DEWAC, it
> worked
>
> - I repeated the test on a different server (where the corpus had been
> re-indexed, it hadn't just been copied) and got the same results, the
> query completed successfully on UKWAC and DEWAC and resulted in a
> segmentation fault on ITWAC
>
> - I repeated the above steps using the %d flag (i.e. [word="cane" %d]),
> and I got the same results
>
> - I repeated everything using both flags (i.e. [word="cane" %cd]),
> same results
>
> As a last attempt I tried the query on the split version of ITWAC, the
> error manifested itself on ITWAC_20, I repeated the test on two more
> servers (with 32 bit versions of CQP) with the same results,
> segmentation fault on ITWAC_20. I suppose there's something in there
> CQP doesn't like, I just can't figure out what it is.
>
> On only one occasion I got a backtrace (I attached it) of the error
> instead of the laconic "segmentation fault" message that normally
> appears.
>
> I used CQP version 2.2.b97 (on Ubuntu 8.04 32 bit, kernel 2.6.24-23 and
> Fedora Core kernel 2.6.22.14) and 2.2.b99 (Ubuntu 8.04 64 bit, kernel
> 2.6.24-23).
>
> Did anyone experience anything like this or know how to fix this?
>
> Regards,
> Eros
> *** glibc detected *** cqp: malloc(): memory corruption: 0x0000000000ad1920 ***
> ======= Backtrace: =========
> /lib/libc.so.6[0x7f9d57ccfa14]
> /lib/libc.so.6(__libc_malloc+0x90)[0x7f9d57cd1360]
> /lib/libc.so.6(vasprintf+0x3e)[0x7f9d57cc666e]
> /lib/libc.so.6(asprintf+0x88)[0x7f9d57ca9098]
> /lib/libc.so.6(__assert_fail+0xb8)[0x7f9d57c862a8]
> cqp[0x43b709]
> cqp[0x40c129]
> cqp[0x40cd58]
> cqp[0x40cdde]
> cqp[0x40d066]
> cqp[0x40de20]
> cqp[0x424677]
> cqp[0x4305c5]
> cqp[0x407e45]
> cqp[0x4073d0]
> cqp[0x4075d8]
> /lib/libc.so.6(__libc_start_main+0xf4)[0x7f9d57c791c4]
> cqp[0x407169]
> ======= Memory map: ========
> 00400000-00462000 r-xp 00000000 09:00 89030766                           /usr/local/bin/cqp
> 00661000-00665000 rw-p 00061000 09:00 89030766                           /usr/local/bin/cqp
> 00665000-00c16000 rw-p 00665000 00:00 0                                  [heap]
> 7f95b0000000-7f95b0021000 rw-p 7f95b0000000 00:00 0
> 7f95b0021000-7f95b4000000 ---p 7f95b0021000 00:00 0
> 7f95b5126000-7f95b5133000 r-xp 00000000 09:00 61145109                   /lib/libgcc_s.so.1
> 7f95b5133000-7f95b5333000 ---p 0000d000 09:00 61145109                   /lib/libgcc_s.so.1
> 7f95b5333000-7f95b5334000 rw-p 0000d000 09:00 61145109                   /lib/libgcc_s.so.1
> 7f95b5341000-7f95b8185000 r--p 00000000 09:00 32771190                   /corpora/ukwac/word.corpus.rdx
> 7f95b8185000-7f97b37d4000 r--p 00000000 09:00 32771191                   /corpora/ukwac/word.corpus.rev
> 7f97b37d4000-7f97b6618000 r--p 00000000 09:00 32771189                   /corpora/ukwac/word.corpus.cnt
> 7f97b6618000-7f97c0086000 r--p 00000000 09:00 32771174                   /corpora/ukwac/word.lexicon
> 7f97c0086000-7f97c2eca000 r--p 00000000 09:00 32771175                   /corpora/ukwac/word.lexicon.idx
> 7f97c2eca000-7f99be519000 r--p 00000000 09:00 32771173                   /corpora/ukwac/word.corpus
> 7f99be519000-7f99be583000 rw-p 7f99be519000 00:00 0
> 7f99be5b8000-7f99bfdac000 r--p 00000000 09:00 17580050                   /corpora/itwac3/word.corpus.rdx
> 7f99bfdac000-7f9b87315000 r--p 00000000 09:00 17580051                   /corpora/itwac3/word.corpus.rev
> 7f9b8734a000-7f9b88b3e000 r--p 00000000 09:00 17580049                   /corpora/itwac3/word.corpus.cnt
> 7f9b88b3e000-7f9b8d506000 r--p 00000000 09:00 17580035                   /corpora/itwac3/word.lexicon
> 7f9b8d506000-7f9b8ecfa000 r--p 00000000 09:00 17580048                   /corpora/itwac3/word.lexicon.srt
> 7f9b8ecfa000-7f9b904ee000 r--p 00000000 09:00 17580036                   /corpora/itwac3/word.lexicon.idx
> 7f9b904ee000-7f9d57a57000 r--p 00000000 09:00 17580034                   /corpora/itwac3/word.corpus
> 7f9d57a57000-7f9d57a59000 r-xp 00000000 09:00 61145405                   /lib/libdl-2.7.so
> 7f9d57a59000-7f9d57c59000 ---p 00002000 09:00 61145405                   /lib/libdl-2.7.so
> 7f9d57c59000-7f9d57c5b000 rw-p 00002000 09:00 61145405                   /lib/libdl-2.7.so
> 7f9d57c5b000-7f9d57db3000 r-xp 00000000 09:00 61145402                   /lib/libc-2.7.so
> 7f9d57db3000-7f9d57fb3000 ---p 00158000 09:00 61145402                   /lib/libc-2.7.so
> 7f9d57fb3000-7f9d57fb6000 r--p 00158000 09:00 61145402                   /lib/libc-2.7.so
> 7f9d57fb6000-7f9d57fb8000 rw-p 0015b000 09:00 61145402                   /lib/libc-2.7.so
> 7f9d57fb8000-7f9d57fbd000 rw-p 7f9d57fb8000 00:00 0
> 7f9d57fbd000-7f9d57ff4000 r-xp 00000000 09:00 61145374                   /lib/libncurses.so.5.6
> 7f9d57ff4000-7f9d581f3000 ---p 00037000 09:00 61145374                   /lib/libncurses.so.5.6
> 7f9d581f3000-7f9d581f8000 rw-p 00036000 09:00 61145374                   /lib/libncurses.so.5.6
> 7f9d581f8000-7f9d58278000 r-xp 00000000 09:00 61145406                   /lib/libm-2.7.so
> 7f9d58278000-7f9d58477000 ---p 00080000 09:00 61145406                   /lib/libm-2.7.so
> 7f9d58477000-7f9d58479000 rw-p 0007f000 09:00 61145406                   /lib/libm-2.7.so
> 7f9d58479000-7f9d58496000 r-xp 00000000 09:00 61145399                   /lib/ld-2.7.so
> 7f9d58683000-7f9d58686000 rw-p 7f9d58683000 00:00 0
> 7f9d58693000-7f9d58696000 rw-p 7f9d58693000 00:00 0
> 7f9d58696000-7f9d58698000 rw-p 0001d000 09:00 61145399                   /lib/ld-2.7.so
> 7fff60682000-7fff60697000 rw-p 7ffffffea000 00:00 0                      [stack]
> 7fff607fe000-7fff60800000 r-xp 7fff607fe000 00:00 0                      [vdso]
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
> Aborted
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb