[CWB] "Segmentation fault (core dumped)" on various versions

Wed Jul 24 21:48:41 CEST 2013

On Wed, Jul 24, 2013 at 2:43 AM, Stefan Evert <stefanML at collocations.de> wrote:

Dear Stefan,

Thanks so much for your help. The following seems to have fixed the problem:

> If you have "cwb-make" from the CWB/Perl modules, you can simply trash the
> ".crc" and ".crx" files (which contain the actual lookup index that appears
> to be damaged) and rebuild them with
>         cwb-make [...] PERS-DIVER-USENET
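
A minimal sketch of the "trash and rebuild" step, in case it helps anyone hitting the same problem. It moves the ".crc" and ".crx" files aside instead of deleting them, so they can be put back if the rebuild goes wrong. The function name and both directory arguments are hypothetical; the corpus data directory is whatever the registry entry points to.

```shell
#!/bin/sh
# Move the compressed index files (*.crc, *.crx) out of a corpus data
# directory rather than deleting them outright.
# Assumption: $1 is the corpus data directory, $2 a backup directory
# chosen by the caller; neither path comes from the original message.
stash_compressed_index() {
    data_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for f in "$data_dir"/*.crc "$data_dir"/*.crx; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$f" ] || continue
        mv -- "$f" "$backup_dir"/ || return 1
        echo "moved: $(basename "$f")"
    done
}

# Example (placeholder paths; cwb-make options elided as in the quoted advice):
#   stash_compressed_index /path/to/corpus/data /path/to/backup
#   cwb-make [...] PERS-DIVER-USENET
```

Only the two index file types are touched; lexicon, token stream and all other files stay where they are.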

More testing will be needed to be sure, of course.

Best wishes,
Scott

> On 24 Jul 2013, at 04:30, Scott Sadowsky <ssadowsky at gmail.com> wrote:
>
> > Something very strange is going on. I've replaced my index for this
> > corpus with a third backup copy, and the following happened:
> >
> > PERS-DIVER-USENET> "jai"
> > 0 matches.
> > PERS-DIVER-USENET> ".+ai"
> > Segmentation fault (core dumped)
> >
> > Here the search for "jai", which previously caused a segfault, worked.
> > So all seemed good. But the search returned 0 hits, instead of the 1 which
> > is returned by the command cwb-lexdecode -f -p '.ai' PERS-DIVER-USENET. So
> > something isn't adding up here.
>
> If this is indeed a buffer overflow or so triggered by a faulty index
> file, it is not surprising that there's somewhat erratic behaviour.
>
> > I suspect the next step is to rebuild the index from scratch, but that
> > involves decompressing a ZIP file with 1.2 million files inside it, which
> > I'd rather avoid if at all possible.
>
>
>
> Of course, make sure you have a backup copy of the corpus beforehand.
>
> You should also be able to rebuild the index files manually with
> "cwb-makeall" and "cwb-compress-rdx", but those tools sometimes get
> confused about which files need to be rebuilt in which order.
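
A rough sketch of the manual route for a single positional attribute, running the two tools in the order they are named above. The function name and the optional dry-run argument are hypothetical. The "-V" (validate) and "-P" (select attribute) options are written from memory of the tools' usage messages and should be checked against "cwb-makeall -h" and "cwb-compress-rdx -h" before use.

```shell
#!/bin/sh
# Rebuild the index of one positional attribute, then compress it again.
# $1 = corpus ID, $2 = attribute name, $3 = optional command prefix
# (pass "echo" to print the commands instead of running them).
# Assumption: -V and -P behave as described in the lead-in; verify with -h.
rebuild_index_files() {
    corpus=$1
    attr=$2
    run=${3:-}
    # Step 1: recreate the uncompressed index files and validate them.
    $run cwb-makeall -V -P "$attr" "$corpus" || return 1
    # Step 2: compress the freshly built index into .crc/.crx.
    $run cwb-compress-rdx -P "$attr" "$corpus" || return 1
}

# Dry run, printing the two commands without touching the corpus:
#   rebuild_index_files PERS-DIVER-USENET word echo
```

Stopping at the first failing command keeps the compression step from running on an index that did not build or validate.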
>
>
> If you need to try re-encoding from scratch, an easier solution is
>
>         cwb-decode -Cx PERS-DIVER-USENET -ALL | cwb-encode -x [...] <appropriate declarations>
>
> Note that the attribute declarations in the cwb-encode command will be
> different from the ones you used for the original encoding, because
> attributes on XML regions are not decoded in proper XML notation.
>
>
> Hope that one of these steps helps!
> Stefan
>
>