[CWB] [ cwb-Bugs-1549254 ] CQP crashes on long kwic output lines

SourceForge.net noreply at sourceforge.net
Mon Aug 1 01:09:41 CEST 2011


Bugs item #1549254, was opened at 2006-08-30 12:51
Message generated for change (Settings changed) made by andrewhardie
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=722303&aid=1549254&group_id=131809

Category: CQP interface
>Group: TODO-3.5
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Stefan Evert (schtepf)
Assigned to: Stefan Evert (schtepf)
Summary: CQP crashes on long kwic output lines

Initial Comment:
When kwic output lines (generated by the "cat" command
in CQP) get too long, CQP crashes with a segmentation
fault.  This typically happens when (i) many positional
and/or structural attributes with long values are
printed, (ii) context is set to sentence and the corpus
contains very long sentences (often due to errors in the
markup), or (iii) context is set to large text regions
such as paragraphs or entire documents (or matches are
expanded to such regions).

The reason for the crash is a simple buffer overflow,
since the kwic formatting routines (in
<cqp/concordance.c>) use a fixed buffer for compiling
the output lines.  The size of this buffer is hardcoded
in <cqp/concordance.c> (MAXKWICLINELEN constant) and is
currently set to 32768 characters.

----------------------------------------------------------------------

Comment By: Stefan Evert (schtepf)
Date: 2006-08-30 12:57

Message:

Actually, the formatting code already checks for buffer
overflow, simply cutting off the output after
MAXKWICLINELEN bytes.  I believe that it just forgets to
terminate the truncated string with a NUL character, so that
the C standard library crashes when it tries to print the
string.  Needs some more thorough investigation, though.

While it would be relatively easy to patch up the problem
for now (make sure that output string is always
NUL-terminated, increase buffer size to handle all commonly
encountered situations), a fundamental redesign of the kwic
formatting code is direly needed and I would prefer to keep
this bug on hold till then.
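
As an illustration of the stop-gap fix described above, here is a
minimal sketch of a bounded append that truncates when the buffer is
full but always leaves the string NUL-terminated.  It is not the code
from <cqp/concordance.c>: the helper kwic_append() and the constant
KWIC_BUF_LEN are hypothetical names, and the buffer is kept tiny so
that truncation is easy to trigger.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for MAXKWICLINELEN; deliberately small. */
#define KWIC_BUF_LEN 16

/*
 * Append src to the NUL-terminated string already held in buf.
 * cap is the total size of buf in bytes, including room for the NUL.
 * Copies as much of src as fits, always writes a terminating NUL,
 * and returns 1 if all of src was appended, 0 if it was truncated.
 */
static int kwic_append(char *buf, size_t cap, const char *src)
{
    assert(buf != NULL && src != NULL && cap > 0);

    size_t used = strlen(buf);      /* bytes already in the line */
    assert(used < cap);             /* buf must already be terminated */

    size_t room = cap - 1 - used;   /* keep one byte for the NUL */
    size_t n = strlen(src);
    int complete = (n <= room);

    if (!complete)
        n = room;                   /* cut off what does not fit */

    memcpy(buf + used, src, n);
    buf[used + n] = '\0';           /* the step the bug report suspects is missing */
    return complete;
}
```

A caller would start with an empty line (char line[KWIC_BUF_LEN] = "";)
and call kwic_append(line, sizeof line, token) for each piece of
output; a return value of 0 signals that the line was cut off, while
the buffer remains safe to pass to printf() or fputs().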



----------------------------------------------------------------------
