[CWB] [ cwb-Feature Requests-3058717 ] cl_string_canonical: risk of buffer overflow

SourceForge.net noreply at sourceforge.net
Fri Sep 3 13:00:42 CEST 2010


Feature Requests item #3058717, was opened at 2010-09-03 11:00
Message generated for change (Tracker Item Submitted) made by andrewhardie
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=722306&aid=3058717&group_id=131809

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Andrew Hardie (andrewhardie)
Assigned to: Nobody/Anonymous (nobody)
Summary: cl_string_canonical: risk of buffer overflow

Initial Comment:
cl_string_canonical currently modifies strings in situ. It would be more convenient for it to return a newly allocated string by default, and to modify its argument in place only when specifically instructed. Proposed signature:

char * 
cl_string_canonical(char *s, CorpusCharset charset, int flags, size_t inplace_bufsize)

If inplace_bufsize == 0, a newly allocated string is returned. (As size_t is unsigned, a negative value cannot occur.)

If inplace_bufsize > 0, s is modified in place up to a maximum size of inplace_bufsize-1 characters (plus NUL terminator). If the normalised string doesn't fit into the buffer, the extra characters are dropped silently. For UTF-8 strings, the result allocated by GLib is copied to s (dropping characters that don't fit) and then freed, as in the current implementation.
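
The two modes described above can be sketched roughly as follows. This is only an illustration, not the CL implementation: canonical_sketch is a hypothetical name, the charset and flags parameters are left out, and plain ASCII lowercasing stands in for the real normalisation. Truncation is done per byte here, so a real UTF-8 version would also have to avoid cutting a multi-byte sequence in half.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the proposed interface (assumption, not CL code).
 * Normalisation is ASCII lowercasing only; charset and flags are omitted. */
char *
canonical_sketch(char *s, size_t inplace_bufsize)
{
  size_t len = strlen(s);
  size_t i;
  char *result;

  /* Build the normalised form in a fresh buffer first. */
  result = malloc(len + 1);
  if (result == NULL)
    return NULL;
  for (i = 0; i < len; i++)
    result[i] = (char) tolower((unsigned char) s[i]);
  result[len] = '\0';

  /* Mode 1: caller asked for a new string and must free() it. */
  if (inplace_bufsize == 0)
    return result;

  /* Mode 2: copy back at most inplace_bufsize-1 bytes plus NUL,
   * silently dropping whatever does not fit. */
  if (len > inplace_bufsize - 1)
    len = inplace_bufsize - 1;
  memcpy(s, result, len);
  s[len] = '\0';
  free(result);
  return s;
}
```

With this shape, existing callers that rely on in-place modification would pass sizeof(buffer) as the last argument, while new callers pass 0 and free the returned string themselves.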

This will break backwards compatibility of the CL.

----------------------------------------------------------------------
