[CWB] Make 'cut' treat ranges like 'cat'?

Stefan Evert stefanML at collocations.de
Mon Nov 13 12:45:49 CET 2017


> On 12 Nov 2017, at 18:15, Martin Hammarstedt <martin.hammarstedt at gu.se> wrote:
> 
> > Oh, I wasn't aware that this variant of the cut command exists at all.  Where did you find it?
> 
> It's mentioned on page 19 in the CQP tutorial (http://cwb.sourceforge.net/files/CQP_Tutorial.pdf).

I wonder who put that into the tutorial … ;-)

> As far as I can tell the current implementation of negative indices is broken:
> 
>   if (first < 0) first = n_matches - first;
>   if (last < 0) last = n_matches - last;
> 
> Since first/last are negative numbers, the subtractions above result in additions, so index -3 in a query with 10 results actually translates to index 13, leading to an error.

Indeed. When I scanned the implementation, I misread those lines for what they were clearly intended to do rather than what they actually do.
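
To make the arithmetic concrete, here is a minimal sketch of the two readings. The function names are hypothetical and not taken from the CWB sources, and the "intended" variant is only my assumption of what the lines were meant to compute (counting back from the end of the result):

```c
/* Hypothetical helpers for illustration only; not part of CWB. */

/* What the quoted lines actually compute for a negative index. */
static int resolve_as_written(int idx, int n_matches) {
  if (idx < 0) idx = n_matches - idx;  /* subtracting a negative number adds */
  return idx;
}

/* What was presumably intended (assumption): count back from the end. */
static int resolve_as_intended(int idx, int n_matches) {
  if (idx < 0) idx = n_matches + idx;
  return idx;
}
```

With 10 matches and index -3, the first function returns 13 (out of range), while the second returns 7.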

> I guess that means you can safely assume no one has been using the negative indices.

Agreed.  So given that (i) negative indices are not documented in the tutorial, (ii) they don't work anyway and (iii) I don't see a problem with Martin's request to automatically clamp the cut range to the query size, I'm going to change the implementation so that:

 - only non-negative indices are allowed
 - if the range exceeds the query size, it is automatically clamped, without a warning (same as with "cat")
 - if the selected range is empty (start > end), an empty result is returned _with_ a warning
 - if the query result is empty, it remains unchanged but a warning is issued

The latter two are intended to catch user errors because it doesn't seem sensible to make a cut that discards all hits from a query result, nor to attempt to cut an empty result.  Users and applications should check the query result size before attempting to cut.
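
The rules above could be sketched roughly as follows. This is only an illustration, not the code that will go into the repository: the enum, the function name and the out-parameters are hypothetical, and I assume 0-based inclusive indices and that a start index beyond the end of the result counts as an empty range.

```c
/* Hypothetical outcome codes for this sketch; not CWB identifiers. */
typedef enum {
  CUT_OK,            /* range selected (possibly clamped, no warning) */
  CUT_EMPTY_RANGE,   /* nothing selected: empty result plus warning */
  CUT_EMPTY_RESULT,  /* query result was empty: unchanged plus warning */
  CUT_BAD_INDEX      /* negative index: rejected */
} cut_status;

/* Select the inclusive range first..last from a result with n_matches hits.
   On CUT_OK, *out_first and *out_last hold the clamped range; otherwise
   they are left untouched. The caller is expected to issue the warnings. */
static cut_status cut_range(int n_matches, int first, int last,
                            int *out_first, int *out_last) {
  if (first < 0 || last < 0) return CUT_BAD_INDEX;
  if (n_matches == 0) return CUT_EMPTY_RESULT;
  if (last >= n_matches) last = n_matches - 1;  /* silent clamp, as with "cat" */
  if (first > last) return CUT_EMPTY_RANGE;
  *out_first = first;
  *out_last = last;
  return CUT_OK;
}
```

For example, a cut to 5..20 on a result with 10 hits would be clamped to 5..9 without a warning, whereas 7..3 would give an empty result with a warning.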

Update should appear in the SVN repository within the next few hours.

Best,
Stefan

