[CWB] Zero width assertion

Stefan Evert stefanML at collocations.de
Wed Feb 8 19:00:03 CET 2012


> There is an undocumented feature in the CQP language that allows us to write:
> 
>    [ lema = "eat" ] [: lema = "pig" :]
> 
> meaning that the second word should not be "captured".
> 
> Unfortunately it seems that:
> - it needs to be the last word
> - we can't use more than one of these
> 
> The questions are:
> - is this really not supported, or am I doing something wrong?


It's not supported.  These zero width assertions were implemented for an entirely different purpose, and it just so happens that they can be used to test one extra token at the end of the match, which is a convenient trick in some cases.

In hindsight, I shouldn't have implemented zero width assertions in the first place -- they've made it practically impossible to optimise CQP queries in a general way.

> - would it be possible to implement it?

As Radio Yerevan would put it: in principle, it can be implemented, but no-one's going to want to do that. :-)

The problem is that the implementation would have to be entirely different from the current implementation of zero width assertions and, I believe, would require fairly extensive changes to the query evaluation code.

> that is, these don't work:
>   [: lema = "eat" :] [ lema = "pig" ]
>   [ lema = "eat" ] [: lema = "the" :] [: lema = "pig" :]

How about the following work-around (unless you want to run those queries in a Web interface and have to squeeze everything into a single CQP query):

	Pork = ([lemma="have"] @"eaten" | @[lemma="eat" & pos="VV.*"]) "the" [lemma="pig"];
	set Pork matchend target;
	cat Pork;

Note the hoops we have to jump through in order to set the target marker correctly (the @ has to be repeated in each branch of the alternation), but this is something that could be fixed relatively easily.
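
Applied to the queries from the original question, the same trick might look roughly like this.  This is an untested sketch: the query names Eat and Pig are made up for illustration, "lema" is the attribute name used in the question, and I'm assuming that "set ... match target" behaves the same way as "set ... matchend target" does above.

	# "eat" followed by "the pig", but only "eat" is kept in the match
	Eat = @[lema="eat"] [lema="the"] [lema="pig"];
	set Eat matchend target;
	cat Eat;

	# "pig" preceded by "eat", but only "pig" is kept in the match
	Pig = [lema="eat"] @[lema="pig"];
	set Pig match target;
	cat Pig;

In both cases the full pattern is matched first and the match boundaries are then moved to the target token, so the tokens outside the new boundaries serve as context only.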

Hope this explains & helps,
Stefan


