[CWB] CQPWeb -- Negation operator in simple query language

Stefan Evert stefanML at COLLOCATIONS.DE
Wed Oct 31 20:05:00 CET 2012


> I see your points. My current problem was that I wanted to search for sequences of finite verb + gerund, but make sure that estar was not included as the finite verb form (by far the most frequent in my corpus).
> 
> But this was just an example, it was not derived from real needs. So basically I'm good. The customization options you gave are good to know and that's enough for me right now.

Keep in mind that you can always make your own modifications if you're running your own CQPweb server.  For your particular need, you will have to modify the file "lib/perl/cqpwebCEQL.pm", which holds customizations of the CEQL query syntax.

The patch appended to this message implements negated conditions in CEQL queries in the following way:

   - word form pattern and POS tag are negated _separately_ by placing a "!" at the start of the corresponding expression (note that this breaks backward compatibility, because a leading "!" is no longer read as a literal character, but such patterns are unlikely to occur in actual queries)

   - for lemma and simplified POS tag, the "!" has to be placed just inside the curly braces

   - for combo searches like {light/V}, negation applies to the full combination

   - if a pattern consists only of a single "!", this is not interpreted as a negation metacharacter, but as a literal exclamation mark; this circumvents the most likely case of incompatibility and appropriately turns the query "!!" into a WTF situation if you happen to run it on a suitably large corpus ;-)
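As a rough illustration of the rules above, here is a minimal Python sketch of how the "!" marker could be detected in one part of a token expression. It is not the code from the patch (which modifies the Perl module); `parse_part` is a hypothetical helper, and treating a lone "!" inside curly braces as a literal is an assumption.

```python
def parse_part(expr: str) -> tuple[str, bool, str]:
    """Split one part of a token expression into (kind, negated, body).

    kind is "braced" for {...} (lemma / simple POS tag) and "plain"
    otherwise (word form / full POS tag). Hypothetical helper for
    illustration only, not taken from the patch.
    """
    if expr.startswith("{") and expr.endswith("}"):
        # For lemma and simple POS, the "!" sits just inside the braces.
        kind, inner = "braced", expr[1:-1]
    else:
        kind, inner = "plain", expr
    if inner == "!":
        # A lone "!" stays a literal exclamation mark
        # (assumed here to hold inside braces as well).
        return kind, False, inner
    if inner.startswith("!"):
        return kind, True, inner[1:]
    return kind, False, inner


print(parse_part("!N*"))           # plain POS pattern, negated
print(parse_part("{![be,have]}"))  # braced lemma pattern, negated
print(parse_part("!"))             # literal "!"
```

Under this reading, "!!" comes out as the negation of a literal "!", i.e. every token other than an exclamation mark.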

I guess a few examples would be helpful ...

	light_!N*         # light when _not_ used as a noun
	light_{!N}        # same with simple POS tag
	{![be,have]}_V*   # any verb except for BE and HAVE
	{!be}_{!N}        # any token that is neither a form of BE nor a noun; such double negations will rarely be useful
	!                 # literal "!"
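To make the intended reading of {![be,have]}_V* concrete, the following Python sketch checks toy (lemma, POS) pairs against the two conditions. It only imitates the matching logic with the standard `fnmatch` module; `matches` and `verb_not_be_have` are hypothetical names, and the POS tags are illustrative, not taken from any particular tagset in the thread.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, value: str) -> bool:
    """Toy matcher: optional leading "!", [a,b] alternatives, * and ? wildcards."""
    # A lone "!" is a literal, so only longer patterns can be negated.
    negated = len(pattern) > 1 and pattern.startswith("!")
    body = pattern[1:] if negated else pattern
    if body.startswith("[") and body.endswith("]"):
        alternatives = body[1:-1].split(",")
    else:
        alternatives = [body]
    hit = any(fnmatchcase(value, alt) for alt in alternatives)
    # Negation simply flips the outcome of the positive match.
    return hit != negated


def verb_not_be_have(lemma: str, pos: str) -> bool:
    """Both conditions of {![be,have]}_V* must hold for the same token."""
    return matches("![be,have]", lemma) and matches("V*", pos)


for lemma, pos in [("be", "VBZ"), ("have", "VHD"), ("run", "VVZ"), ("house", "NN1")]:
    print(lemma, pos, verb_not_be_have(lemma, pos))
```

Only the ("run", "VVZ") pair passes: the first two fail the negated lemma condition, and the last one fails the POS condition.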

Note the difference between

	{!light}_{!N}   # neither a form of LIGHT nor a noun (excludes all nouns and all forms of LIGHT)

and

	{!light/N}       # everything except LIGHT used as a noun (does not exclude verb/adjective uses of LIGHT, nor all the other nouns)
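The difference between the two queries is the usual one between "not A and not B" and "not (A and B)". A small Python sketch with made-up tokens shows which ones each reading keeps; the `Token` type and the two predicate functions are hypothetical and only mirror the descriptions given in the comments above.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    lemma: str
    pos: str  # simplified POS tag, e.g. "N", "V"


def neither_light_nor_noun(t: Token) -> bool:
    """{!light}_{!N}: both conditions are negated separately."""
    return t.lemma != "light" and t.pos != "N"


def not_light_as_noun(t: Token) -> bool:
    """{!light/N}: the negation applies to the full combination."""
    return not (t.lemma == "light" and t.pos == "N")


tokens = [
    Token("light", "light", "N"),
    Token("lights", "light", "V"),
    Token("house", "house", "N"),
    Token("run", "run", "V"),
]

print([t.word for t in tokens if neither_light_nor_noun(t)])  # ['run']
print([t.word for t in tokens if not_light_as_noun(t)])       # ['lights', 'house', 'run']
```

The first query drops three of the four toy tokens, while the second drops only LIGHT used as a noun.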

BTW, if you think this is useful and won't bite too many existing users, we _could_ still include it in the standard CEQL dialect.  I admit I had a couple of cases similar to Marti's query where I would very much have liked this extension ...

Share and enjoy!
Stefan


-------------- next part --------------
A non-text attachment was scrubbed...
Name: cqpweb_ceql_negation.patch
Type: application/octet-stream
Size: 3505 bytes
Desc: not available
URL: <http://devel.sslmit.unibo.it/pipermail/cwb/attachments/20121031/105eeb25/attachment.obj>

