[CWB] Tiger and CWB

Stefan Evert stefanML at collocations.de
Tue Jun 6 11:41:52 CEST 2017


Hi Maarten,

three brief comments:

1) The CQP query language really isn't designed for syntactic queries – that's one of the reasons for the new data model in CWB4.  So I suspect that you'll keep running into limitations as soon as you move beyond simple queries.

The best way of introducing a little bit of syntactic structure is to encode dependency links as p-attributes, including relevant annotation of the "other end" of the link. E.g. you might add attributes

	head_rel	head_word	head_lemma	dep_rel	dep_word	dep_lemma

where the first three encode a dependency link in which the current token is the dependent (and head_word, head_lemma are word form and lemma of the corresponding head), and the other three encode a dependency link in which the current token is the head.

This would allow you to find a full NP as an object of "give" quite easily:

	[pos="D.*"] [pos="A.*"]* [pos="N.*" & head_rel = "dobj" & head_lemma = "give"];
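
To make the attribute layout concrete, here is a minimal Python sketch (not part of CWB) of how the head_* columns might be derived from a dependency parse before encoding. The Token class, the 1-based head index with 0 for the root, the "_" placeholder and the example tags are all assumptions for illustration. The dep_* columns are left out because a head can have several dependents, so you would first have to decide how to represent them.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    lemma: str
    head: int  # assumed convention: 1-based index of the head, 0 for the root
    rel: str   # dependency relation to the head


def to_vertical(sentence: List[Token]) -> List[str]:
    """Return one tab-separated line per token:
    word, pos, lemma, head_rel, head_word, head_lemma."""
    lines = []
    for tok in sentence:
        if tok.head == 0:
            # The root has no head; "_" is a hypothetical placeholder value.
            head_word, head_lemma = "_", "_"
        else:
            head = sentence[tok.head - 1]
            head_word, head_lemma = head.word, head.lemma
        lines.append("\t".join(
            [tok.word, tok.pos, tok.lemma, tok.rel, head_word, head_lemma]
        ))
    return lines


# Made-up example sentence with made-up tags and relations.
sentence = [
    Token("She", "PRON", "she", 2, "nsubj"),
    Token("gave", "VERB", "give", 0, "root"),
    Token("the", "DET", "the", 5, "det"),
    Token("old", "ADJ", "old", 5, "amod"),
    Token("book", "NOUN", "book", 2, "dobj"),
]

rows = to_vertical(sentence)
for row in rows:
    print(row)
```

Each printed line is one token of the verticalized input; the columns would then be declared as p-attributes in the same order when the corpus is encoded.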


2) If you have a reasonably up-to-date version of CQP, you can use the experimental is_prefix() function:

> [pos="D.*"] [pos="A.*"]* a:[pos="N.*"] []* b:[lemma="give" & a.gorn=b.gorn.".+"]
> 
> But again this does not seem possible, since there does not seem to be any function in CQL that allows the last clause in this - is there any way to do string concatenation or “contains” like this in CQL?

	[pos="D.*"] [pos="A.*"]* a:[pos="N.*"] []* b:[lemma="give" & is_prefix(b.gorn, a.gorn)];
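
What the prefix test checks can be illustrated with a small Python stand-in (this is not CQP's implementation). It assumes, as the quoted query does, that the gorn attribute stores node addresses as strings in which an ancestor's address is a string prefix of its descendants' addresses. The dotted example values are made up.

```python
def is_prefix(prefix: str, value: str) -> bool:
    """Python stand-in for a string prefix test: True if value starts with prefix."""
    return value.startswith(prefix)


# Hypothetical Gorn-style addresses, for illustration only.
give_gorn = "1.2"       # node of the "give" token
noun_gorn = "1.2.3.1"   # a noun somewhere below that node
other_gorn = "1.3.1"    # a noun in a different subtree

print(is_prefix(give_gorn, noun_gorn))
print(is_prefix(give_gorn, other_gorn))
```

Unlike the ".+" regular expression in the quoted attempt, a plain prefix test like this stand-in is also satisfied when the two strings are identical.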

Documentation of all built-in functions can be found in the SVN version of the CQP Query Tutorial (we'll push an update to the Web site once 3.5 is ready for official release).

	https://sourceforge.net/p/cwb/code/HEAD/tree/doc/tutorials/CQP_Tutorial.pdf?format=raw


3) Have you had a look at the CWB-treebank project?  It seems to do exactly what you're aiming for, except that it is targeted at dependency graphs rather than phrase structure.

	http://www.lrec-conf.org/proceedings/lrec2012/summaries/709.html
	https://launchpad.net/cwb-treebank


Best,
Stefan


