[CWB] Tiger and CWB
Stefan Evert
stefanML at collocations.de
Tue Jun 6 11:41:52 CEST 2017
Hi Maarten,
three brief comments:
1) The CQP query language really isn't designed for syntactic queries – that's one of the reasons for the new data model in CWB4. So I suspect that you'll keep running into limitations as soon as you move beyond simple queries.
The best way of introducing a little bit of syntactic structure is to encode dependency links as p-attributes, including relevant annotation of the "other end" of the link. E.g. you might add attributes
head_rel head_word head_lemma dep_rel dep_word dep_lemma
where the first three encode a dependency link in which the current token is the dependent (and head_word, head_lemma are word form and lemma of the corresponding head), and the other three encode a dependency link in which the current token is the head.
This would allow you to find a full NP as an object of "give" quite easily:
[pos="D.*"] [pos="A.*"]* [pos="N.*" & head_rel = "dobj" & head_lemma = "give"];
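As a rough illustration of how those six columns could be produced before encoding, here is a minimal Python sketch. It assumes each token already carries a head index and a relation label from some parser. The function name, the input layout, the "_" placeholder for the root, and the use of CWB's feature-set notation (|a|b|) for heads with several dependents are all assumptions made for this example, not something prescribed above.

```python
from collections import defaultdict

# Hypothetical parsed sentence: "head" is the 0-based index of the governing
# token (None for the root), "rel" is the label of the link to that head.
SENTENCE = [
    {"word": "She",  "pos": "PP",  "lemma": "she",  "head": 1,    "rel": "nsubj"},
    {"word": "gave", "pos": "VBD", "lemma": "give", "head": None, "rel": "root"},
    {"word": "the",  "pos": "DT",  "lemma": "the",  "head": 4,    "rel": "det"},
    {"word": "old",  "pos": "ADJ", "lemma": "old",  "head": 4,    "rel": "amod"},
    {"word": "book", "pos": "NN",  "lemma": "book", "head": 1,    "rel": "dobj"},
]


def feature_set(values):
    """Join values as |a|b|; an empty set is written as a single |."""
    return "|" + "|".join(values) + "|" if values else "|"


def add_dependency_columns(tokens):
    """Return one tab-separated line per token with the columns
    word pos lemma head_rel head_word head_lemma dep_rel dep_word dep_lemma."""
    # Collect, for every head, the indices of its dependents.
    dependents = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok["head"] is not None:
            dependents[tok["head"]].append(i)

    rows = []
    for i, tok in enumerate(tokens):
        # Link in which the current token is the dependent.
        if tok["head"] is None:
            head_cols = [tok["rel"], "_", "_"]  # assumed placeholder for the root
        else:
            head = tokens[tok["head"]]
            head_cols = [tok["rel"], head["word"], head["lemma"]]
        # Links in which the current token is the head (possibly several).
        deps = [tokens[j] for j in dependents[i]]
        dep_cols = [
            feature_set([d["rel"] for d in deps]),
            feature_set([d["word"] for d in deps]),
            feature_set([d["lemma"] for d in deps]),
        ]
        rows.append("\t".join([tok["word"], tok["pos"], tok["lemma"]] + head_cols + dep_cols))
    return rows


if __name__ == "__main__":
    print("\n".join(add_dependency_columns(SENTENCE)))
```

With this layout, the token "book" carries head_rel = "dobj" and head_lemma = "give", which is what the query above tests for. How the dep_* columns are best represented when a head has several dependents depends on the queries you want to run, so treat the feature-set choice as one option among several.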
2) If you have a recent version of CQP, you can use the experimental is_prefix() function:
> [pos="D.*"] [pos="A.*"]* a:[pos="N.*"] []* b:[lemma="give" & a.gorn=b.gorn.".+"]
>
> But again this does not seem possible, since there does not seem to be any function in CQL that allows the last clause in this - is there any way to do string concatenation or “contains” like this in CQL?
[pos="D.*"] [pos="A.*"]* a:[pos="N.*"] []* b:[lemma="give" & is_prefix(b.gorn, a.gorn)];
Documentation of all built-in functions can be found in the SVN version of the CQP Query Tutorial (we'll push an update to the Web site once 3.5 is ready for official release).
https://sourceforge.net/p/cwb/code/HEAD/tree/doc/tutorials/CQP_Tutorial.pdf?format=raw
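To make the prefix condition in the query above concrete, here is a small Python sketch of what such a test amounts to on Gorn addresses. It assumes that is_prefix(x, y) is a plain string-prefix test and that addresses are dot-separated strings such as "1.2.1"; both are assumptions for this illustration, and the helper dominates() is hypothetical, not a CQP function.

```python
def is_prefix(prefix, s):
    """Plain string-prefix test (assumed to mirror the CQP built-in)."""
    return s.startswith(prefix)


def dominates(ancestor_gorn, descendant_gorn):
    """Hypothetical stricter test: the ancestor address followed by a
    separator must start the descendant address (proper dominance)."""
    return descendant_gorn.startswith(ancestor_gorn + ".")


if __name__ == "__main__":
    # Node 1.2 dominates node 1.2.1.3, so its address is a prefix of the other.
    print(is_prefix("1.2", "1.2.1.3"))   # True
    print(is_prefix("1.2.1.3", "1.2"))   # False
    # A bare string prefix also accepts sibling 1.10 for node 1.1.
    print(is_prefix("1.1", "1.10"))      # True
    print(dominates("1.1", "1.10"))      # False
```

Whether the last case matters depends on how the addresses are written in your corpus. With single-digit child indices or a trailing separator on every address, the plain prefix test should coincide with dominance.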
3) Have you had a look at the CWB-treebank project? It seems to do exactly what you're aiming for, except that it is targeted at dependency graphs rather than phrase structure.
http://www.lrec-conf.org/proceedings/lrec2012/summaries/709.html
https://launchpad.net/cwb-treebank
Best,
Stefan