[CWB] Tiger and CWB

Maarten Janssen maartenpt at gmail.com
Tue Jun 6 11:14:09 CEST 2017


For a corpus in CWB that includes dependency relations, I am trying to implement a cross between CQP and TIGER search, in which you can search for dependencies using a Gorn number on every token in the sentence (has anybody tried something like that already?). The base idea (to be expanded upon) would be to allow expressions of the following sort (here, a "full" NP dominated by the verb "give"):

[pos="D.*"] [pos="A.*"]* a:[pos="N.*"] and b:[lemma="give"] and b > a

Rather than trying to implement something low-level (CWB::CL says it is to be discontinued, and with CWB4 using a different file system, reading the files directly would also not be very useful at this point; or am I missing something there?), I was trying to use "pure" CQL for this, but all my attempts run into limitations in CQL. My first attempt uses intersections, like this:

Matches = [pos="D.*"] [pos="A.*"]* @[pos="N.*"];
set Matches target match;
A = Matches expand to s;
Matches = @[lemma="give"];
set Matches target match;
B = Matches expand to s;
Results = intersection(A,B);

This nicely gives you the sentences that contain matches for both of the first two clauses. But in order to then check the dependency, we need to know what the target was in both A and B, and intersection seems to keep only the target of A and discard the target of B. Is there any way to check for that in CQL?
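
To make concrete what would be needed here, this is a minimal Python sketch of the same intersection done outside CQP, keeping both targets. It assumes each result set has somehow been exported as (sentence start, sentence end, target) corpus positions; the function name, the tuple layout and the sample numbers are all hypothetical and not part of CQP.

```python
from collections import defaultdict

def join_on_sentence(hits_a, hits_b):
    """Pair each target of A with each target of B found in the same sentence.

    Every hit is a hypothetical (s_start, s_end, target) tuple of corpus positions.
    """
    # Index the B hits by their sentence span.
    b_by_sentence = defaultdict(list)
    for s_start, s_end, target in hits_b:
        b_by_sentence[(s_start, s_end)].append(target)

    # Keep only sentences present in both sets, carrying both targets along.
    pairs = []
    for s_start, s_end, target_a in hits_a:
        for target_b in b_by_sentence.get((s_start, s_end), []):
            pairs.append((s_start, s_end, target_a, target_b))
    return pairs

# Made-up example data: only the sentence 0..9 occurs in both sets.
hits_a = [(0, 9, 3), (10, 19, 12)]
hits_b = [(0, 9, 5), (20, 29, 22)]
joined = join_on_sentence(hits_a, hits_b)
```

With both target positions available per sentence, the dependency check itself could then be run on each pair.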

If that is not possible, another approach would be to keep it order-based and create a single complex CQL query, like the one below. It would only yield cases where the NP precedes the verb; you could run a second search with the reverse order, although that is bound to be restrictive in the end:

[pos="D.*"] [pos="A.*"]* a:[pos="N.*"] []* b:[lemma="give" & a.gorn=b.gorn.".+"]

But again this does not seem possible, since there does not seem to be any function in CQL that allows the last clause. Is there any way to do string concatenation, or a "contains" test like this, in CQL?
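
For reference, the test that the last clause is meant to express can be sketched in Python as a prefix check on Gorn addresses. This assumes the gorn attribute holds dot-separated addresses such as "1.2.1" (the separator and the function name are assumptions for illustration); comparing whole components rather than raw strings avoids "1.2" wrongly matching "1.21".

```python
def dominates(gorn_b, gorn_a, sep="."):
    """Return True if the node at gorn_b properly dominates the node at gorn_a.

    Assumes Gorn addresses are strings of components joined by `sep`,
    e.g. "1.2" dominates "1.2.1" but not "1.21" and not itself.
    """
    b_parts = gorn_b.split(sep)
    a_parts = gorn_a.split(sep)
    # a must be strictly deeper than b and share b's full path as a prefix.
    return len(a_parts) > len(b_parts) and a_parts[:len(b_parts)] == b_parts
```

Applied to the query above, this would be called as dominates(b_gorn, a_gorn), with b the "give" token and a the noun.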

