[CWB] A question on CQP attribute sets

Игорь Шалыминов ishalyminov at yandex-team.ru
Tue Jul 10 15:20:32 CEST 2012


Hello!

My name is Igor. I'm a developer of the Russian National Corpus (RNC) search engine, and I'm trying to get it working with CWB.
The main problem I have is the following: for the most part, RNC texts are annotated ambiguously, so each word carries sets of lemmas, grammatical features and semantic features, just as in the GERMAN-LAW example in the tutorial. Suppose we have a word:

word    lemma             pos           agr                   sem
---------------------------------------------------------------------------------
form    |lemma1|lemma2|   |pos1|pos2|   |agr_set1|agr_set2|   |sem_set1|sem_set2|

Now, if I run the query:

[(lemma contains "lemma1") and (pos contains "pos2")]

I get that very word as a match, which is wrong in my case: there is only one strict correspondence, "lemma1 -> pos1 -> agr_set1 -> sem_set1", and likewise for lemma2 ("lemma2 -> pos2 -> agr_set2 -> sem_set2").
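To make the mismatch concrete, here is a minimal Python sketch of the two semantics, assuming the "|a|b|" feature-set notation from the table above and assuming the i-th element of every attribute belongs to the same analysis. The function names are hypothetical, and set members are compared by plain string equality for simplicity; this does not reproduce CQP's own matching rules, only the logic of "each test checks its own set" versus "one analysis must satisfy all tests".

```python
from typing import Dict, List


def parse_feature_set(value: str) -> List[str]:
    """Split a feature-set value such as '|a|b|' into ['a', 'b'].

    A value consisting only of '|' yields [].
    """
    return [item for item in value.split("|") if item]


# Hypothetical token from the example table: attribute -> feature-set string.
token: Dict[str, str] = {
    "lemma": "|lemma1|lemma2|",
    "pos": "|pos1|pos2|",
    "agr": "|agr_set1|agr_set2|",
    "sem": "|sem_set1|sem_set2|",
}


def contains_independently(tok: Dict[str, str], constraints: Dict[str, str]) -> bool:
    """Each constraint is tested against its own attribute set on its own,
    like (lemma contains "x") and (pos contains "y"); the constraints may
    therefore be satisfied by different analyses."""
    return all(value in parse_feature_set(tok[attr])
               for attr, value in constraints.items())


def contains_aligned(tok: Dict[str, str], constraints: Dict[str, str]) -> bool:
    """Desired semantics: a single analysis i must satisfy every constraint,
    i.e. all constrained attributes are compared at the same position i."""
    columns = {attr: parse_feature_set(tok[attr]) for attr in constraints}
    # Only compare positions that exist in every constrained attribute.
    n = min((len(col) for col in columns.values()), default=0)
    return any(all(columns[attr][i] == value
                   for attr, value in constraints.items())
               for i in range(n))


query = {"lemma": "lemma1", "pos": "pos2"}
print(contains_independently(token, query))  # True  (the unwanted match)
print(contains_aligned(token, query))        # False (lemma1 goes with pos1)
```

The aligned check rejects the token because no single position pairs lemma1 with pos2, which is exactly the correspondence described above.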

So my question is: is there an out-of-the-box way to perform such queries (i.e., to keep the corresponding elements of the different attribute sets aligned by position when matching them with 'contains'), or does this have to be implemented?

--
Best Regards,
Igor Shalyminov

