[CWB] Impossible query

Ruprecht von Waldenfels ruprecht.waldenfels at gmx.net
Sat Feb 27 00:58:11 CET 2016


Dear all,
I just posted the following question on StackOverflow, see 
http://stackoverflow.com/questions/35663861/restrict-results-from-corpusworkbench-cwb-to-up-to-n-occurrences-of-an-attribu

Say I have a corpus encoded in CWB with word, lemma, and aligned-word 
information, as in

     I     I     |Ich|
     told  tell  |habe|gesagt|
     them  they  |sie|
     to    to    ||
     leave leave |gehen|

Note that the third column can hold several alternative values.

Now, supposing I want a random sample of the occurrences of words with a 
lemma starting with "l", I would go:

     A=[lemma="l.*"]; reduce A to 1000; cat A;

This will give me a random sample in which the frequency of each lemma 
varies widely; e.g., the lemma "leave" might occur 20 times.

Here comes my problem: (a) what can I do if I want the random sample to 
contain at most 4 occurrences of each lemma? (b) what if I want the 
random sample to contain at most 4 occurrences of any translation in 
column 3?

I suspect this is not possible in CWB, but I may be wrong; also, it may 
be possible using a combination of R and CWB.
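
To make the idea concrete, here is a minimal sketch of the kind of 
post-processing I have in mind, written in Python rather than R. It 
assumes the matches have already been exported from CQP as (word, lemma, 
aligned words) rows, for instance with the tabulate command; the names 
parse_feature_set and capped_sample are made up for illustration and are 
not part of CWB.

```python
import random
from collections import Counter


def parse_feature_set(value):
    """Split a value such as '|habe|gesagt|' into ['habe', 'gesagt']; '||' gives []."""
    return [v for v in value.split("|") if v]


def capped_sample(rows, keys_of, max_per_key=4, sample_size=1000, seed=None):
    """Draw rows in random order, skipping any row whose key is already at the cap.

    keys_of(row) returns the list of keys to cap on: one lemma for case (a),
    or all alternative translations for case (b). Rows with no keys at all
    (e.g. unaligned tokens) are never capped in this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    counts = Counter()
    kept = []
    for row in shuffled:
        keys = keys_of(row)
        if any(counts[k] >= max_per_key for k in keys):
            continue  # one of this row's keys has already been seen max_per_key times
        counts.update(keys)
        kept.append(row)
        if len(kept) == sample_size:
            break
    return kept


# Hypothetical exported rows: (word, lemma, aligned-word feature set)
rows = [
    ("leave", "leave", "|gehen|"),
    ("left", "leave", "|gegangen|"),
    ("told", "tell", "|habe|gesagt|"),
    ("to", "to", "||"),
]

# (a) cap on the lemma column
by_lemma = capped_sample(rows, lambda r: [r[1]], max_per_key=4, seed=1)

# (b) cap on every alternative translation in column 3
by_translation = capped_sample(rows, lambda r: parse_feature_set(r[2]), max_per_key=4, seed=1)
```

Since rows are dropped after shuffling, the result can be smaller than the 
requested sample size when many lemmas hit the cap, so the initial CQP 
sample would have to be drawn larger than 1000.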

I would greatly appreciate any help. I posted the question on 
StackOverflow because I thought it would be a better venue for this kind 
of question, but the community I am addressing is presumably on this 
list rather than on StackOverflow!

Best,
Ruprecht


