[CWB] Impossible query
Ruprecht von Waldenfels
ruprecht.waldenfels at gmx.net
Sat Feb 27 00:58:11 CET 2016
Dear all,
I just posted the following question on StackOverflow, see
http://stackoverflow.com/questions/35663861/restrict-results-from-corpusworkbench-cwb-to-up-to-n-occurrences-of-an-attribu
Say, I have a corpus encoded in CWB with word, lemma, and aligned word
information, such as in
    I      I      |Ich|
    told   tell   |habe|gesagt|
    them   they   |sie|
    to     to     ||
    leave  leave  |gehen|
Note that in the third column, alternate values are possible.
Now presuming I want a random sample of the occurrences of words with a
lemma starting with "l", I would go:
A = [lemma="l.*"]; reduce A to 1000; cat A;
This will give me a random sample in which the lemmas occur with very
different frequencies; e.g., the lemma "leave" might occur 20 times.
Here is my problem: (a) what can I do if I want the random sample to
contain at most 4 occurrences of each lemma? (b) what if I want the
random sample to contain at most 4 occurrences of any translation in
column 3?
I suspect this is not possible in CWB, but I may be wrong; also, it may
be possible using a combination of R and CWB.
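To make the requirement concrete, here is a minimal sketch of the capping
step done outside CQP. It is written in Python purely for illustration and
assumes the matches have already been exported as (corpus position, lemma,
translations) tuples; the names capped_sample and parse_feature_set are
hypothetical helpers, not part of CWB, and the sample data is made up.

```python
import random
from collections import Counter


def parse_feature_set(value):
    """Split a value such as '|habe|gesagt|' into ['habe', 'gesagt'] ('||' gives [])."""
    return [v for v in value.strip().split("|") if v]


def capped_sample(hits, keys_of, max_per_key=4, sample_size=1000, seed=None):
    """Random sample of hits in which no key occurs more than max_per_key times.

    keys_of maps a hit to its keys: one lemma for case (a), or all
    alternative translations for case (b). A hit without any key (e.g. an
    empty translation '||') is never rejected by the cap.
    """
    rng = random.Random(seed)
    shuffled = list(hits)
    rng.shuffle(shuffled)  # random order, so the kept hits are a random subset

    counts = Counter()
    sample = []
    for hit in shuffled:
        keys = set(keys_of(hit))
        if any(counts[k] >= max_per_key for k in keys):
            continue  # at least one key has already reached the cap
        sample.append(hit)
        counts.update(keys)
        if len(sample) == sample_size:
            break
    return sample


# Made-up example data: (corpus position, lemma, translations)
hits = [(i, "leave", "|gehen|") for i in range(20)]
hits += [(100 + i, "like", "|moegen|gern|") for i in range(3)]

# (a) cap per lemma
by_lemma = capped_sample(hits, lambda h: [h[1]], max_per_key=4, seed=1)

# (b) cap per translation; every alternative counts towards its own cap
by_translation = capped_sample(
    hits, lambda h: parse_feature_set(h[2]), max_per_key=4, seed=1
)
```

In case (b) this sketch rejects a hit as soon as any one of its alternative
translations has reached the cap; that is only one possible reading of the
requirement.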
I would greatly appreciate any help. I posted the question on
StackOverflow because I thought it would be a better place to discuss
this kind of question, but the community I am addressing is presumably
on this list rather than on StackOverflow!
Best,
Ruprecht