[CWB] Impossible query

Hardie, Andrew a.hardie at lancaster.ac.uk
Sat Feb 27 06:18:27 CET 2016


From what I know of Stack Overflow, they might quite possibly reject this as off topic. Moreover, while I can't speak for Stefan or anyone else, of course, I personally have no intention of engaging with CWB-related questions anywhere but on this list.

ANYWAY: I suspect the best way to accomplish what you want is by tabulating (or dumping) the query and running a script across the tabulation/dump that implements the sub-selection that you describe. Then undump, then cat.
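
A minimal sketch of the kind of script this suggests, in Python. It is not taken from the thread and makes several assumptions: the query result was tabulated into four tab-separated columns (match, matchend, lemma, aligned words), for example with something like `tabulate A match, matchend, match lemma, match alg;`, where `alg` is a hypothetical name for the third attribute; alternative translations are separated by "|" as in the example in the quoted message; and undump expects a first line giving the number of rows, followed by match/matchend pairs (please check the CQP documentation for your version). All function names are illustrative.

```python
import random
from collections import Counter


def cap_per_key(rows, key_fn, limit=4, seed=0):
    """Visit rows in random order; keep a row only while all of its keys
    are still below `limit`. Rows without any key are dropped (an assumption)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    counts = Counter()
    kept = []
    for row in shuffled:
        keys = set(key_fn(row))
        if not keys or any(counts[k] >= limit for k in keys):
            continue
        counts.update(keys)
        kept.append(row)
    return kept


def by_lemma(row):
    # Case (a): cap on the lemma column.
    return [row[2]]


def by_translation(row):
    # Case (b): cap on every alternative in the aligned column, e.g. "|habe|gesagt|".
    return [t for t in row[3].split("|") if t]


def parse_tabulation(text):
    """Parse assumed tabulate output: match, matchend, lemma, aligned (tab-separated)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match, matchend, lemma, aligned = line.split("\t")
        rows.append((int(match), int(matchend), lemma, aligned))
    return rows


def to_undump(rows):
    """Format kept rows for undump: row count first (assumed), then sorted pairs."""
    pairs = sorted((r[0], r[1]) for r in rows)
    lines = [str(len(pairs))] + [f"{m}\t{e}" for m, e in pairs]
    return "\n".join(lines) + "\n"
```

Typical use would be to write `to_undump(cap_per_key(rows, by_lemma))` to a file, read it back with undump, and then reduce and cat the resulting named query as usual. Because the rows are shuffled before capping, the occurrences kept for each lemma are a random choice rather than the first ones in the corpus.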

best

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Ruprecht von Waldenfels
Sent: 26 February 2016 23:58
To: Open source development of the Corpus WorkBench
Subject: [CWB] Impossible query

Dear all,
I just posted the following question on StackOverflow, see 
http://stackoverflow.com/questions/35663861/restrict-results-from-corpusworkbench-cwb-to-up-to-n-occurrences-of-an-attribu

Say, I have a corpus encoded in CWB with word, lemma, and aligned word 
information, such as in

     I     I     |Ich|
     told  tell  |habe|gesagt|
     them  they  |sie|
     to    to    ||
     leave leave |gehen|

Note that in the third column, alternate values are possible.

Now presuming I want a random sample of the occurrences of words with a 
lemma starting with "l", I would go:

     A = [lemma="l.*"]; reduce A to 1000; cat A;

This will give me a random sample with very different frequencies for 
each lemma; e.g., the lemma "leave" might be contained 20 times.

Here comes my problem: (a) what can I do if I want the random sample to 
contain a maximum number of 4 occurrences for each lemma? (b) what if I 
want the random sample to contain a maximum of 4 occurrences of any 
translation in column 3?

I suspect this is not possible in CWB, but I may be wrong; also, it may 
be possible using a combination of R and CWB.

I would greatly appreciate any help. I posted it on StackOverflow 
because I thought that would be a better place to discuss this kind 
of question, but the community I am addressing is presumably 
on this list rather than on StackOverflow!

Best,
Ruprecht
