[CWB] Impossible query

Ruprecht von Waldenfels ruprecht.waldenfels at gmx.net
Sat Feb 27 06:52:10 CET 2016


Hi,
Well, there are a few CWB-related questions on StackOverflow. The idea 
was to post both here and there, so that anybody who wants to answer can 
choose their forum (as you did).

Thanks for your answer! I half suspected there was some macro magic 
that could accomplish this, but if it's scripts, it's scripts. Seems 
reasonable.
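
For the record, here is a minimal sketch of what such a script might look like. It is an illustration under stated assumptions, not a tested recipe: it assumes the query result was written out with something like `tabulate A match, matchend, match lemma, match align > "sample.tsv";` (the attribute name `align` and both file names are hypothetical), and that `undump` expects a first line giving the number of rows, followed by match/matchend pairs. All function names are made up for this example.

```python
import random
from collections import defaultdict


def read_tabulation(path):
    """Read tab-separated rows: match, matchend, lemma, alignment."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n").split("\t") for line in f if line.strip()]


def lemma_keys(row):
    """Case (a): the lemma is the only key of a row."""
    return {row[2]}


def translation_keys(row):
    """Case (b): "|habe|gesagt|" -> {"habe", "gesagt"}; "||" -> empty set."""
    return {t for t in row[3].split("|") if t}


def cap_per_key(rows, keys_of, limit=4, seed=None):
    """Return a random subset of rows in which no key occurs more than `limit` times."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)  # random order, so the rows that survive are a random choice
    counts = defaultdict(int)
    kept = []
    for row in shuffled:
        keys = keys_of(row)
        # Keep the row only if none of its keys has reached the limit yet.
        if all(counts[k] < limit for k in keys):
            kept.append(row)
            for k in keys:
                counts[k] += 1
    # Restore corpus order by match and matchend position.
    return sorted(kept, key=lambda r: (int(r[0]), int(r[1])))


def write_undump(rows, path):
    """Write the row count, then one match/matchend pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(rows)}\n")
        for r in rows:
            f.write(f"{r[0]}\t{r[1]}\n")
```

Usage would be along the lines of `write_undump(cap_per_key(read_tabulation("sample.tsv"), lemma_keys), "kept.txt")`, followed by `undump B < "kept.txt"; cat B;` in CQP. Note that in case (b) a row with several alternate translations is dropped as soon as any one of them has reached the limit, and rows with an empty alignment (`||`) are always kept; either choice could reasonably be made differently.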

Best,
Ruprecht



On 26.02.2016 at 21:18, Hardie, Andrew wrote:
> From what I know of Stack Overflow, they might well reject this as off-topic. Moreover, while I can't speak for Stefan or anyone else, of course, I personally have no intention of engaging with CWB-related questions anywhere but on this list.
>
> ANYWAY: I suspect the best way to accomplish what you want is by tabulating (or dumping) the query and running a script across the tabulation/dump that implements the sub-selection that you describe. Then undump, then cat.
>
> best
>
> Andrew.
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Ruprecht von Waldenfels
> Sent: 26 February 2016 23:58
> To: Open source development of the Corpus WorkBench
> Subject: [CWB] Impossible query
>
> Dear all,
> I just posted the following question on StackOverflow, see
> http://stackoverflow.com/questions/35663861/restrict-results-from-corpusworkbench-cwb-to-up-to-n-occurrences-of-an-attribu
>
> Say, I have a corpus encoded in CWB with word, lemma, and aligned word
> information, such as in
>
>       I     I     |Ich|
>       told  tell  |habe|gesagt|
>       them  they  |sie|
>       to    to    ||
>       leave leave |gehen|
>
> Note that in the third column, alternate values are possible.
>
> Now, supposing I want a random sample of the occurrences of words with a
> lemma starting with "l", I would write:
>
>       A=[lemma="l.*"]; reduce A to 1000; cat A;
>
> This will give me a random sample with very different frequencies for
> each lemma; e.g., the lemma "leave" might be contained 20 times.
>
> Here comes my problem: (a) what can I do if I want the random sample to
> contain a maximum number of 4 occurrences for each lemma? (b) what if I
> want the random sample to contain a maximum of 4 occurrences of any
> translation in column 3?
>
> I suspect this is not possible in CWB, but I may be wrong; also, it may
> be possible using a combination of R and CWB.
>
> I would greatly appreciate any help. I posted it on StackOverflow
> because I thought that would be a better place to discuss this kind
> of question, but actually, the community I am addressing is presumably
> on this list rather than on StackOverflow!
>
> Best,
> Ruprecht
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
