[CWB] What's possible via the CQPweb interface

Josep M. Fontana josepm.fontana at upf.edu
Tue Apr 28 13:11:55 CEST 2015


Thanks a lot Stefan and Andrew. This will come in very handy.

I have a little question about the query Stefan suggests. Suppose I want 
to find out the frequencies of lemmas for the non-finite forms of the 
verb that appear after the auxiliary 'haber' in the following query:

{haber} _V*

If I do:

_V* <<1<< {haber}

this returns, as you said, the final token of the match. If I then apply the frequency breakdown, I get the frequencies of the word forms matched by _V*, which is already much better than what I could do before. But what I really wanted was the frequencies of the lemmas, not of the forms, and I don't see any way of getting that.
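
To make the goal concrete, the count in question can be sketched in Python on a made-up token list. This is only an illustration of the desired result (lemma frequencies of the verb that immediately follows 'haber'), not something CQPweb or CQP provides in this form. The token data, the tag names and the function `count_final_lemmas` are all hypothetical, and the %cd case/diacritic folding of the CQP command is not reproduced.

```python
from collections import Counter

# Toy corpus of (word, lemma, pos) triples.
# ASSUMPTION: the data and the tag names are invented for illustration.
TOKENS = [
    ("he", "haber", "VAUX"),
    ("cantado", "cantar", "VPART"),
    ("y", "y", "CONJ"),
    ("han", "haber", "VAUX"),
    ("cantado", "cantar", "VPART"),
    ("había", "haber", "VAUX"),
    ("comido", "comer", "VPART"),
    ("la", "el", "DET"),
    ("casa", "casa", "NOUN"),
]


def count_final_lemmas(tokens, aux_lemma="haber", pos_prefix="V"):
    """Count the lemmas of tokens whose POS tag starts with pos_prefix
    and which directly follow a token whose lemma is aux_lemma."""
    counts = Counter()
    # Walk over adjacent token pairs (previous token, current token).
    for prev, cur in zip(tokens, tokens[1:]):
        if prev[1] == aux_lemma and cur[2].startswith(pos_prefix):
            counts[cur[1]] += 1  # count the lemma, not the word form
    return counts


lemma_counts = count_final_lemmas(TOKENS)
for lemma, freq in lemma_counts.most_common():
    print(freq, lemma)
```

On this toy data the two occurrences of the form "cantado" and the single "comido" are grouped under their lemmas, which is the breakdown by lemma rather than by form that the query result would need.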

I tried the 'frequency list' option from the right column menu and selected 'lemma' under 'view a list based on ...', but this selection is not retained when I go to the screen where I can enter the query before the frequency breakdown.

You say MU queries are "more or less" undocumented, which I take to 
mean that they are somewhat documented. Is there any chance I can get my 
hands on some of these documents, incomplete as they may be? I don't 
know whether they would answer questions like the one I just asked, 
but I could use this information to write a little mini-guide so that 
our students using CQPweb know about the additional possibilities 
they have.



JM
>> What I was trying to obtain was the frequency of lemmas that are instantiations of 'POS Y' in a search string of the form 'lemma X POS Y'. In the command line I would have used:
>>
>> $ count Last by lemma %cd on matchend;
> In this simple case, you can get away with a trick that depends on the more or less undocumented MU queries and their deprecated semantics …
>
> If your original query is
>
> 	[lemma = "LEM"] [pos = "POS"];
>
> the MU expression
>
> 	MU(meet [pos="POS"] [lemma="LEM"] -1 -1)
>
> returns just the final token of each match, and you can apply frequency breakdown to do the counts.
>
> This is much nicer in CEQL query syntax:
>
> 	_POS <<1<< {LEM}
>
> Unfortunately, this approach doesn't generalize if you want to count sequences of multiple words or if the original query contains repetition operators.
>
> Cheers,
> Stefan
