[CWB] What's possible via the CQPweb interface

Josep M. Fontana josepm.fontana at upf.edu
Tue Apr 28 14:23:05 CEST 2015


Great! Thanks Andrew. A simple change in the settings making lemmas the 
primary annotation easily gets me where I want to go. That's all I need 
and the whole solution is rather practical and easy to implement. Thanks 
again for your help, guys.

JM
> >>  But what I really wanted to get was the frequencies for the lemmas not for the forms. I don't see any way of getting that.
>
> Not at present. Frequency breakdown works only with the words and the
> primary annotation. It does not allow access to the secondary
> annotation. It sounds like you have the lemma set as your secondary
> annotation. Of course, you could always set the lemma to primary and
> POS to secondary, then use
>
> _haber {V*}
>
> as your query.
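To make the effect of swapping the annotations concrete, here is a small Python sketch of what a frequency breakdown does with two-token matches of a query like `_haber {V*}`: it groups the matches by the string they form on one attribute and counts them. This is a toy model, not CQPweb's actual code; the corpus, the tag names (`VAI`, `VMP`, `Fp`) and the helpers `find_matches` and `frequency_breakdown` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus: one (word, lemma, pos) triple per token.
# The tags ("VAI" auxiliary, "VMP" participle, "Fp" punctuation) are invented.
TOKENS = [
    ("ha", "haber", "VAI"), ("cantado", "cantar", "VMP"), (".", ".", "Fp"),
    ("han", "haber", "VAI"), ("cantado", "cantar", "VMP"), (".", ".", "Fp"),
    ("había", "haber", "VAI"), ("comido", "comer", "VMP"), (".", ".", "Fp"),
]
FIELDS = {"word": 0, "lemma": 1, "pos": 2}


def find_matches(tokens, lemma, pos_prefix):
    """Inclusive (start, end) spans for a two-token pattern like `_haber {V*}`."""
    return [
        (i, i + 1)
        for i in range(len(tokens) - 1)
        if tokens[i][1] == lemma and tokens[i + 1][2].startswith(pos_prefix)
    ]


def frequency_breakdown(tokens, spans, attribute):
    """Group matches by the string they form on one attribute and count them."""
    idx = FIELDS[attribute]
    return Counter(
        " ".join(tok[idx] for tok in tokens[start:end + 1]) for start, end in spans
    )


spans = find_matches(TOKENS, "haber", "V")
print(frequency_breakdown(TOKENS, spans, "word"))   # three distinct form strings
print(frequency_breakdown(TOKENS, spans, "lemma"))  # forms of "haber" collapse
```

Grouped by word, the three matches stay distinct (`ha cantado`, `han cantado`, `había comido`); grouped by lemma, they collapse into `haber cantar` (2) and `haber comer` (1), which is roughly what the breakdown shows once the lemma is the primary annotation.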
>
> I recall that Stefan, I think, wanted me to make frequency breakdown
> (FB), and Sort, on which it is based, allow access to arbitrary
> p-attributes. This would take up rather a lot of disk space unless done
> selectively, as with collocation. So I’ve not done it yet.
>
> >> I tried to use the 'frequency list' option from the right column menu
> >> and selected 'lemma' in 'view a list based on ...' but this selection
> >> doesn't stay put when I go to the screen where I can introduce the
> >> query before the frequency breakdown
>
> That’s because the “frequency list” system does what it says on the
> tin: it gives you a list of items in the corpus, plus their frequencies.
> It does not add constraints to any other part of the system.
>
> >> MU queries are "more or less" undocumented
>
> They are documented in some old versions of the CQP syntax tutorial,
> but in a way that may not be accurate and cannot be relied on. That is
> why Stefan says their semantics is deprecated: the documentation
> purposefully leaves out MU queries because their behaviour cannot be
> rigorously described. Otherwise, the best documentation is the code.
> They will be properly documented once we’ve got their semantics to
> behave properly, something which is on the list for CWBv4.
>
> best
>
> Andrew.
>
> *From:*cwb-bounces at sslmit.unibo.it 
> [mailto:cwb-bounces at sslmit.unibo.it] *On Behalf Of *Josep M. Fontana
> *Sent:* 28 April 2015 12:12
> *To:* cwb at sslmit.unibo.it
> *Subject:* Re: [CWB] What's possible via the CQPweb interface
>
> Thanks a lot Stefan and Andrew. This will come in very handy.
>
> I have a little question about the query Stefan suggests. Suppose I 
> want to find out the frequencies of lemmas for the non-finite forms of 
> the verb that appear after the auxiliary 'haber' in the following query:
>
> {haber} _V*
>
> If I do:
>
> _V* <<1<< {haber}
>
> this returns, as you said, the final token of the match. If I then apply the frequency breakdown, I can get the frequencies for the forms of _V* and that is already much better than what I could do before. But what I really wanted to get was the frequencies for the lemmas not for the forms. I don't see any way of getting that.
>   
> I tried to use the 'frequency list' option from the right column menu and selected 'lemma' in 'view a list based on ...' but this selection doesn't stay put when I go to the screen where I can introduce the query before the frequency breakdown.
>
> You say MU queries are "more or less" undocumented, which I take to
> mean that they are somewhat documented. Is there any chance I could get
> hold of these documents, incomplete as they may be? I don't know
> whether they would answer questions like the one I just asked, but I
> could use this information to write a mini-guide so that our students
> using CQPweb know about the additional possibilities they have.
>
>
>
> JM
>
>     What I was trying to obtain was the frequency of lemmas that are instantiations of 'POS Y' in a search string of the form 'lemma X POS Y'. On the command line I would have used:
>
>       
>
>     $ count Last by lemma %cd on matchend;
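For comparison, the CQP command above counts the lemma of the last token of each match (`matchend`), with `%cd` folding case (`%c`) and diacritics (`%d`). A rough Python model of that computation might look like this. It assumes a hypothetical toy corpus containing an unaccented spelling variant; `fold`, `matches` and `count_lemma_on_matchend` are illustrative names, and the Unicode-based folding only approximates what CQP's `%d` flag does.

```python
from collections import Counter
import unicodedata

# Hypothetical toy corpus of (word, lemma, pos) triples; the tags are invented.
TOKENS = [
    ("ha", "haber", "VAI"), ("reído", "reír", "VMP"),
    ("han", "haber", "VAI"), ("reido", "reir", "VMP"),  # unaccented variant
    ("había", "haber", "VAI"), ("comido", "comer", "VMP"),
]


def fold(s):
    """Approximate %cd: ignore case, then strip combining marks after NFD."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(tokens, lemma, pos_prefix):
    """(match, matchend) pairs for [lemma=...] [pos=prefix.*]; matchend is inclusive."""
    return [
        (i, i + 1)
        for i in range(len(tokens) - 1)
        if tokens[i][1] == lemma and tokens[i + 1][2].startswith(pos_prefix)
    ]


def count_lemma_on_matchend(tokens, spans, ignore_case_and_diacritics=True):
    """Rough analogue of: count Last by lemma %cd on matchend;"""
    counts = Counter()
    for _, matchend in spans:
        lemma = tokens[matchend][1]
        counts[fold(lemma) if ignore_case_and_diacritics else lemma] += 1
    return counts


last = matches(TOKENS, "haber", "V")
print(count_lemma_on_matchend(TOKENS, last))         # with %cd folding
print(count_lemma_on_matchend(TOKENS, last, False))  # exact lemma strings
```

With folding, `reír` and `reir` are counted together; without it they appear as separate rows. The sketch keys each group by its folded string, which may differ from the form CQP actually displays.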
>
>   
> In this simple case, you can get away with a trick that depends on the more or less undocumented MU queries and their deprecated semantics …
>   
> If your original query is
>   
>          [lemma = "LEM"] [pos = "POS"];
>   
> the MU expression
>   
>          MU(meet [pos="POS"] [lemma="LEM"] -1 -1)
>   
> returns just the final token of each match, and you can apply frequency breakdown to do the counts.
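The effect of the `meet` trick can be modelled in a few lines of Python: every hit is the single anchor token, so a frequency breakdown only ever sees that token. This is a simplified sketch, not CWB's implementation. It assumes a toy corpus with invented tags, `attr_is` and `mu_meet` are hypothetical helpers, and it simply tests every offset in the window, since only the `-1 -1` case matters here. As in CQP, the regex must match the whole attribute value.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (word, lemma, pos) triples; the tags are invented.
TOKENS = [
    ("ha", "haber", "VAI"), ("cantado", "cantar", "VMP"),
    ("fue", "ser", "VSI"), ("cantado", "cantar", "VMP"),  # no "haber" before it
    ("han", "haber", "VAI"), ("comido", "comer", "VMP"),
    ("habían", "haber", "VAI"), ("cantado", "cantar", "VMP"),
]
FIELDS = {"word": 0, "lemma": 1, "pos": 2}


def attr_is(field, regex):
    """Token test in the style of [field="regex"], anchored to the whole value."""
    idx = FIELDS[field]
    return lambda tok: re.fullmatch(regex, tok[idx]) is not None


def mu_meet(tokens, anchor, context, left, right):
    """Positions of `anchor` tokens with a `context` token at an offset in [left, right]."""
    hits = []
    for i, tok in enumerate(tokens):
        if anchor(tok) and any(
            0 <= i + off < len(tokens) and context(tokens[i + off])
            for off in range(left, right + 1)
        ):
            hits.append(i)  # each hit is one token: the anchor itself
    return hits


# Hypothetical instance of MU(meet [pos="V.*"] [lemma="haber"] -1 -1)
hits = mu_meet(TOKENS, attr_is("pos", "V.*"), attr_is("lemma", "haber"), -1, -1)
by_word = Counter(TOKENS[i][0] for i in hits)
by_lemma = Counter(TOKENS[i][1] for i in hits)
print(by_word, by_lemma)
```

Counting `by_lemma` yields `cantar` 2 and `comer` 1, while the passive *fue cantado* is left out because no form of *haber* immediately precedes it.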
>   
> This is much nicer in CEQL query syntax:
>   
>          _POS <<1<< {LEM}
>   
> Unfortunately, this approach doesn't generalize if you want to count sequences of multiple words or if the original query contains repetition operators.
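Why this doesn't generalize can be seen with an optional element. Take a hypothetical query `[lemma="haber"] [pos="R.*"]? [pos="V.*"]`, where an adverb may intervene: the distance between *haber* and the verb is no longer fixed, so no single `meet` window reproduces it. The Python sketch below compares the true match ends with two windows. It uses toy data with invented tags and a matcher hand-coded for this one pattern (not CQP itself); `sequence_ends` and `meet_ends` are hypothetical names.

```python
import re

# Hypothetical toy corpus: "había ya comido" and "ha querido comer" (tags invented).
TOKENS = [
    ("había", "haber", "VAI"), ("ya", "ya", "RG"), ("comido", "comer", "VMP"),
    ("ha", "haber", "VAI"), ("querido", "querer", "VMP"), ("comer", "comer", "VMN"),
]


def pos_is(tok, regex):
    return re.fullmatch(regex, tok[2]) is not None


def sequence_ends(tokens):
    """Last-token positions of [lemma="haber"] [pos="R.*"]? [pos="V.*"].

    Hand-coded for this pattern only: try the optional adverb first, then without it.
    """
    ends = []
    for i, tok in enumerate(tokens):
        if tok[1] != "haber":
            continue
        rest = tokens[i + 1:i + 3]
        if len(rest) == 2 and pos_is(rest[0], "R.*") and pos_is(rest[1], "V.*"):
            ends.append(i + 2)
        elif rest and pos_is(rest[0], "V.*"):
            ends.append(i + 1)
    return ends


def meet_ends(tokens, left, right):
    """V.* tokens with a "haber" token at some offset in [left, right] (MU-style)."""
    return [
        i for i, tok in enumerate(tokens)
        if pos_is(tok, "V.*") and any(
            0 <= i + off < len(tokens) and tokens[i + off][1] == "haber"
            for off in range(left, right + 1)
        )
    ]


print(sequence_ends(TOKENS))      # the positions we actually want
print(meet_ends(TOKENS, -1, -1))  # fixed offset misses "había ya comido"
print(meet_ends(TOKENS, -2, -1))  # wider window wrongly adds "comer"
```

On this data the sequence query ends at positions 2 and 4; a `-1 -1` window finds only 4, while `-2 -1` returns 2, 4 and 5, picking up *comer* in *ha querido comer*, which the original query never ends on.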
>   
> Cheers,
> Stefan
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it  <mailto:CWB at sslmit.unibo.it>
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
