[CWB] What's possible via the CQPweb interface

Hardie, Andrew a.hardie at lancaster.ac.uk
Tue Apr 28 13:20:16 CEST 2015


>> But what I really wanted to get was the frequencies for the lemmas not for the forms. I don't see any way of getting that.

Not at present. Frequency breakdown works only with the words and the primary annotation; it does not allow access to the secondary annotation. It sounds like you have the lemma set as your secondary annotation. Of course, you could always set the lemma as primary and POS as secondary, then use

_haber {V*}

as your query.
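
A toy Python model may make the effect of this swap clearer. It is not CQPweb code. For illustration, it assumes that frequency breakdown keys each hit on the primary-annotation values of all tokens in the match. The corpus, the Spanish tags and the `breakdown` helper are all hypothetical.

```python
from collections import Counter

# Invented toy corpus: one dict per token, with three positional attributes.
CORPUS = [
    {"word": "ha", "pos": "VAIP3S0", "lemma": "haber"},
    {"word": "hecho", "pos": "VMP00SM", "lemma": "hacer"},
    {"word": "han", "pos": "VAIP3P0", "lemma": "haber"},
    {"word": "dicho", "pos": "VMP00SM", "lemma": "decir"},
    {"word": "había", "pos": "VAII3S0", "lemma": "haber"},
    {"word": "hecho", "pos": "VMP00SM", "lemma": "hacer"},
    {"word": "hemos", "pos": "VAIP1P0", "lemma": "haber"},
    {"word": "hecha", "pos": "VMP00SF", "lemma": "hacer"},
]

def breakdown(corpus, primary, secondary, first, second_prefix):
    """Hypothetical model of the CEQL query `_first {second_prefix*}`
    followed by a frequency breakdown keyed on the primary annotation
    of both matched tokens. `primary`/`secondary` name the attributes
    that play each role in the corpus setup."""
    counts = Counter()
    for left, right in zip(corpus, corpus[1:]):
        # `_first` tests the primary annotation, `{X*}` the secondary one.
        if left[primary] == first and right[secondary].startswith(second_prefix):
            counts[(left[primary], right[primary])] += 1
    return counts

# Lemma as primary, POS as secondary: `_haber {V*}`.
print(breakdown(CORPUS, primary="lemma", secondary="pos", first="haber", second_prefix="V"))
```

Because the key is built from lemmas rather than word forms, "hecho" and "hecha" are counted together under "hacer".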

I recall that Stefan, I think, wanted me to make FB (and Sort, on which it is based) allow access to arbitrary p-attributes. This would take up rather a lot of disk space unless done selectively, as with collocation, so I’ve not done it yet.

>> I tried to use the 'frequency list' option from the right column menu and selected 'lemma' in 'view a list based on ...' but this selection doesn't stay put when I go to the screen where I can introduce the query before the frequency breakdown

That’s because the “frequency list” system does what it says on the tin: it gives you a list of items in the corpus, plus their frequencies. It does not add constraints to any other part of the system.

>> MU queries are "more or less" undocumented

They are documented in some old versions of the CQP syntax tutorial, but in a way that may not be accurate and cannot be relied on. This is why Stefan says their semantics is deprecated: the documentation deliberately leaves out MU queries because their behaviour cannot be rigorously described. Otherwise, the best documentation is the code. They will be properly documented once we’ve got their semantics to behave properly, which is on the list for CWBv4.

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Josep M. Fontana
Sent: 28 April 2015 12:12
To: cwb at sslmit.unibo.it
Subject: Re: [CWB] What's possible via the CQPweb interface

Thanks a lot Stefan and Andrew. This will come in very handy.

I have a little question about the query Stefan suggests. Suppose I want to find out the frequencies of lemmas for the non-finite forms of the verb that appear after the auxiliary 'haber' in the following query:

{haber} _V*

If I do:

_V* <<1<< {haber}

this returns, as you said, the final token of the match. If I then apply the frequency breakdown, I can get the frequencies for the forms of _V* and that is already much better than what I could do before. But what I really wanted to get was the frequencies for the lemmas not for the forms. I don't see any way of getting that.

I tried to use the 'frequency list' option from the right column menu and selected 'lemma' in 'view a list based on ...' but this selection doesn't stay put when I go to the screen where I can introduce the query before the frequency breakdown.

You say MU queries are "more or less" undocumented, which I take to mean that they are at least partly documented. Is there any chance I could get hold of some of these documents, incomplete as they may be? I don't know whether they would answer questions like the one I just asked, but I could use the information to write a little mini-guide so that our students using CQPweb know about the additional possibilities they have.

JM

What I was trying to obtain was the frequency of lemmas that are instantiations of 'POS Y' in a search string of the form 'lemma X POS Y'. In the command line I would have used:

$ count Last by lemma %cd on matchend;
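
For readers without a CQP installation, the following Python sketch models what this command computes on an invented toy corpus. For each hit of a two-token query such as [lemma="haber"] [pos="V.*"], it takes the token at matchend and normalises its lemma as the %cd flags are meant to, ignoring case and diacritics (approximated here with Unicode NFD decomposition). It then tallies the results. The function names are hypothetical, and this is not how CQP implements count internally.

```python
import unicodedata
from collections import Counter

# Invented toy corpus: (word, pos, lemma) per token.
CORPUS = [
    ("Ha", "VAIP3S0", "haber"), ("hecho", "VMP00SM", "hacer"),
    ("han", "VAIP3P0", "haber"), ("dicho", "VMP00SM", "decir"),
    ("había", "VAII3S0", "haber"), ("hecho", "VMP00SM", "hacer"),
    ("hemos", "VAIP1P0", "haber"), ("oído", "VMP00SM", "oír"),
]

def fold_cd(s):
    """Approximation of %cd: lower-case, then drop combining diacritics."""
    decomposed = unicodedata.normalize("NFD", s.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def count_matchend_lemmas(corpus, lemma, pos_prefix):
    """Hypothetical model of [lemma="..."] [pos="V.*"] followed by
    `count Last by lemma %cd on matchend`."""
    counts = Counter()
    for left, right in zip(corpus, corpus[1:]):
        # startswith() stands in for the anchored CQP regex "V.*".
        if left[2] == lemma and right[1].startswith(pos_prefix):
            # `right` is the matchend token of this two-token hit.
            counts[fold_cd(right[2])] += 1
    return counts

print(count_matchend_lemmas(CORPUS, "haber", "V"))
```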

In this simple case, you can get away with a trick that depends on the more or less undocumented MU queries and their deprecated semantics …

If your original query is

        [lemma = "LEM"] [pos = "POS"];

the MU expression

        MU(meet [pos="POS"] [lemma="LEM"] -1 -1)

returns just the final token of each match, and you can apply frequency breakdown to do the counts.
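
The meet construction can be sketched in Python. This sketch assumes that MU(meet A B n m) keeps every token matching A that has a token matching B somewhere in the offset window n..m relative to it. Here the window is -1..-1, which means the immediately preceding token. Because each result is a single token, a frequency breakdown over word forms then counts exactly the final tokens of the original two-token hits. The corpus and the `meet` helper are invented for illustration.

```python
from collections import Counter

# Invented toy corpus: (word, pos, lemma) per token.
CORPUS = [
    ("ha", "VAIP3S0", "haber"), ("hecho", "VMP00SM", "hacer"),
    ("han", "VAIP3P0", "haber"), ("dicho", "VMP00SM", "decir"),
    ("el", "DA0MS0", "el"), ("había", "VAII3S0", "haber"),
    ("hecho", "VMP00SM", "hacer"), ("hemos", "VAIP1P0", "haber"),
    ("hecha", "VMP00SF", "hacer"),
]

def meet(corpus, is_a, is_b, lo, hi):
    """Hypothetical model of MU(meet A B lo hi): positions i where token i
    satisfies is_a and some token at i+lo .. i+hi (inclusive, clipped to
    the corpus bounds) satisfies is_b."""
    hits = []
    for i, tok in enumerate(corpus):
        if not is_a(tok):
            continue
        window = range(max(0, i + lo), min(len(corpus), i + hi + 1))
        if any(is_b(corpus[j]) for j in window):
            hits.append(i)
    return hits

# MU(meet [pos="V.*"] [lemma="haber"] -1 -1): verb tokens right after "haber".
hits = meet(CORPUS, lambda t: t[1].startswith("V"), lambda t: t[2] == "haber", -1, -1)
# Frequency breakdown by word form over the single-token hits.
print(Counter(CORPUS[i][0] for i in hits))
```

Since every hit is a single token, this reduction cannot produce keys that span several words.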

This is much nicer in CEQL query syntax:

        _POS <<1<< {LEM}

Unfortunately, this approach doesn't generalize if you want to count sequences of multiple words or if the original query contains repetition operators.

Cheers,

Stefan

_______________________________________________

CWB mailing list

CWB at sslmit.unibo.it

http://devel.sslmit.unibo.it/mailman/listinfo/cwb