[CWB] Does CQPweb support dynamic attributes now?

Hardie, Andrew a.hardie at lancaster.ac.uk
Tue Oct 6 11:38:43 CEST 2015


>> By the "&" operation in the query [pos="N.*" & f(word)>100], it is supposed that both conditions on the right and left are to be satisfied. That is, the result should only return any word that is BOTH noun AND with frequency greater than 100 AS NOUN.

This is a misunderstanding of the logic. Yes, the & requires that both sides of the & be satisfied. But they do not affect one another.

So the right side is not “AND with frequency greater than 100 AS NOUN”. There is nothing on the right hand side that says “AS NOUN”. f(word) just refers to the frequency of that word form in the lexicon. The referent of the symbol “word” is not altered by anything on the left hand side.

More broadly, if a word form occurs 101 times in your corpus, f(word) will always evaluate to 101, regardless of what else is going on elsewhere in a given CQP query. The f() function pulls information from the lexicon; it does not perform a count on the fly.
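
To make the point concrete, here is a rough toy model in Python. It is only a sketch of the logic, not how CQP is implemented, and the corpus data and the names (corpus, lexicon, query) are invented for illustration. The frequency table is built once per word form, and the query then tests each token against both conditions independently.

```python
from collections import Counter

# Toy corpus of (word, pos) tokens. Hypothetical data, not taken from Brown.
corpus = [("over", "II")] * 5 + [("over", "NN1")] + [("dog", "NN1")] * 2

# The "lexicon": one fixed frequency per word form, counted over the whole
# corpus and never recomputed for a particular query.
lexicon = Counter(word for word, _ in corpus)


def query(threshold):
    """Mimic [pos="N.*" & f(word) > threshold], one token at a time."""
    return [
        (word, pos)
        for word, pos in corpus
        if pos.startswith("N") and lexicon[word] > threshold
    ]


matches = query(3)
print(matches)
```

Here "over" is tagged as a noun only once, but it still matches because its lexicon frequency is 6. "dog" is always a noun but is dropped because its lexicon frequency is only 2.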

If you want to test the frequency of word-and-pos combinations, then you would have to set up a separate word-and-pos combination column, and apply the f() function to that.
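
As a sketch of that idea, again a toy Python model with made-up data rather than anything CQP actually runs: the combined column is assumed here to be called word_pos and to join the two values with an underscore, both of which are just illustrative choices. Frequencies are then counted over the combined values, so the threshold applies to the word as tagged.

```python
from collections import Counter

# Toy corpus of (word, pos) tokens. Hypothetical data, not taken from Brown.
corpus = [("over", "II")] * 5 + [("over", "NN1")] + [("dog", "NN1")] * 4

# Hypothetical combined column, e.g. "over_NN1", one value per token.
word_pos = [f"{word}_{pos}" for word, pos in corpus]

# Lexicon of the combined column: frequency of each word-and-pos pair.
combo_lexicon = Counter(word_pos)


def query_combined(threshold):
    """Mimic [pos="N.*" & f(word_pos) > threshold] on the toy corpus."""
    return [
        combo
        for (_, pos), combo in zip(corpus, word_pos)
        if pos.startswith("N") and combo_lexicon[combo] > threshold
    ]


print(query_combined(3))
```

With the combined column, the single noun token of "over" no longer matches (its combined frequency is 1), while the four "dog" tokens do.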

So, in short, it looks like you have the correct behaviour on this front.

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of (Ray) WU Liangping
Sent: 06 October 2015 10:05
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Does CQPweb support dynamic attributes now?

>> Well, to illustrate "not really executed", take the query [pos="N.*" & f(word)>100] on the Brown corpus, which returns 90,211 matches. In comparison, changing 100 to 1000 in that query returns 7,205 matches (all tested with BFSU CQPweb). This shows that the f() function does work. However, a further "Frequency breakdown" reveals that even words with a single occurrence are in the final result set, which suggests that the f() function is not really respected (at least in some operations within CQPweb).

> Those are probably word forms that occur more than 100 times in the corpus, but aren't always tagged as nouns. When I try your query on the Brown family, I find words like
>
>       perfect
>       unemployed
>
> at the end of the frequency ranking, which are infrequently used as nouns.



hi Stefan,



Although I do not find "perfect" and "unemployed" in the result set (I used just Brown instead of the whole Brown family), I do find the phenomenon you mentioned. For instance, the last word in the "Frequency breakdown" list is "over", which was incorrectly tagged as NN1 (once only), while its frequency in the whole corpus is 1,234 (case ignored). That might explain its appearance in the result set.



But this raises another question. By the "&" operation in the query [pos="N.*" & f(word)>100], it is supposed that both conditions on the right and left are to be satisfied. That is, the result should only return any word that is BOTH noun AND with frequency greater than 100 AS NOUN. But now the query seems to first check whether the word is a noun, then check its frequency in the whole corpus (instead of its frequency as a noun). This confuses me, as I understand the "&" operation to be non-directional (but now it seems to be carried out from left to right, one way only). Have I missed anything here?



Since the dynamic attribute is slow, I can now understand its removal. Thanks for your suggestion for implementing the semantic restriction as a p-attribute.



Best,

Liangping