[CWB] Relative frequencies in CWB

Sun Feb 24 11:38:50 CET 2019

> On 20 Feb 2019, at 12:53, Meier-Vieracker, Simon <simon.meier at tu-berlin.de> wrote:
> 
> My corpus has the attribute text_year, so I can get the absolute frequencies with:
> 
>> A = "word";
>> group A match text_year;
> 
> I can get the subcorpus size with
> 
>> B = [];
>> group B match text_year;
> 
> and set it off against the results of "group A". But is there a way to do this in one step?

That is not possible in a single step, because CQP doesn't aim to be a general-purpose programming language or corpus analysis tool.  It provides only query-centered functionality with a fairly straightforward implementation.  More specialized operations (such as adding zero frequency counts for texts with no matches) are much more easily implemented in a high-level programming language.

I have a Perl script for such purposes (specifically for extracting text-level quantitative features to be used in multivariate analysis), which makes it easy to match up frequency counts by text ID.  It does not compute relative frequencies directly, though, because that is more easily and flexibly done in R (or Pandas, if you prefer).
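
To illustrate the general idea (this is not the Perl script mentioned above, just a minimal sketch in plain Python), the two "group" results can be matched up by text_year and divided.  It assumes that each result has been saved as two-column, tab-separated text (value, then count); the function names and the sample figures below are made up for the example.

```python
from typing import Dict


def parse_group_output(text: str) -> Dict[str, int]:
    """Parse lines of the form 'value<TAB>count' into a dict.

    Assumption: the "group" output was saved as plain two-column,
    tab-separated text; other output formats need a different parser.
    """
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value, count = line.rsplit("\t", 1)
        counts[value] = counts.get(value, 0) + int(count)
    return counts


def relative_frequencies(
    matches: Dict[str, int], sizes: Dict[str, int], per: int = 1_000_000
) -> Dict[str, float]:
    """Return matches per `per` tokens for every key in `sizes`.

    Keys without any matches get a frequency of 0, which a plain
    "group" on the query result would not list at all.
    """
    return {
        key: per * matches.get(key, 0) / size
        for key, size in sorted(sizes.items())
        if size > 0
    }


# Made-up sample data standing in for "group A ..." and "group B ..."
group_a = "2017\t30\n2018\t12\n"
group_b = "2016\t500000\n2017\t1500000\n2018\t400000\n"

rel = relative_frequencies(parse_group_output(group_a), parse_group_output(group_b))
for year, freq in rel.items():
    print(f"{year}\t{freq:.2f}")
```

Iterating over the subcorpus sizes rather than over the match counts is what makes years without any hits show up with a relative frequency of 0.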

I will send the script and some instructions off-list because it isn't fully documented for general distribution.  At some point in the future (possibly May 2nd), I intend to include it in the CWB/Perl package as a utility "cwb-featex".

Best,
Stefan