[CWB] Word list
Stefan Evert
stefanML at collocations.de
Tue Jul 25 16:11:50 CEST 2017
> On 25 Jul 2017, at 09:57, Andrés Chandía <andres at chandia.net> wrote:
>
> No, sorry, I didn't mean in the CQP interface, but from a terminal in a Linux shell.
With a CWB-encoded corpus, it's fastest to do
cwb-lexdecode -f -s CORPUS
or e.g. for a lemma attribute
cwb-lexdecode -P lemma -f -s CORPUS
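If the list should be ordered by frequency rather than by word form, the output can be piped through standard Unix tools. A minimal sketch, assuming that cwb-lexdecode -f prints one entry per line as a count followed by a tab and the word form (worth checking on your installation); top_words is a hypothetical helper name, not part of CWB:

```shell
# Hypothetical helper (not part of CWB): reads "count<TAB>word" lines on stdin
# and prints the N most frequent entries, highest count first.
# Assumption: this matches the line format produced by "cwb-lexdecode -f".
top_words() {
    # $1 = number of entries to print (defaults to 20)
    # -k1,1nr sorts numerically on the first field, in descending order
    LC_ALL=C sort -k1,1nr | head -n "${1:-20}"
}
```

It would be used as "cwb-lexdecode -f CORPUS | top_words 50"; the -s flag can be left out here, since the list is re-sorted anyway.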
If you need more complex frequency counts, e.g. word/pos combinations, you should take a look at the cwb-scan-corpus tool described in the CWB Corpus Encoding Tutorial.
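As a rough illustration for word/pos counts: the sketch below assumes that the corpus has a pos attribute, that an invocation along the lines of "cwb-scan-corpus CORPUS word+0 pos+0" is what you want (please check the tutorial for the exact key syntax and options), and that the result is a tab-separated table with the count in the first column. filter_pos is a hypothetical helper, not a CWB command:

```shell
# Assumed invocation, not run here; see the Corpus Encoding Tutorial for details:
#   cwb-scan-corpus CORPUS word+0 pos+0 > word_pos.tsv
#
# Hypothetical helper (not part of CWB): reads "count<TAB>word<TAB>pos" lines
# on stdin and keeps only the rows whose tag equals the first argument.
filter_pos() {
    # -F '\t' splits on tabs; $3 is the pos column of the assumed table layout
    awk -F '\t' -v tag="$1" '$3 == tag'
}
```

For example, "filter_pos NN < word_pos.tsv" would keep only the rows tagged NN, assuming your tagset uses that label.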
In principle, it's possible to obtain a full frequency count in CQP with a dummy query that matches every single token:
> AllWords = [];
> group AllWords match word;
but this is extremely inefficient and uses huge amounts of memory, so don't do that. :-)
Best,
Stefan