[CWB] Word list

Stefan Evert stefanML at collocations.de
Tue Jul 25 16:11:50 CEST 2017


> On 25 Jul 2017, at 09:57, Andrés Chandía <andres at chandia.net> wrote:
> 
> No, sorry, I didn't mean in the CQP interface, but from a terminal in a Linux shell.

With a CWB-encoded corpus, the fastest way is to run

	cwb-lexdecode -f -s CORPUS

or e.g. for a lemma attribute

	cwb-lexdecode -P lemma -f -s CORPUS
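
If you want the list ranked by frequency rather than in the order cwb-lexdecode prints it, an ordinary sort pipeline should be enough. A minimal sketch, assuming each output line starts with the frequency count followed by the word form; top_freq is a hypothetical helper name, not part of CWB:

```shell
# top_freq [N]: read lines of the form "<frequency> <type>" on stdin and
# print the N most frequent entries (default: 20).
# Assumption: the frequency is the first whitespace-delimited field.
top_freq() {
    sort -k1,1nr | head -n "${1:-20}"
}

# Usage (requires CWB and a registered corpus named CORPUS):
#   cwb-lexdecode -f CORPUS | top_freq 10
#   cwb-lexdecode -P lemma -f CORPUS | top_freq 10
```

Sorting numerically on the first field only keeps the word forms out of the comparison, so leading padding in front of the counts does no harm.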

If you need more complex frequency counts, e.g. word/pos combinations, you should take a look at the cwb-scan-corpus tool described in the CWB Corpus Encoding Tutorial.
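
As a rough illustration of the word/pos case, here is a sketch that assumes the corpus has a pos attribute and that cwb-scan-corpus writes one line per combination with the frequency in the first column; please check the tutorial for the exact key syntax and options. word_pos_counts is a hypothetical wrapper name:

```shell
# word_pos_counts CORPUS: count word/pos combinations and list them by
# descending frequency.
# Assumptions: CORPUS has p-attributes "word" and "pos"; the keys word+0
# and pos+0 refer to the same token position; the frequency is printed
# in the first column of each output line.
word_pos_counts() {
    cwb-scan-corpus "$1" word+0 pos+0 | sort -k1,1nr
}

# Usage (requires CWB and a registered corpus named CORPUS):
#   word_pos_counts CORPUS > word_pos_freq.txt
```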


In principle, it's possible to obtain a full frequency count in CQP with a dummy query that matches every single token:

	> AllWords = [];
	> group AllWords match word;

but this is extremely inefficient and uses huge amounts of memory, so don't do that. :-)

Best,
Stefan

