[CWB] Corpus statistics

Adhuc Militarent misc.priv at gmail.com
Sun Mar 23 10:55:36 CET 2014


Dear all,

I am wondering whether I have missed something in the CWB documentation, or
whether there is simply no straightforward way to generate per-text corpus
statistics (e.g. text_id, text_author, word_count, types_count, etc.). I have
already tried both an external approach (cwb-scan-corpus) and an internal one
(query = []; then tabulate), but without much success. I have also started
to go through the CQPWeb PHP scripts to see how it populates its MySQL tables
with frequency data, but that is not quite what I am looking for (I am
still digging, though).
I should add that, apart from using CWB/CQPWeb proper, I also work with my
corpora from within R (with rcqp), and it would be a great help if this kind
of information could be easily retrieved.
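
To make the target concrete, here is a minimal Python sketch of the kind of
per-text summary meant above. It is not a CWB or CQPWeb feature. It assumes
the corpus positions have already been exported to two tab-separated columns,
text_id and word (for instance via tabulate). The function name
per_text_stats and the sample rows are hypothetical and only illustrate the
desired output.

```python
from collections import defaultdict


def per_text_stats(rows):
    """Summarise token and type counts per text.

    `rows` is an iterable of 'text_id<TAB>word' strings (assumed export
    format, one corpus position per line). Returns a dict mapping each
    text_id to {"word_count": ..., "types_count": ...}.
    """
    tokens = defaultdict(int)   # text_id -> number of tokens
    types = defaultdict(set)    # text_id -> set of distinct word forms
    for line in rows:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        text_id, word = line.split("\t", 1)
        tokens[text_id] += 1
        types[text_id].add(word)
    return {
        tid: {"word_count": tokens[tid], "types_count": len(types[tid])}
        for tid in tokens
    }


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real export.
    sample = ["t1\tthe", "t1\tcat", "t1\tthe", "t2\tdog"]
    for tid, s in sorted(per_text_stats(sample).items()):
        print(tid, s["word_count"], s["types_count"], sep="\t")
```

Types are counted here as distinct surface forms, case-sensitively. Other
metadata such as text_author would need to be exported as additional columns
and carried along in the same way.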

Thanks for any hints,
Chris