[CWB] Corpus statistics

Sun Mar 23 12:23:37 CET 2014

Dear Chris, most of these questions can be answered with a three-line script using CWB::CL or cwb-python. rcqp might also be able to do this, provided it gives low-level access to CQP's corpus data.
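
As a rough illustration of the kind of script meant here, the sketch below computes token and type counts per text from a token sequence and a list of text regions. It does not call CWB itself: the function name per_text_stats, the toy tokens, and the region tuples (start, end, text_id) with inclusive corpus positions are all assumptions made for the example.

```python
def per_text_stats(tokens, regions):
    """Return {text_id: (token_count, type_count)}.

    tokens  -- sequence of word forms, indexed by corpus position
    regions -- iterable of (start, end, text_id), with inclusive bounds
               (hypothetical layout chosen for this sketch)
    """
    counts = {}
    types = {}
    for start, end, text_id in regions:
        span = tokens[start:end + 1]  # end is inclusive, hence the +1
        counts[text_id] = counts.get(text_id, 0) + len(span)
        types.setdefault(text_id, set()).update(span)
    return {tid: (counts[tid], len(types[tid])) for tid in counts}


# Toy data standing in for a real corpus (not taken from any CWB corpus).
tokens = ["the", "cat", "saw", "the", "dog", "a", "dog", "barked"]
regions = [(0, 4, "t1"), (5, 7, "t2")]

for text_id, (n_tokens, n_types) in per_text_stats(tokens, regions).items():
    print(text_id, n_tokens, n_types)
```

With cwb-python, the tokens and regions would presumably come from the word p-attribute and the text_id s-attribute of an opened corpus (via CWB.CL.Corpus and its attribute() method). How exactly those objects are indexed and iterated should be checked against the cwb-python README rather than taken from this sketch.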

Best wishes, Yannick

From: Adhuc Militarent
Sent: Sunday, 23 March 2014 10:55
To: cwb at sslmit.unibo.it

Dear all,

I was wondering whether I missed something when reading the CWB documentation, or whether there is simply no trivial way to generate per-text corpus statistics (e.g. text_id, text_author, word_count, types_count, etc.). I have already tried both an external approach (cwb-scan-corpus) and an internal one (query = []; then tabulate), but without much success. I have also started to analyse the CQPweb PHP scripts to see how they populate the MySQL tables with frequency data, but that is not exactly what I was looking for (I am still digging, though).
I would only add that, apart from using CWB/CQPweb proper, I also work with my corpora from within R (with rcqp), and it would be a great help if this sort of information could be easily retrieved.

Thanks for any hints,
Chris