[CWB] Efficient way to count frequencies on large data

Sébastien Jacquot sebastien.jacquot at univ-fcomte.fr
Fri Dec 18 14:37:08 CET 2015


Hi,
I'm looking for an efficient way to get the frequencies of repeated
token sequences in large corpora.
At the moment I use:
R = ([][][][]);
count R by word cut 20;

Is there a faster way to do this in terms of performance? (I mean, for
example, by grouping and counting the sequences directly rather than
retrieving all the results first and then counting them.)
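
To make "grouping and counting directly" concrete, here is a minimal sketch in Python, outside CWB. It assumes the corpus tokens are available as a plain iterable of strings; the names count_ngrams and frequent, and the min_freq parameter (meant to play the role of the cut threshold above), are hypothetical and not part of CWB or CQP.

```python
from collections import Counter, deque
from typing import Iterable, List, Tuple


def count_ngrams(tokens: Iterable[str], n: int = 4) -> Counter:
    """Count every sequence of n consecutive tokens in a single pass."""
    window = deque(maxlen=n)   # sliding window over the last n tokens
    counts = Counter()
    for tok in tokens:
        window.append(tok)     # the oldest token drops out automatically
        if len(window) == n:
            counts[tuple(window)] += 1
    return counts


def frequent(counts: Counter, min_freq: int = 20) -> List[Tuple[tuple, int]]:
    """Keep only sequences seen at least min_freq times, most frequent first."""
    return [(gram, c) for gram, c in counts.most_common() if c >= min_freq]


if __name__ == "__main__":
    toks = "a b c d a b c d a b c d".split()
    for gram, c in frequent(count_ngrams(toks, 4), min_freq=3):
        print(c, " ".join(gram))
```

The match list is never materialised here: each 4-token window is counted as it is read, so memory grows with the number of distinct sequences rather than with the number of corpus positions.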
Thanks in advance.
Sebastian



