[CWB] Counting tokens, types and segments

Stefan Evert stefanML at collocations.de
Sun Dec 12 23:00:43 CET 2010


Also, try "cwb-describe-corpus -s" for type/token counts for all attributes, or "cwb-lexdecode -S" if you just want to know the token size of a corpus.

You can also easily get the information from the CWB::CL Perl module, of course.
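
To make the three counts concrete, here is a minimal Python sketch that works on verticalized text rather than on an indexed corpus. It does not use CWB or CWB::CL. The sample data and the function name count_corpus are hypothetical. It assumes one token per line with the word form in the first tab-separated column, and it treats any line of the form <...> as a structural tag.

```python
# Hypothetical sample in CWB-style vertical format: one token per line,
# <tu> ... </tu> delimits one segment. Not taken from any real corpus.
SAMPLE = """\
<tu>
the
cat
sat
</tu>
<tu>
the
dog
</tu>
"""


def count_corpus(vrt_text, segment_tag="tu"):
    """Return (tokens, types, segments) for a verticalized text.

    Simplification: every line that starts with '<' and ends with '>'
    is treated as a structural tag, not as a token.
    """
    tokens = 0
    types = set()
    segments = 0
    for line in vrt_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            # Count only opening tags of the requested segment type;
            # closing tags such as </tu> yield the name "/tu" and are skipped.
            parts = line[1:-1].split()
            if parts and parts[0] == segment_tag:
                segments += 1
            continue
        # The word form is the first tab-separated column.
        tokens += 1
        types.add(line.split("\t")[0])
    return tokens, len(types), segments


print(count_corpus(SAMPLE))  # (5, 4, 2)
```

Types are counted here as distinct word forms, which is roughly what a type count for the "word" attribute reports. A type count for another attribute (e.g. lemma) would read a different column.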

Best,
Stefan


On 12 Dec 2010, at 22:42, Alberto Simões wrote:

> On 12/12/2010 21:39, Alberto Simões wrote:
>> Hello.
>> 
>> I am trying to count, using CWB, the number of tokens, types and
>> segments (annotations of "tu" type).
>> 
>> For the first, I am using the size of A = [];
>> 
>> For the second, I am able to run: group A matchend word
>> but it doesn't show me the total number of types.
>> 
>> For the last, no idea how to do it... yet.
> 
> This one is easy as well:
>   A = <tu> [];
>   size A;
> 
> Now, missing one :D
> 
> Thanks
> Alberto


