[CWB] Counting tokens, types and segments

Alberto Simões ambs at di.uminho.pt
Sun Dec 12 23:05:22 CET 2010



On 12/12/2010 22:00, Stefan Evert wrote:
> Also, "cwb-describe-corpus -s" for type/token counts for all attributes, or "cwb-lexdecode -S" if you want to know the token size of a corpus.

Great, cwb-describe-corpus is just what I needed :)
Now looking up how to get the same counts from CWB::CL.
Thanks :D
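
For the record, here is a minimal Python sketch of what the three numbers
mean, computed outside CWB. It assumes vertical-format text (one token per
line, with <tu> ... </tu> lines marking segments). The function name and the
sample text are made up for illustration; this is not CWB::CL code.

```python
def count_corpus(vrt_text, segment_tag="tu"):
    """Return (tokens, types, segments) for a vertical-format text.

    Assumption: every line that starts with "<" and ends with ">" is
    structural markup; every other non-empty line is one token whose
    word form is the first tab-separated column.
    """
    tokens = 0
    types = set()
    segments = 0
    for line in vrt_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            # Count only opening tags of the requested segment type.
            if line.startswith("<" + segment_tag + ">") or \
               line.startswith("<" + segment_tag + " "):
                segments += 1
            continue
        tokens += 1
        types.add(line.split("\t")[0])
    return tokens, len(types), segments


# Hypothetical sample: two <tu> segments, seven tokens, five distinct words.
SAMPLE = """<tu>
the
cat
saw
the
dog
</tu>
<tu>
the
end
</tu>
"""

if __name__ == "__main__":
    print(count_corpus(SAMPLE))
```

On a real corpus the counts should come from cwb-describe-corpus or
CWB::CL, since they read the indexed data directly.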

Alberto

>
> You can easily get the information from CWB::CL, of course.
>
> Best,
> Stefan
>
>
> On 12 Dec 2010, at 22:42, Alberto Simões wrote:
>
>> On 12/12/2010 21:39, Alberto Simões wrote:
>>> Hello.
>>>
>>> I am trying to count, using CWB, the number of tokens, types and
>>> segments (annotations of "tu" type).
>>>
>>> For the first, I am using the size of A = [];
>>>
>>> For the second, I am able to run: group A matchend word
>>> but it doesn't show me the total number of types.
>>>
>>> For the last, no idea how to do it... yet.
>>
>> This one is easy as well:
>>    A = <tu> [];
>>    size A;
>>
>> Now only one is missing :D
>>
>> Thanks
>> Alberto
>

-- 
Alberto Simões

