[CWB] maximum corpus size and structural attributes

Stefan Evert stefanML at collocations.de
Thu Jul 7 12:31:30 CEST 2011


> Do structural attributes count towards the 2^31 token boundary?

No, they're stored as pairs of start and end positions rather than included in the token stream.
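To illustrate the point, here is a toy model in Python. It is hypothetical and does not reflect CWB's actual on-disk format. The tokens form one sequence, and a structural attribute (here an invented `s_regions` list for sentences) is kept separately as (start, end) corpus positions, assumed inclusive. The helper names `corpus_size` and `region_tokens` are made up for this sketch.

```python
# Hypothetical toy model, NOT CWB's real storage format:
# tokens live in one sequence, while a structural attribute such as <s>
# is a separate list of (start, end) corpus positions (assumed inclusive).
tokens = ["The", "cat", "sat", ".", "It", "slept", "."]
s_regions = [(0, 3), (4, 6)]  # two sentence regions

def corpus_size(tokens):
    """Number of corpus positions; region boundaries add no tokens."""
    return len(tokens)

def region_tokens(tokens, region):
    """Return the tokens covered by one (start, end) region."""
    start, end = region
    return tokens[start:end + 1]

print(corpus_size(tokens))                   # 7
print(region_tokens(tokens, s_regions[1]))   # ['It', 'slept', '.']
```

Adding more regions grows `s_regions` but leaves `corpus_size(tokens)` unchanged, which is why the annotation does not count towards the token limit.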

You should be able to build a corpus containing exactly 2^31 - 1 tokens.
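For reference, the arithmetic behind that figure is shown below as a small Python sketch. It assumes that the corpus size must fit in a signed 32-bit integer; that assumption is not stated in the reply above. `fits_in_corpus` is a hypothetical helper.

```python
# Assumption: the corpus size must fit in a signed 32-bit integer,
# which gives an upper bound of 2^31 - 1 tokens.
INT32_MAX = 2**31 - 1  # largest signed 32-bit value
max_tokens = INT32_MAX

def fits_in_corpus(n_tokens):
    """Hypothetical check: can a corpus of n_tokens be encoded?"""
    return 0 <= n_tokens <= max_tokens

print(max_tokens)  # 2147483647
```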

Best,
Stefan


