> Do structural attributes count towards the 2^31 token boundary?

No, they're stored as pairs of start and end positions rather than included in the token stream. You should be able to build a corpus containing exactly 2^31 - 1 tokens.

Best,
Stefan
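
To illustrate the distinction, here is a minimal sketch in Python. It is not the tool's actual storage format or API: the names `MAX_TOKENS`, `token_count`, and `s_regions` are hypothetical, and treating the end position as inclusive is an assumption made for the example. It only shows that regions kept as (start, end) pairs add nothing to the length of the token stream.

```python
# Hypothetical illustration only; not the real on-disk format or API.
MAX_TOKENS = 2**31 - 1  # largest corpus size mentioned in the reply


def token_count(tokens, regions):
    """Return the corpus size, which depends only on the token stream.

    `regions` is accepted to make the point explicit: it is never counted.
    """
    return len(tokens)


# The token stream: one entry per corpus position.
tokens = ["The", "cat", "sat", ".", "It", "purred", "."]

# A structural attribute (e.g. sentences) as (start, end) position pairs.
# Assumption: end positions are inclusive.
s_regions = [(0, 3), (4, 6)]

size = token_count(tokens, s_regions)
fits = size <= MAX_TOKENS

print(size, fits)
```

Adding more regions to `s_regions` leaves `size` unchanged, which is the reason structural attributes do not count towards the limit.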