[CWB] Setting entire document as context

Stefan Evert stefanML at collocations.de
Sun Mar 13 11:08:39 CET 2011


> If your documents are delimited with "text" tags, you'd use
> set context 1 text ;

Two notes on this:

1) If there are large(ish) documents in your corpus, the next "cat" command with this context setting will probably crash CQP because of a buffer overflow. This is a known problem. We plan to address it in v3.2, but doing so will require fundamental changes in the way KWIC lines are formatted. If you only need the text, you can work around the problem as follows.

Assuming your query result is named "Matches":

	Contexts = Matches expand to text;
	tabulate Contexts match .. matchend word;

This will also be a lot faster than "cat".

2) In many cases (the prototypical example being the British National Corpus), it's better to keep the original documents in a separate database (for the BNC, individual disk files in XML format). You would then obtain just the document IDs from CQP and retrieve the actual documents in their original format.
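A minimal sketch of the retrieval side of point 2, under stated assumptions. It assumes the document IDs have already been printed one per line by a CQP command along the lines of `tabulate Matches match text_id;`, where `text_id` is an assumed name for a structural attribute carrying the document ID. It also assumes the original files follow a BNC-XML-style layout such as `Texts/A/A0/A00.xml` and are UTF-8 encoded. The function names `unique_ids`, `document_path` and `load_documents` are hypothetical helpers, not part of CWB.

```python
from pathlib import Path


def unique_ids(tabulate_output: str) -> list[str]:
    """Collect distinct document IDs in first-seen order.

    Assumes one ID per line, e.g. the output of a CQP command such as
    `tabulate Matches match text_id;` (attribute name is an assumption).
    Several matches in the same document yield repeated IDs, so duplicates
    are dropped here.
    """
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for line in tabulate_output.splitlines():
        doc_id = line.strip()
        if doc_id:
            seen.setdefault(doc_id, None)
    return list(seen)


def document_path(root: Path, doc_id: str) -> Path:
    """Map an ID like 'A00' to root/A/A0/A00.xml (assumed directory layout)."""
    return root / doc_id[0] / doc_id[:2] / f"{doc_id}.xml"


def load_documents(root: Path, ids: list[str]) -> dict[str, str]:
    """Read each original XML file as text, keyed by document ID (assumes UTF-8)."""
    return {
        doc_id: document_path(root, doc_id).read_text(encoding="utf-8")
        for doc_id in ids
    }
```

Deduplicating before reading means each original file is opened once, no matter how many matches it contains. Adjust `document_path` to whatever storage scheme your own document database uses.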

Best wishes,
Stefan

