[CWB] Setting entire document as context
Stefan Evert
stefanML at COLLOCATIONS.DE
Mon Mar 14 13:01:33 CET 2011
>>> If your documents are delimited with "text" tags, you'd use
>>> set context 1 text ;
>
> Unfortunately, they're not -- I'd just assumed that CWB's corpus-building routine would store the source file names as a matter of course.
Well, the CWB data model isn't document-based -- CQP is not an Information Retrieval or Web Search engine. Traditionally, the input for cwb-encode was a single long stream of tokens/tags, possibly concatenated from a large number of source documents.
The ability to encode a corpus from multiple input files is a relatively recent convenience feature. At one point, I thought about adding automatic <file> tags around the individual files (yes, I'd like to have that, too), but I couldn't spare enough time for the necessary changes.
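In the meantime, the tags can be added in a preprocessing step before the data reach cwb-encode. Below is a minimal sketch in Python, assuming one-token-per-line input files. The function names, the "file" tag and its "name" attribute are illustrative choices, not anything CWB provides.

```python
from xml.sax.saxutils import quoteattr


def wrap_documents(docs, tag="file"):
    """Yield a single token stream in which each document is wrapped
    in <tag name="..."> ... </tag>.

    docs is an iterable of (name, lines) pairs; the tag and attribute
    names are hypothetical and can be changed freely.
    """
    for name, lines in docs:
        # quoteattr() escapes the name and adds the surrounding quotes
        yield f"<{tag} name={quoteattr(name)}>"
        for line in lines:
            line = line.rstrip("\n")
            if line:  # drop empty lines
                yield line
        yield f"</{tag}>"


def wrap_files(paths, tag="file"):
    """Same as wrap_documents(), but reads the token lines from files."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield from wrap_documents([(path, fh)], tag)
```

The resulting stream could be written to a single file (or piped) and encoded with the tag declared as a structural attribute via the -S option of cwb-encode; please check the cwb-encode man page for the exact declaration needed to keep the name annotation. After that, something like "set context 1 file;" should give you the whole source file as context.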
Best,
Stefan