[CWB] Setting entire document as context

Stefan Evert stefanML at COLLOCATIONS.DE
Mon Mar 14 13:01:33 CET 2011


>>> If your documents are delimited with "text" tags, you'd use
>>> set context 1 text ;
> 
> Unfortunately, they're not -- I'd just assumed that CWB's corpus-building routine would store the source file names as a matter of course.

Well, the CWB data model isn't document-based -- CQP is not an Information Retrieval or Web Search engine.  Traditionally, the input for cwb-encode was a single long stream of tokens/tags, possibly concatenated from a large number of source documents.

The ability to encode a corpus from multiple input files is a relatively recent convenience feature.  At one point, I thought about adding automatic <file> tags around the individual files (yes, I'd like to have that, too), but I couldn't spare enough time for the necessary changes.
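
In the meantime, a workaround along these lines might do: wrap each source file in its own region before the stream reaches cwb-encode. This is only a minimal sketch. The function name wrap_files, the choice of "text" as the tag, and the "id" annotation holding the file name are illustrative assumptions, not anything CWB provides, and it assumes file names contain no double quotes.

```python
from pathlib import Path


def wrap_files(paths, tag="text"):
    """Yield the lines of one combined token stream.

    Each source file is wrapped in <tag id="FILENAME"> ... </tag>, so the
    file boundaries survive concatenation. Hypothetical helper: assumes the
    inputs are already in one-token-per-line format and that file names
    contain no double quotes.
    """
    for path in paths:
        path = Path(path)
        # Opening tag carries the bare file name as its id annotation.
        yield f'<{tag} id="{path.name}">'
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                # Pass token/tag lines through unchanged, minus the newline.
                yield line.rstrip("\n")
        yield f"</{tag}>"
```

Writing the yielded lines to a single file (or to standard output) gives the kind of single long stream described above. The region would then need to be declared as a structural attribute when encoding (presumably something like -S text:0+id), after which "set context 1 text;" should expand each match to its whole document.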

Best,
Stefan


