[CWB] Setting entire document as context

Mon Mar 14 15:24:14 CET 2011

On 03/14/2011 01:01 PM, Stefan Evert wrote:

> Well, the CWB data model isn't document-based -- CQP is not an
> Information Retrieval or Web Search engine.  Traditionally, the input
> for cwb-encode was a single long stream of tokens/tags, possibly
> concatenated from a large number of source documents.

Quite right, of course!

> The ability to encode a corpus from multiple input files is a
> relatively recent convenience feature.  At one point, I thought about
> adding automatic <file> tags around the individual files (yes, I'd
> like to have that, too), but I couldn't spare enough time for the
> necessary changes.

Good to know. Thanks as always, Stefan.

Cheers,
Scott