[CWB] CQPWeb Context Line Breaks

Stephen Barrett Stephen.Barrett at glasgow.ac.uk
Fri Aug 5 12:42:03 CEST 2016


Dear All, 

Apologies for the rather basic question:

We have imported a corpus into CQPWeb using the standard "Install a new corpus" interface. The corpus has not been indexed in CWB beforehand and is very simple indeed, comprising a text file with a number of <text> elements each just containing a list of tokens (one token per line) and no other information.
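
For illustration, here is a minimal sketch of how an input file of this shape might be generated. The function name `to_vertical`, the `id` attribute on `<text>`, the sample text and the whitespace tokenisation are all assumptions made for the example, not our actual pipeline. It shows that, with one token per line and no other markup, the line breaks of the original text are not recorded anywhere in the file.

```python
# Sketch only: builds a one-token-per-line input file like the one described above.
# The text ID, sample text and whitespace tokenisation are illustrative assumptions.

def to_vertical(texts):
    """Return one <text> element per input text, one token per line, no other markup."""
    out = []
    for text_id, raw in texts.items():
        out.append(f'<text id="{text_id}">')
        # str.split() with no arguments splits on any whitespace, so the
        # line breaks of the original text are discarded at this step.
        out.extend(raw.split())
        out.append("</text>")
    return "\n".join(out) + "\n"


# Two lines of verse; the newline between them does not survive the conversion.
sample = {"poem01": "Tyger Tyger burning bright\nIn the forests of the night"}
vertical = to_vertical(sample)
print(vertical)
```

In the resulting file the token "bright" is followed directly by "In", with nothing to mark where the verse line ended.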

When the context for a search item is displayed in CQPWeb, new lines within the text file are ignored (so, for example, lines of verse are displayed on a single line). Conversely, prose paragraphs are split by a line break after each end-of-sentence punctuation mark (full stop, question mark, etc.).

Looking at the DICKENS example corpus, we can see that the text is formatted as one would expect, with paragraphs displaying properly and so on, so we are sure that we are doing something wrong.

The question, then, is this: is there something we should be doing when we generate our input file to ensure that line breaks within the corpus are preserved and that no extra line breaks (after each sentence) are added? We have searched through the documentation, as I'm sure we're missing something obvious and basic, but have come up short.

Many thanks in advance.

Stevie



