[CWB] Line returns, etc.

Graham Ranger -- UAPV graham.ranger at univ-avignon.fr
Mon Sep 25 15:35:23 CEST 2017


Hello to all,
When formatting a plain-text corpus for viewing in CQPweb, I like to 
remove unwanted line breaks that result from the use of OCR software. 
This can be slightly awkward to implement and I was wondering whether it 
is strictly necessary. Will search results be affected by the presence or 
absence of line breaks, or is removing them a waste of energy? I 
imagine it is probably a question of whether the processing is line- or 
stream-based... but I'd appreciate some expert opinion!
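
For reference, the kind of line-break removal described above could be
sketched roughly as follows. This is only an illustration and not part of
CWB or CQPweb: the function name unwrap_lines and the dehyphenate flag are
hypothetical, and the sketch assumes that real paragraph boundaries are
marked by blank lines, which may not hold for every OCR output.

```python
import re


def unwrap_lines(text: str, dehyphenate: bool = False) -> str:
    """Join hard-wrapped lines inside each paragraph.

    Assumption: paragraphs are separated by one or more blank lines.
    If dehyphenate is True, a line ending in "-" is glued directly to
    the next line (this can wrongly merge genuine hyphenated compounds).
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    paragraphs = re.split(r"\n\s*\n", text)
    unwrapped = []
    for para in paragraphs:
        # Drop empty lines and surrounding whitespace left by the OCR tool.
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        if not lines:
            continue
        joined = lines[0]
        for line in lines[1:]:
            if dehyphenate and joined.endswith("-"):
                joined = joined[:-1] + line
            else:
                joined += " " + line
        unwrapped.append(joined)

    # Keep one blank line between paragraphs.
    return "\n\n".join(unwrapped)
```

Dehyphenation is off by default because a trailing hyphen in OCR output is
ambiguous: it may be a line-end split or a real compound.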
Thanks in advance.
Best,
Graham.

