[CWB] Encoded corpus shows hits for [word=".*"] but not for any real word---not even for [word="a.*"]
Jörg Knappen
j.knappen at mx.uni-saarland.de
Wed May 7 08:47:17 CEST 2014
I have encoded a corpus with cqp-3.0 and found that the corpus query
[word=".*"];
gives a lot of results, but every other query I tried gave 0 results.
I suspect that something in the raw data is causing this behaviour, but
I don't know what to look for. The data is not very clean: it comes from
OCR, and not all OCR errors have been corrected. Encoding throws some
warnings like

  Malformed tag <, inserted literally (file lat2-vrt//0006752_lat2.vrt, line #6).
However, the same kind of warning occurred with a previous version of the
same corpus, and there the CQP query worked fine.
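One way to narrow down "what to look for" is to scan the .vrt input before re-encoding. Below is a minimal Python sketch, not part of CWB, that flags lines that are not valid text in the expected encoding, lines containing control characters, a byte-order mark at the start of a file, and lines that begin with "<" but do not look like a structural tag (the kind of line behind the "Malformed tag <" warning). It assumes the files are meant to be UTF-8; the `encoding` parameter can be changed, e.g. for a Latin-2 corpus. The function name `check_vrt_lines` and the `TAG_RE` pattern are hypothetical and only approximate what cwb-encode accepts.

```python
import re
import sys
import unicodedata

# Hypothetical, simplified notion of a well-formed VRT structural tag:
# <name attr="value" ...> or </name>. This approximates, but is not,
# the rule cwb-encode actually applies.
TAG_RE = re.compile(r'^</?[A-Za-z_][\w.-]*(\s+[\w.-]+="[^"]*")*\s*/?>$')


def check_vrt_lines(lines, encoding="utf-8"):
    """Yield (line_number, problem) for suspicious lines.

    `lines` is an iterable of raw byte strings, one per line
    (e.g. a file opened in binary mode).
    """
    for lineno, raw in enumerate(lines, start=1):
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            yield lineno, "not valid " + encoding.upper()
            continue
        text = text.rstrip("\r\n")
        # A BOM glued to the first line can hide the opening tag.
        if lineno == 1 and text.startswith("\ufeff"):
            yield lineno, "starts with a UTF-8 byte-order mark"
        # Tabs separate positional attributes; other control chars are suspect.
        if any(unicodedata.category(ch) == "Cc" and ch != "\t" for ch in text):
            yield lineno, "contains control characters"
        # Lines starting with '<' are read as tags; OCR debris often is not one.
        if text.startswith("<") and not TAG_RE.match(text):
            yield lineno, "starts with '<' but is not a well-formed tag"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for lineno, problem in check_vrt_lines(f):
                print(f"{path}:{lineno}: {problem}")
```

Running it as `python check_vrt.py lat2-vrt/*.vrt` (script name hypothetical) prints one line per suspicious input line, which can then be compared against the data of the earlier, working version of the corpus.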
Any hints?
--Jörg Knappen