[CWB] number and <text_id> tag inside a word search

Stefan Evert stefanML at collocations.de
Sun Feb 21 22:32:57 CET 2016


[CC: to the mailing list in case other people run into the same problem]

> On 21 Feb 2016, at 21:38, Daniel Renau <alphak87 at gmail.com> wrote:
> 
> Done!
> 
> 
> 
> I erased the 4 first hex pairs... and it works well now :)


EF BB BF is the byte-order mark (BOM) in UTF-8 … the root of all evil!  BOMs are chronically inserted by Windows editor programs (but by hardly any other software), and they're quite hard to get rid of.

While CWB should really understand and skip a BOM at the start of a UTF-8 file, that would not solve the whole problem: once you cat together several such files (e.g. to feed them to CWB from stdin), you produce illegal input with BOMs littered throughout the text.
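
One way around this is to strip the BOM from each file before concatenating, rather than using plain cat. Here is a minimal sketch in Python; the helper names strip_bom and concat_without_boms are just made up for illustration and are not part of CWB. It assumes each input file carries at most one BOM, at its very start, and that the files are small enough to read into memory.

```python
import codecs
import sys

# The UTF-8 byte-order mark: the three bytes EF BB BF.
BOM = codecs.BOM_UTF8


def strip_bom(data: bytes) -> bytes:
    """Return data without a single leading UTF-8 BOM, if one is present."""
    return data[len(BOM):] if data.startswith(BOM) else data


def concat_without_boms(paths, out):
    """Write the contents of all files to out, dropping each file's leading BOM."""
    for path in paths:
        # Read raw bytes so that nothing is decoded or re-encoded along the way.
        with open(path, "rb") as fh:
            out.write(strip_bom(fh.read()))


if __name__ == "__main__":
    # Input files are given on the command line; output goes to stdout.
    concat_without_boms(sys.argv[1:], sys.stdout.buffer)
```

Saved under a name of your choice, the script can replace cat in the pipeline, with its output piped to CWB on stdin. It works on raw bytes, so the rest of each file is passed through unchanged.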

Best,
Stefan