<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
Thanks again, Stephanie. The problem was indeed a BOM (byte order
mark, U+FEFF) lurking somewhere in the file. I thought I'd removed it
with the command I usually use: <br>
<pre>sed -i '1s/^\xEF\xBB\xBF//' myfile.txt</pre>
until I realised that this only works before the files are
concatenated: it only targets the first line, whereas every file that
started with a BOM leaves one behind in the middle of the result. So I tried <br>
<pre>sed -i 's/^\xEF\xBB\xBF//g' myfile.txt</pre>
and things worked from that point on!<br>
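<br>
For anyone hitting the same issue, here is a minimal sketch of why the first command misses BOMs after concatenation. It assumes GNU sed (for the \xHH escapes, as in the commands above); the sample files, their names and the count_bom helper are made up for illustration only: <br>

```shell
# Hypothetical sample files in a temporary directory, each starting with a BOM.
tmpdir=$(mktemp -d)
printf '\357\273\277alpha\nbeta\n'  > "$tmpdir/a.txt"
printf '\357\273\277gamma\ndelta\n' > "$tmpdir/b.txt"
cat "$tmpdir/a.txt" "$tmpdir/b.txt" > "$tmpdir/all.txt"

# The three BOM bytes (EF BB BF), written with portable octal escapes.
bom=$(printf '\357\273\277')

# Illustrative helper: count the lines that contain a BOM.
# -a treats the input as text; "|| true" because grep exits 1 on zero matches.
count_bom() { LC_ALL=C grep -a -c "$bom" "$1" || true; }

before=$(count_bom "$tmpdir/all.txt")

# First-line-only removal: the BOM inherited from b.txt survives.
LC_ALL=C sed '1s/^\xEF\xBB\xBF//' "$tmpdir/all.txt" > "$tmpdir/first_only.txt"
after_first=$(count_bom "$tmpdir/first_only.txt")

# Removal at the start of every line catches both.
LC_ALL=C sed 's/^\xEF\xBB\xBF//' "$tmpdir/all.txt" > "$tmpdir/every_line.txt"
after_all=$(count_bom "$tmpdir/every_line.txt")

echo "before=$before first_only=$after_first every_line=$after_all"
```

As far as I can tell, the g flag is not needed in the second command, since ^ can only match once per line; it does no harm either.<br>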
Best,<br>
Graham.<br>
<br>
<div class="moz-cite-prefix">On 26/04/2023 at 21:14, Stephanie Evert
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:39384DA6-9C00-4E85-B507-F5D00FA9EEBF@collocations.de">
<pre class="moz-quote-pre" wrap="">
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On 26 Apr 2023, at 16:28, Graham Ranger -- UAPV <a class="moz-txt-link-rfc2396E" href="mailto:graham.ranger@univ-avignon.fr">&lt;graham.ranger@univ-avignon.fr&gt;</a> wrote:
Many thanks for your help. Unfortunately, that didn't work... I've just checked: my XML tags are on different lines (though I would hope that would not make a difference) and the only spaces in the file are in the XML tag between "text" and "id".
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
If you can access the CWB-indexed corpus (or index it yourself on the command-line with cwb-encode and cwb-make), then you could find the location of the problem with a CQP query
        [ ! text_id ];
Best,
Stephanie        
</pre>
</blockquote>
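<br>
As a command-line complement to the CQP query quoted above, a stray BOM can also be located in the input file before indexing. This is only a rough sketch: the sample file is made up, and the variable names are illustrative: <br>

```shell
# Hypothetical sample: the result of concatenating two files that each began with a BOM.
sample=$(mktemp)
printf '\357\273\277alpha\nbeta\n\357\273\277gamma\ndelta\n' > "$sample"

# The three BOM bytes (EF BB BF), written with portable octal escapes.
bom_bytes=$(printf '\357\273\277')

# -a: treat the input as text; -n: prefix each match with its line number.
# cut keeps only the line number, paste joins the numbers with commas.
bom_lines=$(LC_ALL=C grep -a -n "$bom_bytes" "$sample" | cut -d: -f1 | paste -sd, -)

echo "BOM found on line(s): $bom_lines"
```

With real data, the temporary sample would be replaced by the concatenated file itself.<br>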
<br>
</body>
</html>