[CWB] Trying to deal with tag problems when encoding
Scott Sadowsky
ssadowsky at gmail.com
Fri May 19 21:28:45 CEST 2017
On Fri, May 19, 2017 at 7:15 AM, Stefan Evert <stefanML at collocations.de>
wrote:
Thanks again, Stefan.
> No, my logic was quite simple: If there's a missing </text> tag in one of
> your files, this region isn't closed and will extend to the very end of the
> corpus (unless there is a superfluous </text> tag or a damaged <text> in
> another file). So there was a good chance that the last <text> region in
> the corpus would be the critical one.
>
Right on.
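
For anyone hitting the same problem, the reasoning above (an unclosed <text>
swallows everything up to the next stray </text> or the end of the corpus) can
be checked on the input files before encoding. What follows is only a minimal
sketch in Python, not part of CWB: the function name find_text_tag_problems is
made up for illustration, and it assumes each tag sits on a line of its own, as
in the one-token-per-line input that cwb-encode reads. A damaged tag that no
longer looks like <text ...> is simply not recognised, so it shows up
indirectly as a problem on a neighbouring tag.

```python
import re

# Matches an opening <text ...> or a closing </text> tag on a line of its own.
# Assumption: tags are never mixed with tokens on the same line.
TEXT_TAG = re.compile(r"^\s*<(/?)text(\s[^>]*)?>\s*$")


def find_text_tag_problems(lines):
    """Return a list of (line_number, message) for unbalanced <text> tags."""
    problems = []
    open_line = None  # line number of the currently open <text>, if any
    for number, line in enumerate(lines, start=1):
        match = TEXT_TAG.match(line)
        if match is None:
            continue  # a token line or some other tag
        if match.group(1) == "/":
            # Closing tag: only valid if a <text> is currently open.
            if open_line is None:
                problems.append((number, "</text> without an open <text>"))
            open_line = None
        else:
            # Opening tag: the previous <text> should have been closed by now.
            if open_line is not None:
                problems.append((open_line, "<text> not closed before next <text>"))
            open_line = number
    if open_line is not None:
        problems.append((open_line, "<text> still open at end of input"))
    return problems
```

It can be run per file, e.g. find_text_tag_problems(open(path, encoding="utf-8"))
with path standing for one of your own input files, and the reported line
numbers point at the tags to inspect.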
> What you want is
>
> set PrintStructures text_id;
>
Extremely useful command, this!
> One of the advantages of -S text:0 is that it shows you there is a problem
> – with the "hotfix" solution, it's completely hidden.
>
Indeed. Fortunately, with the set PrintStructures command I was able to
ferret out what I hope is the last bad tag in the corpus, and I'm currently
re-encoding using -S text:0.
Thanks for all your help.
Cheers,
Scott