[CWB] Indexing problems
Eros Zanchetta
eros at sslmit.unibo.it
Wed Jul 21 19:20:29 CEST 2010
Hi there,
I hope someone can help me with this because it's driving me crazy. I'm
trying to encode a corpus with cwb-encode; the command I use is:
cwb-encode -d PARAPEDIA_EN -f parapedia_en.tgd \
  -R /usr/local/share/cwb/registry/parapedia_en \
  -P pos -P lemma -S corpus -S text:0+id+target+keywords -S s \
  >parapedia_en_indexing.out 2>parapedia_en_indexing.err
There appears to be something wrong with the corpus, but I can't figure
out what it is (I have attached the error stream from the encoding
process to this e-mail).
What baffles me are the error reports. I assume that when it says:
Attributes of open tag <text ...> ignored because of syntax error (file
[...], line #1021648).
it means that at line 1021648 of the input file there is a <text> tag
with some kind of syntax error, but there's no <text> tag at that line
(I obviously tried a few other lines, but not all of them since it's a
very large file). Am I reading the error report wrong?
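
Since the file is too large to inspect by hand, one way to check this would be a small script along the following lines. This is only a rough sketch: the helper names (show_lines, suspicious_text_tags, scan_file) are made up for illustration, the UTF-8 encoding is a guess, and the attribute pattern (name="value" pairs separated by whitespace) is an assumption about what a well-formed <text> tag looks like, not necessarily the exact syntax that cwb-encode accepts.

```python
import re
from itertools import islice

# Assumed attribute syntax: name="value" or name='value'.
# This is an illustration, not cwb-encode's actual parsing rule.
ATTR = r'''[A-Za-z_][\w.-]*\s*=\s*(?:"[^"]*"|'[^']*')'''
WELL_FORMED_TEXT_TAG = re.compile(r'^<text(?:\s+' + ATTR + r')*\s*>$')
TEXT_OPEN_TAG = re.compile(r'^<text[\s>]')


def show_lines(lines, target, context=2):
    """Return (line_number, text) pairs around a 1-based line number."""
    start = max(target - context, 1)
    window = islice(lines, start - 1, target + context)
    return [(start + i, line.rstrip("\n")) for i, line in enumerate(window)]


def suspicious_text_tags(lines):
    """Yield (line_number, text) for <text ...> open tags that do not
    match the assumed attribute pattern."""
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if TEXT_OPEN_TAG.match(stripped) and not WELL_FORMED_TEXT_TAG.match(stripped):
            yield number, stripped


def scan_file(path, target):
    """Print the lines around `target`, then every suspicious <text> tag."""
    # Encoding is a guess; errors="replace" keeps the scan from aborting.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, text in show_lines(handle, target):
            print(f"{number}: {text}")
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, text in suspicious_text_tags(handle):
            print(f"suspicious <text> tag at line {number}: {text}")
```

Calling scan_file("parapedia_en.tgd", 1021648) would print the lines around the reported position and then list every <text> tag that fails the assumed pattern, so the two sets of line numbers can be compared with those in the error stream.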
I use version 2.2.100 of cwb.
Thanks in advance,
Eros
-------------- next part --------------
A non-text attachment was scrubbed...
Name: parapedia_en_3_indexing.zip
Type: application/zip
Size: 2646 bytes
Desc: not available
Url : http://liste.sslmit.unibo.it/pipermail/cwb/attachments/20100721/ed11eb69/parapedia_en_3_indexing.zip