[CWB] Indexing problems

Eros Zanchetta eros at sslmit.unibo.it
Wed Jul 21 19:20:29 CEST 2010


Hi there,

I hope someone can help me with this because it's driving me crazy. I'm 
trying to encode a corpus with cwb-encode, using the following command:

cwb-encode -d PARAPEDIA_EN -f parapedia_en.tgd \
  -R /usr/local/share/cwb/registry/parapedia_en \
  -P pos -P lemma \
  -S corpus -S text:0+id+target+keywords -S s \
  >parapedia_en_indexing.out 2>parapedia_en_indexing.err

There appears to be something wrong with the corpus, but unfortunately I 
can't figure out what it is (I attached the error stream from the 
encoding process to this e-mail).

What baffles me are the error reports. I assume that when cwb-encode says:

Attributes of open tag <text ...> ignored because of syntax error (file 
[...], line #1021648).

it means that at line 1021648 of the input file there is a <text> tag 
with some kind of syntax error, but there's no <text> tag at that line 
(I obviously tried a few other lines, but not all of them since it's a 
very large file). Am I reading the error report wrong?
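
One way to check this without relying on the reported line numbers is 
sketched below in Python. It prints the lines around a given line number 
and lists every <text ...> line whose attributes fail a rough 
name="value" check. The regular expression is only an approximation and 
is not cwb-encode's actual parser; the function names, the UTF-8 
encoding and the assumption that each tag sits on its own line are 
illustrative guesses, not anything taken from CWB.

```python
import re
from itertools import islice

# Rough approximation of a well-formed open tag: <text name="value" ...>
# Assumption: this is NOT cwb-encode's real parser, only a sanity check.
ATTR = r'\s+[A-Za-z_][\w.-]*\s*=\s*(?:"[^"]*"|\'[^\']*\')'
OPEN_TAG = re.compile(r'<text(?:' + ATTR + r')*\s*>\s*$')
IS_TEXT_TAG = re.compile(r'<text[\s>]')


def show_lines(path, center, context=2, encoding="utf-8"):
    """Return (line_number, text) pairs around a 1-based line number."""
    start = max(1, center - context)
    with open(path, encoding=encoding, errors="replace") as fh:
        # islice streams the file, so very large inputs are not loaded whole.
        window = islice(fh, start - 1, center + context)
        return [(start + i, line.rstrip("\n")) for i, line in enumerate(window)]


def suspicious_text_tags(path, encoding="utf-8"):
    """Yield (line_number, text) for <text ...> lines failing the rough check."""
    with open(path, encoding=encoding, errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            text = line.rstrip("\n")
            if IS_TEXT_TAG.match(text) and not OPEN_TAG.match(text):
                yield number, text
```

Calling show_lines("parapedia_en.tgd", 1021648) shows what is actually at 
the reported position, and iterating over 
suspicious_text_tags("parapedia_en.tgd") gives candidate lines (for 
example, a stray double quote inside an attribute value) to compare with 
the line numbers in the error stream.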

I'm using CWB version 2.2.100.

Thanks in advance,
Eros
-------------- next part --------------
A non-text attachment was scrubbed...
Name: parapedia_en_3_indexing.zip
Type: application/zip
Size: 2646 bytes
Desc: not available
Url : http://liste.sslmit.unibo.it/pipermail/cwb/attachments/20100721/ed11eb69/parapedia_en_3_indexing.zip

