[CWB] Indexing problems

Serge Heiden slh at ens-lyon.fr
Wed Jul 21 20:10:45 CEST 2010


Eros,

I don't know whether the error report is accurate, but
it is difficult to analyze without an excerpt of your
parapedia_en.tgd file.
Are you sure that every <text> tag has a corresponding
</text> closing tag?
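
One way to check the tag balance on a very large file is a small script
along these lines. This is only a sketch under assumptions: the function
name check_text_tags is made up for illustration, the input is assumed to
have one token or one structural tag per line, and <text> regions are
assumed not to nest. It does not reproduce cwb-encode's own parsing rules.

```python
import re
import sys

# Assumption: each structural tag sits alone on its own line.
OPEN_TAG = re.compile(r"^<text(\s[^>]*)?>\s*$")
CLOSE_TAG = re.compile(r"^</text>\s*$")


def check_text_tags(lines):
    """Return (line_number, message) pairs for suspicious <text> tags."""
    problems = []
    open_line = None  # line number of the currently open <text>, if any
    for number, line in enumerate(lines, start=1):
        if OPEN_TAG.match(line):
            if open_line is not None:
                problems.append(
                    (number, f"<text> opened while line {open_line} is still open")
                )
            open_line = number
        elif CLOSE_TAG.match(line):
            if open_line is None:
                problems.append((number, "</text> without a matching <text>"))
            open_line = None
        elif line.startswith("<text"):
            # Looks like a <text ...> tag but does not parse as one.
            problems.append((number, "malformed <text ...> tag"))
    if open_line is not None:
        problems.append((open_line, "<text> never closed"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_text_tags.py parapedia_en.tgd
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, message in check_text_tags(handle):
            print(f"line {number}: {message}")
```

The line numbers it prints are counted from 1 in the input file, so they
can be compared directly with the lines you inspect by hand.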

Best,
Serge

Quoting Eros Zanchetta:
> Hi there,
> 
> I hope someone can help me with this because it's driving me crazy. I'm 
> trying to encode a corpus with cwb-encode; the command I use is:
> 
> cwb-encode -d PARAPEDIA_EN -f parapedia_en.tgd \
>   -R /usr/local/share/cwb/registry/parapedia_en \
>   -P pos -P lemma -S corpus -S text:0+id+target+keywords -S s \
>   >parapedia_en_indexing.out 2>parapedia_en_indexing.err
> 
> There appears to be something wrong with the corpus, but unfortunately I 
> can't figure out what it is (I attached the error stream from the 
> encoding process to this e-mail).
> 
> What baffles me are the error reports. I assume that when it says:
> 
> Attributes of open tag <text ...> ignored because of syntax error (file 
> [...], line #1021648).
> 
> it means that at line 1021648 of the input file there is a <text> tag 
> with some kind of syntax error, but there's no <text> tag at that line 
> (I obviously tried a few other lines, but not all of them since it's a 
> very large file). Am I reading the error report wrong?
> 
> I use version 2.2.100 of cwb.
> 
> Thanks in advance,
> Eros

-- 
********
*FR* Merci d'utiliser ma nouvelle adresse mail slh at ens-lyon.fr ****
*EN* Please use my new email address slh at ens-lyon.fr           ****
********
Dr. Serge Heiden, slh at ens-lyon.fr, http://textometrie.ens-lsh.fr
ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883

