[CWB] Indexing problems
Serge Heiden
slh at ens-lyon.fr
Wed Jul 21 20:10:45 CEST 2010
Eros,
I don't know whether the error report is accurate, but
it is difficult to analyze without an excerpt of your
parapedia_en.tgd file.
Are you sure that every <text> tag has a corresponding
</text> closing tag?
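
One way to check this on a very large file is a short script rather than
manual inspection. The following is only a minimal sketch, assuming the
input is in verticalized form with each <text ...> and </text> tag alone
on its own line and no ">" character inside attribute values. The function
name check_text_tags and the message wording are hypothetical, not part
of CWB.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumption: structural tags sit alone on a line (verticalized input)
# and attribute values never contain a ">" character.
OPEN_TEXT = re.compile(r"^<text(\s[^>]*)?>\s*$")
CLOSE_TEXT = re.compile(r"^</text>\s*$")


def check_text_tags(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for unbalanced <text> regions."""
    problems: List[Tuple[int, str]] = []
    open_line = None  # line number of the currently open <text>, if any
    for number, line in enumerate(lines, start=1):
        if OPEN_TEXT.match(line):
            if open_line is not None:
                problems.append(
                    (number, f"<text> opened while the one from line {open_line} is still open")
                )
            open_line = number
        elif CLOSE_TEXT.match(line):
            if open_line is None:
                problems.append((number, "</text> without a matching <text>"))
            open_line = None
    if open_line is not None:
        problems.append((open_line, "<text> never closed"))
    return problems


# Hypothetical usage: python check_text_tags.py parapedia_en.tgd
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, message in check_text_tags(handle):
            print(f"line {number}: {message}")
```

The script reads the file line by line, so it should cope with a large
corpus. It only looks at <text> regions and does not validate the
attributes themselves.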
Best,
Serge
Eros Zanchetta wrote:
> Hi there,
>
> I hope someone can help me with this because it's driving me crazy. I'm
> trying to encode a corpus with cwb-encode, the syntax I use is:
>
> cwb-encode -d PARAPEDIA_EN -f parapedia_en.tgd -R
> /usr/local/share/cwb/registry/parapedia_en -P pos -P lemma -S corpus -S
> text:0+id+target+keywords -S s >parapedia_en_indexing.out
> 2>parapedia_en_indexing.err
>
> there appears to be something wrong with the corpus, unfortunately I
> can't figure out what it is (I attached the error stream from the
> encoding process to this e-mail).
>
> What baffles me is the error report. I assume that when it says:
>
> Attributes of open tag <text ...> ignored because of syntax error (file
> [...], line #1021648).
>
> it means that at line 1021648 of the input file there is a <text> tag
> with some kind of syntax error, but there's no <text> tag at that line
> (I obviously tried a few other lines, but not all of them since it's a
> very large file). Am I reading the error report wrong?
>
> I use version 2.2.100 of cwb.
>
> Thanks in advance,
> Eros
--
********
Please use my new email address: slh at ens-lyon.fr
********
Dr. Serge Heiden, slh at ens-lyon.fr, http://textometrie.ens-lsh.fr
ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883