[CWB] What could be failing

"Andrés Chandía" andres at chandia.net
Wed Mar 7 17:24:17 CET 2018



Hi there, I have a corpus that indexes OK, with no warnings or errors, but when I install it
in CQPweb I get an error at the "Manage corpus XML" stage. One of the handles is "s_id";
if I try to change its datatype from "freetext" to "unique ID", it says: The datatype of
s_id cannot be changed to [unique ID], because there are duplicate values in the CWB index.

With its counterpart, the parallel corpus in Spanish, this does not happen...

I have checked several times whether there is any such duplication, but found
none. This corpus has 312 sentences, each with its own non-duplicated id.

If I do
#grep "<s id" txtgmmden_en.txt |uniq -d
it shows no results [it should show the duplicated one(s)]
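
A note on the check above: uniq -d only reports duplicates that sit on adjacent lines, and it compares whole lines rather than just the id value, so an empty result on unsorted input does not rule out repeated ids. Below is a minimal sketch that extracts the id values and sorts them before looking for repeats. It assumes the ids are written with double quotes, as in <s id="...">, and the function name find_dup_ids is made up for illustration.

```shell
# Print every id value that occurs more than once in <s id="..."> tags.
# Assumption: ids are double-quoted, e.g. <s id="42">.
find_dup_ids() {
    grep -o '<s id="[^"]*"' "$1" |      # keep only the opening part of each tag
        sed 's/^<s id="//; s/"$//' |    # strip everything but the id value
        sort |                          # bring equal ids next to each other
        uniq -d                         # print each repeated id once
}

# Run against the corpus file if it is present in the current directory.
if [ -f txtgmmden_en.txt ]; then
    find_dup_ids txtgmmden_en.txt
fi
```

If this prints nothing, the ids in the source file are unique as written, and the duplication would have to come from somewhere other than the input text.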

If I do
#grep "<s id" txtgmmden_en.txt |wc -l
#grep "</s>" txtgmmden_en.txt |wc -l

I get in both cases:
312

I attach the file so you can see there is no duplication. What
can it be then?

Thanks



-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: txtgmmden_en.txt
URL: <http://liste.sslmit.unibo.it/pipermail/cwb/attachments/20180307/e3857f2e/attachment-0001.txt>