[CWB] WACKy corpora and cwb
Andres Chandia
andres at chandia.net
Mon Jan 27 19:23:13 CET 2014
Is there an easy way to transform the metadata format of the WaCky corpora so that they can
be used with the CQPweb interface? We are trying to install a few of these corpora, but I am
having problems with some of the header lines.
When I try to index (encode) one of them, I get the following errors:
Malformed tag <source="10178"/>, inserted literally (file /B_NFS_P/resources/corpora/written/data/de/sdewac/sdewac-v3.tagged, line #633867).
Malformed tag <error="0.0185185185185185"/>, inserted literally (file /B_NFS_P/resources/corpora/written/data/de/sdewac/sdewac-v3.tagged, line #633868).
Malformed tag <source="10183"/>, inserted literally (file /B_NFS_P/resources/corpora/written/data/de/sdewac/sdewac-v3.tagged, line #633929).
This obviously has to do with the year, source and error tags, which are not properly
closed:
<sentence>
<year>="0"/>
<source="1403"/>
<error="0.00869565217391304"/>
<s>
Sie   PPER   Sie|sie
dürfen   VMFIN   dürfen
I can do a few transformations using Perl, but I am wondering whether there is something that
could make this easier and faster.
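The kind of transformation meant here could be sketched as follows, in Python rather than Perl. This is only an illustration under stated assumptions: the metadata lines always follow a `<sentence>` line directly, their values contain no characters that need XML escaping, and folding them into attributes of the `<sentence>` tag is an acceptable target format. The function name `fold_metadata` is hypothetical, and the attributes would still have to be declared to the encoder as annotations of the sentence element.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Matches both <source="1403"/> and the <year>="0"/> variant seen in the sample.
META = re.compile(r'^<(\w+)>?="([^"]*)"/>\s*$')


def _open_tag(attrs: List[Tuple[str, str]]) -> str:
    """Build a <sentence ...> start tag from collected (name, value) pairs."""
    joined = "".join(f' {name}="{value}"' for name, value in attrs)
    return f"<sentence{joined}>"


def fold_metadata(lines: Iterable[str]) -> Iterator[str]:
    """Fold the metadata lines after <sentence> into attributes of that tag.

    Assumption (not confirmed by the corpus documentation): metadata lines
    appear only immediately after a <sentence> line. All other lines are
    passed through unchanged.
    """
    pending = None  # attributes collected for an open <sentence>, else None
    for raw in lines:
        line = raw.rstrip("\n")
        if pending is not None:
            match = META.match(line)
            if match:
                pending.append((match.group(1), match.group(2)))
                continue
            # First non-metadata line: emit the completed start tag.
            yield _open_tag(pending)
            pending = None
        if line.strip() == "<sentence>":
            pending = []
        else:
            yield line
    if pending is not None:
        yield _open_tag(pending)
```

Applied to the sample above, the four header lines collapse into a single start tag with the attributes year, source and error, while `<s>` and the token lines are left untouched. Because the function works on any iterable of lines, it can be fed an open file handle and process the corpus as a stream instead of loading it into memory.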
___________________
andrés chandía
administrator of
parles.upf.edu
psicoaching.net
mapuche koyaktu
ong mapuche koyaktu