[CWB] Importing large CoNLL-U corpora
Hardie, Andrew
a.hardie at lancaster.ac.uk
Fri Dec 1 04:48:56 CET 2017
Zipser and Romary's "Pepper" format-converter framework has an importer for CoNLL and an exporter for TreeTagger format (which is the same one-token-per-line, tab-separated layout that CWB takes as input).
I have never used it myself, so I don't know if it will do exactly what you need, but you might be in luck!
See here:
http://corpus-tools.org/pepper/knownModules.html
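If Pepper turns out not to fit, a direct conversion is small enough to sketch by hand. The Python below is only a rough illustration, not a tested importer: it assumes standard 10-column CoNLL-U, keeps word form, UPOS and lemma (in that order, which is an arbitrary choice), skips multiword-token ranges and empty nodes, and wraps each file in a <text> element whose id is taken from the file name. The function names, the "*.conllu" glob pattern and the attribute order are all hypothetical, and no escaping of special characters is attempted.

```python
from pathlib import Path


def conllu_to_vrt(conllu_text, text_id):
    """Convert one CoNLL-U document to CWB-style vertical text.

    Output columns are word, pos (UPOS), lemma; this order is an assumption.
    """
    out = ['<text id="{}">'.format(text_id)]
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if in_sentence:
                out.append("</s>")
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are dropped.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError("expected 10 columns, got {}: {!r}".format(len(cols), line))
        tok_id, form, lemma, upos = cols[0], cols[1], cols[2], cols[3]
        if "-" in tok_id or "." in tok_id:
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            continue
        if not in_sentence:
            out.append("<s>")
            in_sentence = True
        out.append("\t".join((form, upos, lemma)))
    if in_sentence:
        # The file ended without a trailing blank line.
        out.append("</s>")
    out.append("</text>")
    return "\n".join(out) + "\n"


def convert_directory(src_dir, out_path):
    """Concatenate every *.conllu file under src_dir into one vertical file."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).glob("*.conllu")):
            out.write(conllu_to_vrt(path.read_text(encoding="utf-8"), path.stem))
            count += 1
    return count
```

The resulting file would then go through the usual CWB encoding step, with pos and lemma declared as positional attributes and text and s as structural attributes. With 170K input files it may be more practical to write several output files rather than one; that choice is left open here.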
best
Andrew.
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Fabricio Chalub
Sent: 30 November 2017 15:53
To: cwb at sslmit.unibo.it
Subject: [CWB] Importing large CoNLL-U corpora
Hi,
we want to import about 170K CoNLL-U files into CWB/CQPweb
(at least the POS and lemma parts as I understand that dependencies
are not supported).
I was wondering if anyone here has any scripts already written for
this task, even if they are temporary hacks. Any pointers?
cheers,
Fabricio
--
Fabricio Chalub
http://fcbr.github.io/
http://researcher.ibm.com/person/br-fchalub