[CWB] Importing large CoNLL-U corpora

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri Dec 1 04:48:56 CET 2017


Zipser and Romary's "Pepper" format-converter framework has an importer for CoNLL and an exporter for TreeTagger format (one token per line with tab-separated annotations, which is the same layout as CWB's input format).

I have never used it myself, so I don't know if it will do exactly what you need, but you might be in luck!

See here:

http://corpus-tools.org/pepper/knownModules.html
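
If Pepper turns out not to fit, the conversion itself is small enough to script directly. The following is only a minimal sketch in Python, not a tested tool: it assumes standard 10-column CoNLL-U input, keeps only the FORM, UPOS and LEMMA columns, wraps each sentence in <s> and each file in <text>, and skips multiword-token ranges and empty nodes (one possible choice among several). The function name conllu_to_vrt and the text_id argument are made up for illustration.

```python
from typing import Iterable, Iterator


def conllu_to_vrt(lines: Iterable[str], text_id: str) -> Iterator[str]:
    """Yield CWB vertical-format lines (word, pos, lemma) for one CoNLL-U file.

    Hypothetical helper: the name and the choice of columns are assumptions.
    """
    yield f'<text id="{text_id}">'
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if in_sentence:
                yield "</s>"
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments (sent_id, text, ...)
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id, form, lemma, upos = cols[0], cols[1], cols[2], cols[3]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        if not in_sentence:
            yield "<s>"
            in_sentence = True
        yield f"{form}\t{upos}\t{lemma}"
    # Close a sentence left open by a file without a trailing blank line.
    if in_sentence:
        yield "</s>"
    yield "</text>"
```

Output of this shape could then be passed to cwb-encode with options along the lines of -P pos -P lemma -S s -S text:0+id. With 170K input files it is probably easier to concatenate the converted output into a few large files before encoding. Tokens that begin with "<" may need escaping before encoding, which this sketch does not do.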


best

Andrew.
 

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Fabricio Chalub
Sent: 30 November 2017 15:53
To: cwb at sslmit.unibo.it
Subject: [CWB] Importing large CoNLL-U corpora

Hi,

We want to import about 170K CoNLL-U files into CWB/CQPweb
(at least the POS and lemma parts, as I understand that dependencies
are not supported).

I was wondering if anyone here has any scripts already written for
this task, even if they are temporary hacks.  Any pointers?

cheers,
Fabricio
-- 
Fabricio Chalub
http://fcbr.github.io/
http://researcher.ibm.com/person/br-fchalub