[CWB] Distribution of build scripts (especially LDC corpora)?

Yannick Versley yversley at gmail.com
Wed Sep 2 00:36:10 CEST 2015


I've got a Python library that (besides the usual mixer and kitchen-sink
stuff) can convert many treebank formats (e.g., bracketed format, Negra
Export, TigerXML) to CQP format.

My guess is that it's not too difficult to get to CWB format for reasonably
standard formats (CoNLL-X/09/12, MMAX2), and plugging a CWB exporter into
(say) CoreNLP or spaCy would make it possible to process plain text without
much trouble, but other formats may be harder to handle.
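
To give an idea of how little is needed for the CoNLL case, here is a
minimal sketch (not taken from my library) that turns CoNLL-X lines into
the one-token-per-line "verticalized text" that cwb-encode reads. The
function name conllx_to_vrt, the choice of word/pos/lemma as columns, and
the text_id parameter are assumptions made up for this illustration; it
also does no escaping of tokens that happen to look like XML tags.

```python
from typing import Iterable, Iterator

# 0-based positions of the CoNLL-X columns used here:
# ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL
FORM, LEMMA, POSTAG = 1, 2, 4


def conllx_to_vrt(lines: Iterable[str], text_id: str = "doc1") -> Iterator[str]:
    """Yield verticalized-text lines (word, pos, lemma) for a CoNLL-X stream.

    Hypothetical helper: sentences become <s> regions and the whole input
    becomes a single <text> region.
    """
    yield f'<text id="{text_id}">'
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence in CoNLL-X.
            if in_sentence:
                yield "</s>"
                in_sentence = False
            continue
        cols = line.split("\t")
        if len(cols) <= POSTAG:
            raise ValueError(f"expected CoNLL-X columns, got: {line!r}")
        if not in_sentence:
            yield "<s>"
            in_sentence = True
        # One token per line, attributes separated by tabs.
        yield "\t".join((cols[FORM], cols[POSTAG], cols[LEMMA]))
    if in_sentence:
        # Close the last sentence if the input lacks a trailing blank line.
        yield "</s>"
    yield "</text>"


if __name__ == "__main__":
    sample = [
        "1\tThe\tthe\tDT\tDT\t_\t2\tdet\t_\t_\n",
        "2\tdog\tdog\tNN\tNN\t_\t0\troot\t_\t_\n",
        "\n",
    ]
    print("\n".join(conllx_to_vrt(sample)))
```

The output would then go to cwb-encode with pos and lemma declared as
positional attributes and s and text as structural attributes; the exact
options are best checked against the cwb-encode manual.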

Best wishes
Yannick

On Tue, Sep 1, 2015 at 6:29 PM, Mats Rooth <mr249 at cornell.edu> wrote:

> Is there any effort to distribute scripts or makefiles for building a CWB
> corpus from distributed corpora? I’ve done a variety of these, the effort
> is significant, so it seems a shame to redo it.  There’s also benefit in
> establishing standard mappings. LDC materials are especially relevant for
> us.
>
>   — Mats
>
> Mats Rooth
> Professor
> Dept. of Linguistics and Faculty of Computing and Information
> Cornell University
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
>
>