[CWB] Distribution of build scripts (especially LDC corpora)?

Stefan Evert stefanML at collocations.de
Fri Sep 11 19:57:42 CEST 2015


Dear Mats,

it would be great to have a repository of CWB conversion scripts for as many standard (and non-standard) corpus formats as possible.

So far, the only converter in the main CWB repository is the one for the BNC (XML edition) that Sebastian and I developed for use with BNCweb.  Contributions from CWB users are highly welcome.

I'd be happy to add donations to the CWB subversion repository, but they'd have to be released under a GPL licence. If they are under active development, it might be better to set up a separate git repository so it's easier to clone and send pull requests (provided that any of the main CWB developers ever gets the hang of git).

At the very least, we'd like to provide links on the CWB homepage, so if anybody has a collection of tools available online (and wants them to be listed), please let us know.  I'll probably set up a new subpage for import/export tools.

Best,
Stefan


> On 1 Sep 2015, at 18:29, Mats Rooth <mr249 at cornell.edu> wrote:
> 
> Is there any effort to distribute scripts or makefiles for building a CWB corpus from distributed corpora?  I’ve done a variety of these; the effort is significant, so it seems a shame to redo it.  There’s also a benefit in establishing standard mappings. LDC materials are especially relevant for us.
> 


