[CWB] Invitation to join IMS Corpus Workbench development

Stefan Evert stefan.evert at uos.de
Fri Feb 24 16:31:38 CET 2006


Dear friends!

(If you're wondering why you receive this e-mail: it's because you have
expressed in some way or other - at some time or other - that you would
be interested in contributing to the further development of the IMS
Corpus Workbench; or because you're working on closely related tools and
have shown interest in ensuring interoperability.)

As most of you will already know (at least those who were in contact
with me during the past 9 months), the IMS has finally decided to
release the Corpus Workbench source code under the GPL. The
source code will soon be available at http://cwb.sourceforge.net/,
obsoleting the current CWB homepage at the IMS (the only reason it's not
up there yet is that I haven't found the time to do the minimal
cleaning-up needed to make it fit for public release; plus, I only learned
how to use CVS last week). We would like to make use of this opportunity
to give CWB development a significant boost, and ensure long-term
maintenance by establishing a community of volunteers. (Having
considered a number of alternatives, we see no replacement with the full
power and flexibility of the CWB, at least for the next few years, so we
intend to continue using and developing it despite its shortcomings.)

Through informal discussion and coincidence, a group of three people
with a strong personal interest in using and improving the CWB has
emerged, namely Stefan Evert (yeah right, myself), Marco Baroni and
Serge Sharoff.  We are planning to kick off the development with an
initial meeting in Forlì on the weekend preceding the LREC conference
(more precisely, from Thursday, May 18 to Saturday, May 20).  The idea is to
cover the following topics at this meeting:

- I will give an introduction to the CWB architecture and its very messy
source code (to get C hackers started on, well, hacking the code)
- we will identify shortcomings and limitations of the CWB that should
be addressed in the near future
- ideally, we will also manage to draw up a general roadmap for the
development and parcel out feasible tasks for individual developers
- any time left after the general discussion could be used to "play
around" with the source code, while I would be available to help with
problems or point out limited chunks of code that can be improved
without a full understanding of the entire code base

We would like to invite you to join this development effort, provided
that you have a substantial interest in using improved versions of the
CWB in the future (or if you would like to contribute in some other way,
e.g. by providing interfaces to other software packages, bringing in
your experience with corpus query systems, etc.). If you are interested,
please join the CWB development mailing list at

http://wacky.sslmit.unibo.it/mailman/listinfo/cwb

I won't pester you with further e-mails regarding CWB development unless
you subscribe there. :-)

It would be great if you could attend our kick-off meeting, especially
if you are considering working on the CWB source code (which is entirely
written in C) or doing some other low-level coding (e.g. adding interfaces
to languages other than Perl). The date and location should make this
easy to arrange, at least for those of you who attend the LREC
conference anyway. Further planning for this meeting will be carried out
through the mailing list.

Hoping to see you soon on the CWB development mailing list (please let
me know if you sign up)!
All the best,
Stefan Evert.
