[CWB] CWB

Stefan Evert stefan.evert at uos.de
Fri Feb 2 16:11:25 CET 2007


Dear Cobus,

thanks for your interest in the CWB and your willingness to
contribute to further development! Uli Heid has just forwarded your
e-mail to me as well.

The answer to your first question is simple: mea culpa! I still  
haven't found the time to clean up the CWB source code for release  
(which includes writing a few minimal bits of documentation and  
inserting copyright/license notices) and upload it to sourceforge  
(I'm not very experienced with sf.net either, so it will probably  
take me some time to get the SVN upload right).

The release is now rather firmly scheduled for Feb 20 or so, since I
will have some time to focus on this task from around Feb 12.

Concerning your second question, in principle it should be possible  
to compile and use the CWB in a Windows environment, although some  
adjustments may be necessary.  Most CWB users run some flavour of  
Unix or have a dedicated Linux server for their CWB installation, but  
we have experimented a little with CWB on Windows using the Cygwin  
emulation layer.  Basically, it seems to compile and run, but  
performance is very poor.  Insofar as we can guess at the moment,  
this may have to do with CWB's reliance on memory-mapping to access  
data files efficiently and with the way memory-mapping is emulated by  
Cygwin.

It would be great to have someone work on a native Windows port.  As  
far as I understand, Win32 _does_ support memory-mapping in an  
efficient way, so I suspect it's just the Cygwin implementation that  
causes our speed problems, and the CWB should run much better when  
compiled in a standard Windows environment.  The port should not be  
too difficult, since CWB doesn't rely very heavily on Unix  
technology.  Basically, it needs a standard C compiler (GCC  
recommended), POSIX-compatible system libraries (including, most  
importantly, the mmap() function for memory-mapping files), and a few  
standard Unix command-line utilities (such as "less" or "gzip"),  
which are also available for Windows or could be replaced by built-in  
implementations.
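
To illustrate the memory-mapping point, here is a minimal sketch of
reading one value from a binary data file through a memory map rather
than loading the whole file. It is written in Python rather than C only
because Python's standard mmap module behaves the same on Unix and on
native Windows; it is not CWB code. The file layout (consecutive
big-endian 32-bit integers) and the helper names write_int_file and
read_int_at are assumptions made up for this example.

```python
import mmap
import os
import struct
import tempfile


def write_int_file(path, values):
    """Write values as consecutive big-endian 32-bit integers.

    The layout is a hypothetical one chosen for this example.
    """
    with open(path, "wb") as f:
        f.write(struct.pack(">%di" % len(values), *values))


def read_int_at(path, index):
    """Return the integer at position `index` via a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; ACCESS_READ is accepted on both
        # Unix and Windows, so no platform-specific flags are needed.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Only the page holding this offset has to be touched.
            return struct.unpack_from(">i", m, index * 4)[0]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    write_int_file(path, [10, 20, 30, 40])
    print(read_int_at(path, 2))  # prints 30
```

A C port would presumably do the equivalent with mmap() on POSIX
systems and with the native Win32 file-mapping calls on Windows, hidden
behind one small wrapper function.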

I'll announce availability of the source code on this mailing list!

Best regards,
Stefan Evert

On 1 Feb 2007, at 09:40, Cobus Conradie wrote:

> I'm writing on behalf of the Center for Text Technology at the  
> North-West University, South Africa. We have quite a few projects  
> lined up for the following three years mainly on African languages  
> and need to get these Corpora in some kind of structure for  
> reference and manipulation. The Corpus WorkBench seems to be just  
> the thing. Unfortunately I’m from the windows world and know very  
> little of Sourceforge.net.
>
> We would like to contribute in the form of development and  
> participation in the project. First question is, I can’t seem to  
> find the source for the project. I went to
> http://cwb.svn.sourceforge.net/viewvc/cwb/ but can’t find anything there.
> The second question is if there is a Windows version of this  
> product or at least the possibility of porting it to a Windows  
> environment? (without rewriting the whole thing)
> Hope to hear from you soon.
>
> Regards
>