[CWB] Cygwin

Stefan Evert stefan.evert at uos.de
Thu Dec 4 23:17:17 CET 2008


Hi Eros!

> I just installed the latest version of CWB in my cygwin environment.

That's great to hear! Can you tell us (put on the wiki etc.) how you  
got CQP to work in Cygwin?  Is there a new version of Cygwin or  
particular configuration tricks?

I remember that when you were staying at Osnabrück, we got CQP to  
compile in Cygwin, but it was dead slow and would quickly run out of  
memory.  That was still the case when I last tried it within  
VirtualBox (Windows XP + Cygwin).  What system setup do you use?

> I was impressed by the performance (I was able to query a 100m corpus
> without problems) but unfortunately I noticed that apparently you cannot
> save queries using the "save" command (all I get in the DataDirectory is
> an empty file that has the same name as the corpus, i.e. "DICKENS")
>
> My guess is that Windows doesn't like the colon in the filename of the
> saved query (DICKENS:MyQuery).

Yes, since it's used after drive letters (C: and all that), that's  
hardly surprising.  I would have expected Cygwin to be a little more  
intelligent about this, though ...

> Does anyone know if there is a way to change the default naming
> convention? (possibly something that doesn't involve hacking the source
> code...)

No.

The ":" separator is hard-coded into CQP ... in many different  
places.  Most of the relevant code is in cqp/corpmanag.c, and there's  
a temptingly named macro "COLON" near the top of the file.  However,  
changing this #define will only break things, as the ":" character is  
hard-coded (without macro abstraction) in various other places -- most  
notably in the code that generates filenames for saved corpora.

If there's a chance to get CQP to work reasonably well on Cygwin, I  
think it's worth reviewing the code to make the separator character  
configurable (or perhaps set it at compile time, so it defaults  
to something other than ":" on Windows).  I'll have to go through the  
source code carefully to find out exactly where filenames are  
generated and parsed, and we'd need thorough beta testing on Cygwin.
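
To make the idea concrete, here is a rough sketch of what a separator set at compile time could look like.  None of this is existing CWB code: the macro SUBCORPUS_SEP, the two helper functions and the choice of "^" as the Windows default are all made up for illustration.  The point is only that filename generation and parsing would both have to go through the same definition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical compile-time setting (NOT an existing CWB macro).
 * It could be overridden at build time, e.g. -DSUBCORPUS_SEP="'^'". */
#ifndef SUBCORPUS_SEP
#  if defined(_WIN32) || defined(__CYGWIN__)
#    define SUBCORPUS_SEP '^'  /* arbitrary placeholder; ':' is reserved on Windows */
#  else
#    define SUBCORPUS_SEP ':'  /* traditional separator, e.g. DICKENS:MyQuery */
#  endif
#endif

/* Hypothetical helper: write "<corpus><sep><query>" into buf.
 * Returns 0 on success, -1 if the result does not fit into buf. */
int make_subcorpus_filename(char *buf, size_t size,
                            const char *corpus, const char *query)
{
  int n;
  assert(buf != NULL && corpus != NULL && query != NULL);
  n = snprintf(buf, size, "%s%c%s", corpus, SUBCORPUS_SEP, query);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Hypothetical helper: split a filename at the first separator.
 * Copies the corpus part into corpus_buf and returns a pointer to the
 * query part inside filename, or NULL if there is no separator or the
 * corpus part does not fit into corpus_buf. */
const char *split_subcorpus_filename(const char *filename,
                                     char *corpus_buf, size_t size)
{
  const char *sep;
  size_t len;
  assert(filename != NULL && corpus_buf != NULL);
  sep = strchr(filename, SUBCORPUS_SEP);
  if (sep == NULL)
    return NULL;
  len = (size_t) (sep - filename);
  if (len >= size)
    return NULL;
  memcpy(corpus_buf, filename, len);
  corpus_buf[len] = '\0';
  return sep + 1;
}
```

With something along these lines, saving MyQuery for DICKENS would still produce "DICKENS:MyQuery" on Unix and a name with the replacement character on Windows/Cygwin.  Whether any particular replacement character is safe would have to be checked against the CQP syntax and the rules for corpus names.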


Best wishes,
Stefan



--
The wonders of Googleology (episode 1)

"from collectibles to cars"
	84,700,000 -- Google
	9,443,672 -- Google N-grams (Web 1T5)
	1 -- ukWaC

[ stefan.evert at uos.de | http://purl.org/stefan.evert ]









