[CWB] Cygwin
Stefan Evert
stefan.evert at uos.de
Thu Dec 4 23:17:17 CET 2008
Hi Eros!
> I just installed the latest version of CWB in my cygwin environment.
That's great to hear! Can you tell us (put on the wiki etc.) how you
got CQP to work in Cygwin? Is there a new version of Cygwin or
particular configuration tricks?
I remember that when you were staying at Osnabrück, we got CQP to
compile in Cygwin, but it was dead slow and would quickly run out of
memory. That was still the case when I last tried within VirtualBox
(Windows XP + Cygwin). What system setup do you use?
> I was impressed by the performance (I was able to query a 100m corpus
> without problems) but unfortunately I noticed that apparently you cannot
> save queries using the "save" command (all I get in the DataDirectory is
> an empty file that has the same name as the corpus, i.e. "DICKENS")
>
> My guess is that Windows doesn't like the colon in the filename of the
> saved query (DICKENS:MyQuery).
Yes, since it's used after drive letters (C: and all that), that's
hardly surprising. I would have expected Cygwin to be a little more
intelligent about this, though ...
> Does anyone know if there is a way to change the default naming
> convention? (possibly something that doesn't involve hacking the source
> code...)
No.
The ":" separator is hard-coded into CQP ... in many different
places. Most of the relevant code is in cqp/corpmanag.c, and there's
a temptingly named macro "COLON" near the top of the file. However,
changing this #define will only break things, as the ":" character is
hard-coded (without macro abstraction) in various other places -- most
notably in the code that generates filenames for saved corpora.
If there's a chance to get CQP to work reasonably well on Cygwin, I
think it's worth reviewing the code to make the separator character
configurable (or perhaps set it at compile time, so it defaults to
something other than ":" on Windows). I'll have to go through the
source code carefully to find out exactly where filenames are
generated and parsed, and we'd need thorough beta testing on Cygwin.
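To make the compile-time idea concrete, here is a minimal sketch in C. It is not taken from the CWB source: the macro SUBCORPUS_SEP, the helper functions make_saved_name() and split_saved_name(), and the choice of "~" as the Windows/Cygwin default are all hypothetical placeholders. The point is only that filename generation and parsing would both have to go through the same single definition.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical macro (not from the CWB source): the character that
 * separates corpus name and saved-query name. It can be overridden at
 * compile time, e.g. with -DSUBCORPUS_SEP="'+'". */
#ifndef SUBCORPUS_SEP
#  if defined(_WIN32) || defined(__CYGWIN__)
#    define SUBCORPUS_SEP '~'   /* assumption: any character Windows accepts */
#  else
#    define SUBCORPUS_SEP ':'   /* the separator CQP uses today */
#  endif
#endif

/* Build "<corpus><sep><subcorpus>" in buf.
 * Returns 0 on success, -1 if the result does not fit. */
static int make_saved_name(char *buf, size_t size,
                           const char *corpus, const char *subcorpus)
{
    int n = snprintf(buf, size, "%s%c%s", corpus, SUBCORPUS_SEP, subcorpus);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Split a saved name at the first separator.
 * Returns 0 on success, -1 if there is no separator or a part does not fit. */
static int split_saved_name(const char *name,
                            char *corpus, size_t csize,
                            char *sub, size_t ssize)
{
    const char *sep = strchr(name, SUBCORPUS_SEP);
    size_t clen, slen;

    if (sep == NULL)
        return -1;
    clen = (size_t)(sep - name);
    slen = strlen(sep + 1);
    if (clen >= csize || slen >= ssize)
        return -1;
    memcpy(corpus, name, clen);
    corpus[clen] = '\0';
    memcpy(sub, sep + 1, slen + 1);   /* includes the terminating NUL */
    return 0;
}
```

With something like this, make_saved_name(buf, sizeof buf, "DICKENS", "MyQuery") would yield "DICKENS:MyQuery" on Unix and "DICKENS~MyQuery" under the assumed Windows/Cygwin default. The real work would still be finding every place in CQP where ":" is written out literally.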
Best wishes,
Stefan
--
The wonders of Googleology (episode 1)
"from collectibles to cars"
84,700,000 -- Google
9,443,672 -- Google N-grams (Web 1T5)
1 -- ukWaC
[ stefan.evert at uos.de | http://purl.org/stefan.evert ]