[CWB] Problem with -R function (Cygwin)

Hardie, Andrew a.hardie at lancaster.ac.uk
Wed Jul 5 01:28:25 CEST 2017


Hi Aleksandar,

Hmmm, never seen this before!

Using the Windows binaries under Cygwin causes a nasty clash of Windows and Unix assumptions. This is because the binaries don’t “know” they are running within Cygwin.

Notably, the Windows binaries expect Windows-style paths (with \ ) but you are giving them Unix-style paths. These get interpreted OK by Cygwin but not by CWB itself.

That is why it is saying that certain directories don’t exist: because it does not recognise / as a path-separator.
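As a rough, untested sketch of what handing over Windows-style paths could look like: the helper below rewrites a relative path with backslashes. The function name to_backslashes is hypothetical and not part of CWB, and whether the Windows binaries then accept the result under Cygwin is an assumption. For absolute paths such as /cygdrive/c/..., Cygwin's own cygpath -w is the proper tool, since a plain character swap does not translate the drive prefix.

```shell
#!/bin/sh
# Hypothetical helper (not part of CWB): rewrite a relative Unix-style path
# with backslashes, the separator the Windows binaries are said to expect.
to_backslashes() {
  # tr reads '\\' as a single backslash character
  printf '%s\n' "$1" | tr '/' '\\'
}

to_backslashes corpus/reg/opn    # registry path from the original command
to_backslashes corpus/data/opn   # data directory from the original command
```

The converted value would need to be quoted when passed on the command line, so that the shell does not consume the backslashes itself.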

That’s also why it says “corpus/reg/opn is not a valid corpus ID” – it is interpreting it as a one-element path, not a three-element path, and trying to use that one element as a corpus ID. But “/” is an illegal character in a corpus ID.
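The behaviour described above can be mimicked with a small sketch. This is an illustration only, not CWB's actual code: last_element and is_valid_id are hypothetical names, and the set of characters allowed in a corpus ID (lowercase letters, digits, underscore, hyphen) is an assumption made for the example.

```shell
#!/bin/sh
# Illustration only: this is NOT CWB's source code.

sep='\'   # the only separator a Windows-minded parser would recognise

# Hypothetical helper: strip everything up to the last "\" and print the rest.
last_element() {
  printf '%s\n' "${1##*"$sep"}"
}

# Hypothetical check: the allowed character set is an assumption for this sketch.
is_valid_id() {
  case $1 in
    ''|*[!a-z0-9_-]*) return 1 ;;   # empty, or contains a disallowed character
    *) return 0 ;;
  esac
}

for p in 'corpus/reg/opn' 'corpus\reg\opn'; do
  id=$(last_element "$p")
  if is_valid_id "$id"; then
    printf '%s: accepted as a corpus ID\n' "$id"
  else
    printf '%s: is not a valid corpus ID\n' "$id"
  fi
done
```

With the Unix-style path nothing is stripped, so the whole string, slashes included, ends up as the candidate ID and is rejected; with backslashes only the final element is left.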

It creates the file correctly because Cygwin itself is interpreting the path for disk access here. But cwb-encode can’t parse the path itself internally to extract the corpus ID. So it is not successful in assembling the registry data to write anything to file.

There is a “Cygwin” platform file in the “config” tree of the source code, from which I deduce that it is, or was, or was once intended to be, possible to compile a version of CWB specifically to run on Cygwin. I don’t know whether this still works. But its presence was why I did not consider that the Windows binaries might need to be usable within Cygwin.

Hope this helps.

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Aleksandar Trklja
Sent: 04 July 2017 12:22
To: cwb at sslmit.unibo.it
Subject: [CWB] Problem with -R function (Cygwin)

Dear all,

I've recently installed the Windows version of 'cwb' but I can't encode any corpus because of a problem with the -R option. I use Cygwin.

When I specify the name of a new directory I receive an error message that the corpus ID is not valid and that there is an error on the last line of my file. This happens regardless of what my source text is. Interestingly, -R does create a registry file, but it is empty.

$ cwb-encode.exe -d corpus/data/opn -f text.txt -R corpus/reg/opn -P pos -P lemma -S s:0 -c utf8
corpus/reg/opn is not a valid corpus ID! Can't create registry entry.
[location of error: file text.txt, line #27]


If I define the standard registry directory with 'export CORPUS_REGISTRY=' I get the message that my registry doesn't exist although the registry does exist.


$ cwb-encode.exe -d corpus/data/opn -f text.txt -R /cygdrive/c/cwb/registry/opn -P pos -P lemma -S s:0 -c utf8
Error: registry directory '/cygdrive/c/cwb/registry' does not exist.
Please create this directory first.

I should say that the encoding of the same texts on another machine with Cygwin-based 'cwb' works just fine.

This is an example of a text I've tried to encode:
<s>
The	DT	the
nineteenth	JJ	nineteenth
century	NN	century
was	VBD	be
,	,	,
until	IN	until
recently	RB	recently
,	,	,
predominantly	RB	predominantly
seen	VVN	see
as	IN	as
a	DT	a
century	NN	century
of	IN	of
rapid	JJ	rapid
industrialisation	NN	industrialisation
which	WDT	which
set	VVD	set
the	DT	the
stage	NN	stage
for	IN	for
profound	JJ	profound
social	JJ	social
change	NN	change
.	SENT	.
</s>



Many thanks.

Best wishes
Aleksandar
