[CWB] CWB in Windows

Hardie, Andrew a.hardie at lancaster.ac.uk
Tue Mar 20 16:34:32 CET 2018


We have had persistent and ongoing problems with UTF-8 input and output in the Windows console. We tried out some alternative consoles a few years ago – the notes on the results of these efforts can be found on the FAQ page here: http://cwb.sourceforge.net/faq.php?hoist=windows_terminal#windows_terminal

Note that it is the console, rather than the cmd.exe shell, that causes the problem, so switching to a different shell (e.g. PowerShell) doesn’t help.

(In those notes, for “accented character” read “any character encoded in two or more bytes”.)

But these notes are old enough that a better alternative console may well have come along in the meantime. If anyone knows of one, do let us know.

I haven’t tested CWB on anything later than Windows 7 myself, and would be interested to hear if anyone has. (Testing on Windows 10 will happen once I work out native building rather than cross-compiling; the Ubuntu box I used for the cross-compiling seems to have become terminally broken, so this seemed a good juncture to work out the process for native compilation on Windows.)

Yep, the default registry is indeed C:\CWB\registry; this is specified in cl/globals.h in the code (line 83 ff.).
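For anyone scripting around this, here is a minimal Python sketch of how the registry location would be resolved. It is a hypothetical helper (the name registry_dir is made up, and it is not part of CWB), and it assumes that a non-empty CORPUS_REGISTRY environment variable overrides the compiled-in default.

```python
import os
from typing import Mapping, Optional

# Default registry compiled into the Windows build (set in cl/globals.h).
DEFAULT_WINDOWS_REGISTRY = r"C:\CWB\registry"


def registry_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the registry location a CWB tool would be expected to use.

    Assumption: a non-empty CORPUS_REGISTRY environment variable takes
    precedence; otherwise the compiled-in Windows default applies.
    """
    if env is None:
        env = os.environ
    return env.get("CORPUS_REGISTRY") or DEFAULT_WINDOWS_REGISTRY


if __name__ == "__main__":
    print(registry_dir())
```

Passing an explicit mapping instead of reading os.environ directly keeps the lookup easy to check without touching the real environment.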

best

Andrew.


From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Ciarán Ó Duibhín
Sent: 20 March 2018 11:14
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] CWB in Windows

Thanks, Luigi, you may well be right, but I'm only interested in what works natively under Windows.

The default registry directory seems to be C:\CWB\registry.

For utf-8 output under cmd.exe in Windows Vista, I tried several alternative command prompt tools.  With chcp 850, both cmd.exe and the alternative tools will output "í" as "├¡" and will recognize "├¡" in input as "í".  With chcp 65001, a couple of the alternatives show the utf-8 output correctly, but I have been unable to input anything and have it recognised as a utf-8 character.  Also the alternative tools all throw out warnings about "Not enough memory" and "Paging disabled."
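A minimal Python sketch of the byte-level mix-up described above (illustrative only, not part of CWB; the function names are hypothetical): cqp emits UTF-8, and a console on code page 850 renders each byte as a separate CP850 character. Input goes the other way round.

```python
def shown_under_cp850(text: str) -> str:
    """How UTF-8 output from cqp appears in a console set to chcp 850."""
    # cqp writes UTF-8 bytes; the console renders each byte as one CP850 char.
    return text.encode("utf-8").decode("cp850")


def received_from_cp850(typed: str) -> str:
    """How characters typed in a chcp 850 console reach a UTF-8 cqp."""
    # The console sends the CP850 bytes, which cqp then reads as UTF-8.
    return typed.encode("cp850").decode("utf-8")


if __name__ == "__main__":
    # "í" (U+00ED) is the two UTF-8 bytes 0xC3 0xAD; in CP850 these are "├" and "¡".
    print(shown_under_cp850("í"))     # ├¡
    print(received_from_cp850("├¡"))  # í
```

This is why "í" displays as "├¡" and why typing "├¡" finds "í": the two conversions are exact inverses as long as every byte stays within CP850.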

I thought it would be informative to see how the utf-8 would fare in Windows 10, so I tried to install CWB (3.4.10-windows-i586-UPDATED.zip) there. I got as far as running cqp and choosing my corpus, but giving a word to search for produced "cqp has stopped working". So I don't know whether the utf-8 would show correctly in Windows 10; cqp didn't even get that far. Has anyone else tried CWB under Windows 10?

Ciarán.
----- Original Message -----
From: Luigi Talamo <luigi.talamo at unibg.it>
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Sent: Sunday, March 18, 2018 2:47 PM
Subject: Re: [CWB] CWB in Windows

Hello,
in my opinion, it is best to run CWB in a virtual Linux environment under Windows. I recall a VirtualBox image developed by the CWB team which works out of the box; by sharing a folder between Windows and VirtualBox, you can safely play around with files and directories.
By the way, in the coming weeks I hope to start a project aimed at providing Docker containers for CWB; Docker containers are a new (well, not so new) virtualization technique, which works pretty well under Windows and macOS (and Linux, of course).
Best,
Luigi
—
Luigi Talamo, PhD

On 18 Mar 2018, at 00:36, Ciarán Ó Duibhín <coduibhin at btinternet.com> wrote:
Is there documentation on running CWB under Windows?

I have several questions, like
• how to get utf-8 output from cqp to show correctly in cmd.exe under Windows Vista?
• what is the default registry directory for cwb-encode?

Thank you.