[CWB] CWB-CL test failure (encoding?)

Piotr Bański bansp at o2.pl
Thu Aug 3 13:42:50 CEST 2017


Hi Stefan,

Thanks for the quick reply! I'll just go ahead and install, then.
Answering the remaining points inline below.

On 08/03/17 13:32, Stefan Evert wrote:
> Hi Piotr!
> 
>> I have CQPweb (3.2.27), cwb (3.4.12) and the CWB Perl module installed from trunk (rev. 982), and wanted to compile the remaining Perl modules, but I'm getting something that looks like an encoding problem when testing CWB-CL, and I have no idea of how to go about fixing that. I'll be grateful for hints.
> 
> The weird characters are probably just placeholders inserted by Perl when printing Unicode strings on an output stream that's not known to be Unicode.  On my Mac, I get plain question marks.
> 
>> Nearly-PostScriptum: I've just had the last look around to make sure that I haven't missed any troubleshooting hints and noticed the warning that "This version of CWB/Perl (...) is not compatible with the current beta track CWB 3.4.x" -- is this what I'm up against, please?
> 
> That's incorrect.  In fact, the current SVN trunk version of CWB/Perl is the one that _is_ compatible with CWB 3.4.x (and may no longer be compatible with CWB 3.0).  Where did you find the warning?

On the web page, http://cwb.sourceforge.net/download.php#perl

[...]

> Unfortunately, PCRE isn't 100% compatible with Perl regexp, and these discrepancies lead to the test failures with CWB-CL. (The difference is that /daß/ case-insensitively matches "daß" and "dass" in Perl, but only "daß" in PCRE.  Of course, /DASS/ fails to case-insensitively match "daß", so I would argue that the Perl behaviour is less consistent than PCRE. I would also argue that (a) Unicode and (b) natural language is a complete mess and both should be abandoned. :-)

Unicode might be worth keeping, especially when option (b) succeeds ;-)
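
For anyone who wants to see the "daß"/"dass" asymmetry for themselves, here is a minimal sketch. It is in Python, not Perl or PCRE, so it only illustrates the general difference between character-by-character case-insensitive matching and full Unicode case folding; it says nothing about what CWB does internally. The helper names regex_ci_match and casefold_equal are made up for this example.

```python
import re


def regex_ci_match(pattern: str, text: str) -> bool:
    """Case-insensitive full match with Python's re module.

    re compares case-insensitively one character at a time, so a
    single "ß" is never treated as equal to the two characters "ss".
    """
    return re.fullmatch(pattern, text, flags=re.IGNORECASE) is not None


def casefold_equal(a: str, b: str) -> bool:
    """Compare two strings after full Unicode case folding.

    str.casefold() maps "ß" to "ss", so the comparison is symmetric.
    """
    return a.casefold() == b.casefold()


# (pattern, text) pairs taken from the discussion above.
pairs = [("daß", "daß"), ("daß", "dass"), ("DASS", "daß")]

# For each pair, store (regex result, casefold result).
results = {pair: (regex_ci_match(*pair), casefold_equal(*pair)) for pair in pairs}

for (pattern, text), (by_regex, by_casefold) in results.items():
    print(f"{pattern!r} vs {text!r}: re.IGNORECASE={by_regex}, casefold={by_casefold}")
```

In Python, only the identical pair matches through the regex, while the casefold comparison treats all three pairs as equal. Whichever behaviour one prefers, a test suite that expects one of them will fail against an engine that implements the other.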

Thanks and best,

   Piotr

