[CWB] Unicode support in CWB version 3.2.b3

Hardie, Andrew a.hardie at lancaster.ac.uk
Sun Aug 15 18:07:31 CEST 2010


Hi all,

Just a quick note to let everyone know that the Unicode support features
are (as of last weekend) now more-or-less complete with the addition of
UTF8-aware sorting in CQP, charset-checking in cwb-encode, and proper
instructions for adding the new external libraries to the build.

You are now very welcome to beta-test the new version if you are not
using it already: http://cwb.sourceforge.net/beta.php .

Note, of course, that "complete" does not imply "bug-free", and there
are three things in particular that I am anxious to check are working
properly.

The first is sorting in UTF8 (it is not clear, in particular, that
case-sensitive diacritic-sensitive sorting will behave as it should);
the second is regular expression optimisation (the current build is set
up to print a message if the optimisation should have worked but
didn't); and the third is the Windows build system with the PCRE and
GLib support libraries - it works on my computer, but whether it will
work on anyone else's remains to be confirmed.
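
For anyone testing the first point, the difference between sensitive and
insensitive sorting can be sketched in a few lines of Python. This is only
an illustration, not CWB's code: the function names are hypothetical, and
both folding rules (removing combining marks after NFD decomposition,
Python's casefold()) and the plain code point ordering are assumptions
that may not match what CQP actually does.

```python
import unicodedata

def strip_diacritics(s):
    # Assumed folding rule: decompose to NFD, then drop combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def sort_key(s, ignore_case=False, ignore_diacritics=False):
    # Hypothetical helper: builds the string that is actually compared.
    if ignore_diacritics:
        s = strip_diacritics(s)
    if ignore_case:
        s = s.casefold()
    return s

# \u00c4 is A-umlaut and \u00e4 is a-umlaut, written as escapes so the
# strings are precomposed regardless of how this file is saved.
words = ["zebra", "\u00c4rger", "apple", "Apple", "\u00e4rger", "arger"]

# Case-sensitive, diacritic-sensitive: plain code point order.
sensitive = sorted(words, key=sort_key)

# Case-insensitive, diacritic-insensitive: compare folded forms only.
folded = sorted(words, key=lambda w: sort_key(w, True, True))

print(sensitive)
print(folded)
```

In the sensitive ordering the two umlaut forms land after "zebra", because
their code points are higher than any ASCII letter. That is the kind of
result worth comparing against what CQP produces on a UTF8 corpus.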

All help and especially bug-spotting is gratefully accepted.

best

Andrew.

