[CWB] web-interface with aligned corpora and WebCqp::Persistent

lars nygaard lars.nygaard at iln.uio.no
Wed Feb 21 23:22:39 CET 2007


Stefan Evert wrote:
> there is no proper Unicode support for the simple reason that this 
> would require us to compile against huge Unicode libraries with 
> potential licensing problems.  There's also a certain performance 
> penalty: regular expressions and case/diacritic-insensitive searching 
> are more efficient for byte encodings than for Unicode (UTF-8) data.
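To make the byte-encoding point concrete, here is a rough sketch (not CWB's actual code; the names build_fold_table, FOLD and fold are made up for illustration, and Latin-1 is assumed as the corpus encoding). In a single-byte encoding, case and diacritic folding is one lookup per byte in a 256-entry table. The same trick does not carry over to UTF-8, where "É" takes two bytes.

```python
import unicodedata


def build_fold_table() -> bytes:
    """Build a 256-entry table mapping each Latin-1 byte to its
    lowercase, diacritic-free counterpart (hypothetical helper)."""
    table = bytearray(range(256))
    for b in range(256):
        ch = bytes([b]).decode("latin-1").lower()
        # NFD splits an accented letter into base letter + combining marks;
        # the first code point is the base letter.
        base = unicodedata.normalize("NFD", ch)[0]
        try:
            table[b] = base.encode("latin-1")[0]
        except UnicodeEncodeError:
            # Base letter not representable in Latin-1: keep the byte as-is.
            pass
    return bytes(table)


FOLD = build_fold_table()


def fold(data: bytes) -> bytes:
    """Fold case and diacritics with one table lookup per byte."""
    return data.translate(FOLD)


if __name__ == "__main__":
    print(fold("École Noël".encode("latin-1")))
```

Two strings then match case- and diacritic-insensitively exactly when their folded bytes are equal, so the comparison itself stays a plain byte comparison.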
I've heard Good Things about IBM's ICU
(http://www-306.ibm.com/software/globalization/icu/index.jsp).
Apparently its regular expressions etc. are quite well optimized, but there
might be a significant speed penalty at program startup. That would be
a bit of a bummer for CGI applications, where every request starts a
fresh process (although it should be possible to run cqp as some kind
of daemon).
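A minimal sketch of the daemon idea, under loud assumptions: the class PersistentChild and the STAND_IN command are hypothetical, and the one-line-in, one-line-out protocol is invented for illustration and is not cqp's real protocol. The point is only that the child is started once, so any startup penalty is paid once rather than on every request.

```python
import subprocess
import sys


class PersistentChild:
    """Start a line-oriented child process once and reuse it
    (hypothetical wrapper; the protocol is an assumption)."""

    def __init__(self, argv):
        # The startup cost of the child is paid here, exactly once.
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes on our side
        )

    def ask(self, line: str) -> str:
        """Send one request line and read back one reply line."""
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self) -> None:
        """Close the pipes and wait for the child to exit."""
        self.proc.stdin.close()
        self.proc.stdout.close()
        self.proc.wait()


# Stand-in for the real backend: a tiny child that upper-cases each line.
STAND_IN = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper(), flush=True)\n",
]

if __name__ == "__main__":
    child = PersistentChild(STAND_IN)
    print(child.ask("first request"))
    print(child.ask("second request"))  # same process, no second startup
    child.close()
```

A real setup would put something like this behind the web server and let the CGI script talk to the long-running process instead of spawning its own.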

-lars

