[CWB] web-interface with aligned corpora and WebCqp::Persistent
lars nygaard
lars.nygaard at iln.uio.no
Wed Feb 21 23:22:39 CET 2007
Stefan Evert wrote:
> there is no proper Unicode support for the simple reason that this
> would require us to compile against huge Unicode libraries with
> potential licensing problems. There's also a certain performance
> penalty: regular expressions and case/diacritic-insensitive searching
> are more efficient for byte encodings than for Unicode (UTF-8) data.
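To illustrate the point about byte encodings: with a single-byte encoding such as Latin-1, case and diacritic folding can be done with one 256-entry lookup table, whereas Unicode text needs decomposition and per-character filtering. The following is only a minimal sketch under that assumption. It is not taken from the CWB source, and the names `build_latin1_fold_table`, `fold_bytes` and `fold_unicode` are made up for this example.

```python
import unicodedata


def build_latin1_fold_table() -> bytes:
    """Build a 256-entry table mapping each Latin-1 byte to its
    lowercased, accent-stripped byte (illustrative helper, not CWB code)."""
    table = bytearray(range(256))
    for b in range(256):
        ch = bytes([b]).decode("latin-1")
        # Canonically decompose, drop combining marks, then lowercase.
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
        try:
            encoded = base.encode("latin-1")
        except UnicodeEncodeError:
            continue  # no Latin-1 equivalent: leave the byte unchanged
        if len(encoded) == 1:
            table[b] = encoded[0]
    return bytes(table)


FOLD_TABLE = build_latin1_fold_table()


def fold_bytes(data: bytes) -> bytes:
    """Fold Latin-1 data with one table lookup per byte."""
    return data.translate(FOLD_TABLE)


def fold_unicode(text: str) -> str:
    """Fold Unicode text: needs normalization plus a per-character filter."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


if __name__ == "__main__":
    print(fold_bytes("École Señor".encode("latin-1")))
    print(fold_unicode("École Señor"))
    # In UTF-8 an accented letter spans several bytes, so a per-byte
    # table like the one above cannot be applied directly.
    print(len("É".encode("utf-8")))
```

The table approach only works because every character is exactly one byte. How large the speed difference is in practice would have to be measured.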
I've heard Good Things about IBM's ICU
(http://www-306.ibm.com/software/globalization/icu/index.jsp).
Apparently, its regular expressions etc. are quite well optimized, but
there might be a significant speed penalty at program startup, which
would be a bit of a bummer for CGI applications, since each request
starts a new process (although it should be possible to run cqp as some
kind of daemon).
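The daemon idea could look roughly like this: start the expensive process once, keep it alive, and send it one query per line over a pipe, so the startup cost is paid only once rather than per request. This is a rough sketch only. The child below is a trivial stand-in written for this example, not cqp, and the class name `PersistentWorker` and the one-line-in, one-line-out protocol are assumptions made for illustration.

```python
import subprocess
import sys

# Stand-in for a query engine that is expensive to start (hypothetical;
# this is NOT cqp). It reads one query per line and answers with one line.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write('RESULT ' + line.strip().upper() + '\\n')\n"
    "    sys.stdout.flush()\n"
)


class PersistentWorker:
    """Keep one child process alive and reuse it for many queries."""

    def __init__(self, argv):
        # The startup cost is paid here, once.
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes
        )

    def query(self, text: str) -> str:
        """Send one line to the child and read back one line."""
        self.proc.stdin.write(text + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self) -> None:
        """Close the child's stdin so it exits, then wait for it."""
        self.proc.stdin.close()
        self.proc.wait()


if __name__ == "__main__":
    worker = PersistentWorker([sys.executable, "-c", CHILD_SOURCE])
    print(worker.query("hello"))
    print(worker.query("world"))
    worker.close()
```

A CGI script would still need some way to reach the long-running process (a socket, for instance), which this sketch leaves out.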
-lars