[CWB] web-interface with aligned corpora and WebCqp::Persistent
lars nygaard
lars.nygaard at iln.uio.no
Mon Feb 26 13:37:35 CET 2007
Stefan Evert wrote:
>> I've heard Good Things about IBMs ICU
>> (http://www-306.ibm.com/software/globalization/icu/index.jsp).
>
> There may also be some technical issues: ICU will bloat the CWB
> binaries considerably (especially if we have to link it statically),
> make it more difficult to compile and distribute the CWB (at the
> moment, it has very few prerequisites beyond GCC, ncurses, bison and
> flex, and compiles rather easily on almost every Unix platform –
> except for Ubuntu), and might make it necessary to ship a huge runtime
> database (I have no idea whether ICU requires Unicode and locale
> database files, but it seems quite likely). If it weren't for this
> latter issue, I would probably have rewritten the CWB as a Perl module
> by now. :o)
Installing ICU should be as simple as:
# cd /tmp
# wget -c --passive-ftp 'ftp://ftp.software.ibm.com/software/globalization/icu/3.6/icu4c-3_6-src.tgz'
# tar -zxvf icu4c-3_6-src.tgz
# cd icu/source/
# ./runConfigureICU Linux
# make
# make install
# echo "/usr/local/lib" >> /etc/ld.so.conf
# ldconfig
on most Linux systems, at least.
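One caveat with the ld.so.conf step: run twice, it appends the same line twice. A guarded variant might look like the sketch below. The function name add_lib_dir is made up for illustration, and it assumes the library directory sits on a line of its own in the config file.

```shell
#!/bin/bash
# add_lib_dir is a hypothetical helper, not part of ICU or the CWB.
# It appends a directory to a linker config file only if that exact
# line is not already present, so rerunning it is harmless.
add_lib_dir() {
    local dir=$1
    local conf=${2:-/etc/ld.so.conf}
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF "$dir" "$conf" 2>/dev/null || echo "$dir" >> "$conf"
}
```

As root, something like "add_lib_dir /usr/local/lib && ldconfig" would then replace the last two commands.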
>> Apparently, regular expressions etc. are quite well optimized, but
>> there might be a significant speed penalty at program startup, which
>> might be a bit of a bummer for CGI applications (although it should
>> be possible to run cqp as some kind of daemon).
>
> Do you know if there is a substantial startup penalty because of some
> initialisation the ICU libraries have to perform? Just loading and
> linking large libraries should be fairly fast if they're already
> cached in RAM.
Come to think of it, the application that I worked with was not slowed
down by the actual loading of the program files, but by compiling a
large rule file at startup. So perhaps the regexp stuff etc. is, in
fact, considerably slower ...
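A rough way to check where the startup time goes would be to time a number of runs of a trivial ICU-linked program and compare that with a program that does nothing. The sketch below is only that: startup_ms is a made-up helper name, and it assumes bash and GNU date (for the %N nanosecond format).

```shell
#!/bin/bash
# startup_ms is a hypothetical helper: it runs a command N times and
# prints the average wall-clock time per run in whole milliseconds.
# Assumes GNU date, whose %N format gives nanoseconds.
startup_ms() {
    local runs=$1; shift
    local start end i
    start=$(date +%s%N)
    for ((i = 0; i < runs; i++)); do
        "$@" > /dev/null 2>&1     # discard the command's output
    done
    end=$(date +%s%N)
    echo $(( (end - start) / runs / 1000000 ))
}
```

For example, "startup_ms 20 true" gives a baseline for process creation alone; running the same with a small ICU-linked binary (with and without compiling a regexp or rule file) should show whether the penalty comes from loading the libraries or from the initialisation work.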
cheers,
lars