[CWB] web-interface with aligned corpora and WebCqp::Persistent

lars nygaard lars.nygaard at iln.uio.no
Mon Feb 26 13:37:35 CET 2007


Stefan Evert wrote:
>> I've heard Good Things about IBMs ICU  
>> (http://www-306.ibm.com/software/globalization/icu/index.jsp).
>
> There may also be some technical issues: ICU will bloat the CWB 
> binaries considerably (especially if we have to link it statically), 
> make it more difficult to compile and distribute the CWB (at the 
> moment, it has very few prerequisites beyond GCC, ncurses, bison and 
> flex, and compiles rather easily on almost every Unix platform – 
> except for Ubuntu), and might make it necessary to ship a huge runtime 
> database (I have no idea whether ICU requires Unicode and locale 
> database files, but it seems quite likely).  If it weren't for this 
> latter issue, I would probably have rewritten the CWB as a Perl module 
> by now. :o)

Installing ICU should be as simple as:

 # cd /tmp
 # wget -c --passive-ftp 'ftp://ftp.software.ibm.com/software/globalization/icu/3.6/icu4c-3_6-src.tgz'
 # tar -zxvf icu4c-3_6-src.tgz
 # cd icu/source/
 # ./runConfigureICU Linux
 # make
 # make install
 # echo "/usr/local/lib" >> /etc/ld.so.conf
 # ldconfig

for most Linux systems, at least.
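
One caveat with the ld.so.conf step: running it a second time appends the 
same path again. A minimal sketch of an append that only happens when the 
line is missing (add_lib_dir is a made-up helper name, not part of ICU or 
ldconfig; the config file path is passed in so it can be tried on a scratch 
file first):

```shell
# add_lib_dir DIR CONF: append DIR to CONF unless an identical line exists.
# Hypothetical helper for illustration only.
add_lib_dir() {
    dir=$1
    conf=$2
    # -x: match the whole line, -F: treat DIR as a fixed string, -q: quiet
    grep -qxF "$dir" "$conf" 2>/dev/null || echo "$dir" >> "$conf"
}
```

As root, this would be called as `add_lib_dir /usr/local/lib /etc/ld.so.conf`, 
followed by `ldconfig` as before.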


>> Apparently, regular expressions etc. are quite well optimized, but 
>> there might be a significant speed penalty at program startup, which 
>> might be a bit of a bummer for CGI applications (although it should 
>> be possible to run cqp as some kind of daemon).
>
> Do you know if there is a substantial startup penalty because of some 
> initialisation the ICU libraries have to perform?  Just loading and 
> linking large libraries should be fairly fast if they're already 
> cached in RAM. 

Come to think of it, the application that I worked with was not slowed 
down by the actual loading of the program files, but by the compilation 
of a large rule file at startup. So perhaps the regexp stuff etc. is, in 
fact, considerably slower ...
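
A rough way to check whether startup cost matters at all would be to time a 
trivial invocation of the program a number of times and look at the average. 
A sketch, assuming GNU date (for the %N nanosecond format); avg_startup_ms 
is a made-up name, and the command to run is whatever cheap invocation the 
program in question supports:

```shell
# avg_startup_ms N CMD [ARGS...]: run CMD N times and print the mean
# wall-clock time per run in milliseconds (integer, rounded down).
# Assumes GNU date and N >= 1. Hypothetical helper for illustration only.
avg_startup_ms() {
    n=$1
    shift
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / n / 1000000 ))
}
```

Comparing the figure for the real program against a baseline such as 
`avg_startup_ms 20 true` gives an idea of how much of the time is just 
process creation. The number includes loading and linking as well as any 
initialisation, so it cannot separate the two on its own.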

cheers,
lars



