[CWB] How to remove the corpus data files from cache?

Petrakis Stefanos Stefanos.Petrakis at eurac.edu
Mon Jan 21 17:51:29 CET 2008


Hello everyone,


> Message: 2
> Date: Sat, 12 Jan 2008 02:04:30 +0100
> From: Stefan Evert <stefan.evert at uos.de>
> Subject: Re: [CWB] Timing issues when using the CQP Perl module
> To: Open source development of the Corpus WorkBench
> 	<cwb at sslmit.unibo.it>
> Message-ID: <6CC2B42D-DB45-42BD-B251-23791D2F793E at uos.de>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
> 
> >
> 
> Hi everyone!
> 
> >
> > While using the CQP Perl module inside a Perl server script, I notice
> > some timing anomalies.
> > These are introduced when I call the query method of a CQP object.
> >
> >     $my_cqp->query("$cqpQuery");
> > This line in my code for simple queries (e.g. [word="haus"%c] 
> > [word="mieten"%c] ) sometimes takes more than 30 seconds to return, 
> > with the usual response time being around 2-3 secs.
> > (I do some very basic time counting using gettimeofday to get secs and
> > microsecs).
> >
> > a) Has anybody come up with a similar observation? Or even a useful 
> > conclusion/solution?
> 
> I have never observed strange behaviour from the Perl module
> in this respect (or has anyone else had such problems?), so
> the most likely answer is that sometimes CQP just takes a 
> long time to execute the query.
> 
> Query execution times depend greatly on server load, memory 
> usage, and whether the corpus data files are already cached 
> in memory or have to be read from disk.  Even a simple query 
> like the one you mentioned can take fairly long on a BNC-size 
> corpus when the cache is still "cold"; a second query 
> immediately afterward will complete in a few seconds or less.



Any idea how I can un-cache/remove the corpus data files from memory?
I want to run some tests on a "cold" cache so that I can compare the
timings on my server between the cqp client and a simple Perl script
running the same queries.
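
A minimal sketch of the kind of timing comparison meant here, using
gettimeofday from the core Time::HiRes module. The helper time_call and
the stand-in workload are hypothetical names introduced only for
illustration; in the real test the code reference would wrap the
$my_cqp->query("$cqpQuery") call quoted above. Note that the first run
only counts as "cold" if the corpus data files really are absent from
the cache at that point.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference once and return the elapsed wall-clock time
# in seconds (with microsecond resolution).
sub time_call {
    my ($code) = @_;
    my $t0 = [gettimeofday];
    $code->();
    return tv_interval($t0);    # interval from $t0 until now
}

# Hypothetical stand-in workload (an assumption for this sketch).
# In the real test this would call $my_cqp->query($cqpQuery) instead.
my $run_query = sub {
    my $n = 0;
    $n += $_ for 1 .. 100_000;
    return $n;
};

# Time the same call several times in a row: the first run is the
# candidate "cold" measurement, the later ones are "warm".
my @elapsed = map { time_call($run_query) } 1 .. 3;
printf "run %d: %.6f s\n", $_ + 1, $elapsed[$_] for 0 .. $#elapsed;
```

Running the same loop once through the Perl script and once against the
cqp client, each time starting from the same cache state, would give the
two sets of numbers to compare.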



> 
> How large are the corpora on which you've observed this behaviour?
> There is absolutely no reason why CQP should take that long
> on a 5-million-word corpus.
> 



The size is about 100M.



> > b) Could it be a problem of my server's setup? Is a CWB-rebuild 
> > recommended?
> >
> > I am using version 2.2.b91 (as reported by cqp -v).
> 
> Your version is a bit old, but the problem is most likely not 
> something that a CWB update would solve.
> 
> One thing that comes to mind is that your Web server may 
> impose a limit on the number of child processes in the CGI 
> subsystem (to keep it from being overloaded by many parallel 
> queries).  Since the Perl module has to spawn CQP as a 
> subprocess, it might be kept on hold occasionally when the 
> limit has been reached.  I'm not enough of an expert on Web 
> servers to tell whether this is a probable explanation or 
> not.  In any case, the delay should happen when you create a 
> new CQP object rather than when executing a query.
> 
> Best,
> Stefan
> 
> 


Cheers,

Stefanos

