[CWB] How to remove the corpus data files from cache?
Petrakis Stefanos
Stefanos.Petrakis at eurac.edu
Mon Jan 21 17:51:29 CET 2008
Hello everyone,
> Date: Sat, 12 Jan 2008 02:04:30 +0100
> From: Stefan Evert <stefan.evert at uos.de>
> Subject: Re: [CWB] Timing issues when using the CQP Perl module
> To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
> Message-ID: <6CC2B42D-DB45-42BD-B251-23791D2F793E at uos.de>
>
> >
>
> Hi everyone!
>
> >
> > While using the CQP Perl module inside a Perl server script, I notice
> > some timing anomalies.
> > These are introduced when I call the query method of a CQP object.
> >
> > $my_cqp->query("$cqpQuery");
> > For simple queries (e.g. [word="haus"%c] [word="mieten"%c]), this line
> > sometimes takes more than 30 seconds to return, whereas the usual
> > response time is around 2-3 seconds.
> > (I do some very basic timing using gettimeofday to get seconds and
> > microseconds.)
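For measuring an interval like this, a monotonic clock is more robust than gettimeofday, which reports wall-clock time and can jump if the system clock is adjusted (e.g. by NTP). A minimal Python sketch of the same measurement, assuming the query is wrapped in a callable; the helper name `timed` is hypothetical and not part of CWB or its Perl module.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed seconds).

    Uses time.perf_counter(), a monotonic high-resolution clock, so the
    measured interval is not affected by changes to the system clock.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload; in practice fn would send the CQP query.
    result, seconds = timed(sum, range(1_000_000))
    print(f"result={result} took {seconds:.6f} s")
```

Logging the elapsed time of every query this way makes it easy to see whether the slow cases cluster (e.g. the first query after a pause) or occur at random.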
> >
> > a) Has anybody come up with a similar observation? Or even a useful
> > conclusion/solution?
>
> I have never observed strange behaviour from the Perl module
> in this respect (nor, to my knowledge, has anyone else had such problems), so
> the most likely answer is that sometimes CQP just takes a
> long time to execute the query.
>
> Query execution times depend greatly on server load, memory
> usage, and whether the corpus data files are already cached
> in memory or have to be read from disk. Even a simple query
> like the one you mentioned can take fairly long on a BNC-size
> corpus when the cache is still "cold"; a second query
> immediately afterward will complete in a few seconds or less.
Any idea how I can un-cache (remove) the corpus data files from memory?
I want to run some tests with a "cold" cache to compare query times on my
server between the cqp client and a simple Perl script running the same queries.
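On Linux, one way to get a "cold" cache is to ask the kernel to evict the cached pages of the corpus data files before each test run. A minimal sketch, assuming a Linux system and Python 3.3+: it calls os.posix_fadvise with POSIX_FADV_DONTNEED on every file under the corpus data directory. The directory path and the helper name `evict_from_page_cache` are hypothetical. The advice is only a hint: dirty pages, and pages still mapped by a running process, may stay in memory, so it is best to run it with no CQP process active. The system-wide alternative is `sync; echo 3 > /proc/sys/vm/drop_caches` as root, which drops the entire page cache rather than just the corpus files.

```python
import os
import sys


def evict_from_page_cache(data_dir: str) -> int:
    """Advise the kernel to drop cached pages for every file under data_dir.

    Returns the number of files processed. Linux-oriented sketch: it relies
    on os.posix_fadvise, which is not available on every platform.
    """
    if not hasattr(os, "posix_fadvise"):
        raise OSError("posix_fadvise is not available on this platform")
    count = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            fd = os.open(path, os.O_RDONLY)
            try:
                # DONTNEED discards only clean pages; run `sync` first if
                # the files were written recently. offset=0, len=0 covers
                # the whole file.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            finally:
                os.close(fd)
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical corpus data directory; pass the real one as an argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/corpora/data/mycorpus"
    print(f"evicted {evict_from_page_cache(target)} files")
```

Running this right before a timed query gives the "cold" figure; repeating the same query immediately afterwards, without eviction, gives the "warm" figure for comparison.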
>
> How large are the corpora on which you've observed this behaviour?
> There is absolutely no reason why CQP should take that long
> on a 5-million-word corpus.
>
The size is about 100M.
> > b) Could it be a problem of my server's setup? Is a CWB-rebuild
> > recommended?
> >
> > I am using version 2.2.b91 (as reported by cqp -v).
>
> Your version is a bit old, but the problem is most likely not
> something that a CWB update would solve.
>
> One thing that comes to mind is that your Web server may
> impose a limit on the number of child processes in the CGI
> subsystem (to keep it from being overloaded by many parallel
> queries). Since the Perl module has to spawn CQP as a
> subprocess, it might be kept on hold occasionally when the
> limit has been reached. I'm not enough of an expert on Web
> servers to tell whether this is a probable explanation or
> not. In any case, the delay should happen when you create a
> new CQP object rather than when executing a query.
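To check whether a delay comes from starting the child process or from the query itself, the two phases can be timed separately. A minimal Python sketch, assuming a query tool that reads a query on stdin and writes results to stdout; the stand-in child below is a tiny Python process, not CQP, and the helper name `time_spawn_and_query` is hypothetical. Note that Popen returns once the child has been started, so the spawn figure covers process creation (and any wait imposed on it), not the child's own initialization.

```python
import subprocess
import sys
import time
from typing import List, Tuple


def time_spawn_and_query(cmd: List[str], query: str) -> Tuple[str, float, float]:
    """Start cmd as a child process, send it one query, and time both phases.

    Returns (output, spawn_seconds, query_seconds).
    """
    t0 = time.perf_counter()
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    t1 = time.perf_counter()  # Popen has returned: the child process exists
    # Send the query, close stdin, and wait for the child to finish.
    output, _ = proc.communicate(query)
    t2 = time.perf_counter()
    return output, t1 - t0, t2 - t1


if __name__ == "__main__":
    # Stand-in child that upper-cases its input; replace with the real tool.
    child = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
    out, spawn_s, query_s = time_spawn_and_query(child, "hello")
    print(out.strip(), f"spawn={spawn_s:.4f}s query={query_s:.4f}s")
```

With the stand-in, the "query" phase also includes the Python interpreter's start-up; with a long-running tool, the first query would likewise include its initialization, so it is worth timing a second query on the same process as well.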
>
> Best,
> Stefan
>
>
Cheers,
Stefanos