[CWB] Re: Re: How to remove the corpus data files from cache?

Tue Jan 29 02:21:38 CET 2008

> Ok, so if the delay you report for BNCweb is more than a minute...  
> Then maybe there is nothing strange with a 20+ seconds delay for  
> cache-warming in our case...
> But, this introduces another problem for us, since we cannot expect  
> our users to wait for such a long time.

I'm afraid the only advice I can offer is to get a suitably powerful  
server.  Our typical BNCweb installation has a server with at least 4  
GB RAM and a separate dedicated MySQL server.

> Really, what do users of the BNCweb say about such long delays?

On our servers, we rarely have substantial delays, since most of the  
corpus data are already cached most of the time. On my laptop, on the  
other hand, I often experience one-minute or longer delays when I  
first start up BNCweb.

The crucial point, of course, is frequency of use.  If you have  
multiple users who are working with BNCweb all the time, the data  
files will be kept in the RAM cache.  But if no one has used it for a  
while, everything will have to be reloaded from disk, especially if  
your server has relatively little RAM.

> And what would your suggestion be regarding the user's experience?

For the user it's always better to have something like a progress bar  
-- or at least a spinning beach ball -- to watch while they wait.

> Or, are there other ways to setup CQP so that this delay is  
> eliminated?

The only possibility that comes to mind -- if the problem is really
infrequent access to the corpus, so that data files expire from the
RAM cache -- is to run a daemon process that occasionally accesses
random parts of the corpus, so that substantial parts of it are
already cached when someone uses the interface. That doesn't help if you offer
multiple corpora and the data files don't ALL fit into your server's  
RAM. In this case, the only hope would be to make your queries so  
simple that they can run off an index and don't have to access the  
corpus files in the first place.
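
A minimal sketch of such a cache-warming process in Python, assuming the
RAM cache in question is the operating system's file cache and that the
corpus data files all live under one directory. The function names, the
chunk size, the number of reads per file and the five-minute interval
are illustrative assumptions, not part of CWB or BNCweb.

```python
import os
import random
import time


def warm_once(data_dir, reads_per_file=8, chunk_size=1 << 20, rng=random):
    """Read a few random chunks from every file under data_dir.

    Returns the total number of bytes read. The data itself is discarded;
    the reads only serve to pull pages into the OS file cache.
    """
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size == 0:
                continue
            with open(path, "rb") as fh:
                for _ in range(reads_per_file):
                    # Random offset, so each pass touches different parts.
                    # Files smaller than one chunk are read from offset 0.
                    offset = rng.randrange(0, max(1, size - chunk_size + 1))
                    fh.seek(offset)
                    total += len(fh.read(chunk_size))
    return total


def run_forever(data_dir, interval_seconds=300):
    """Repeat warm_once at a fixed interval (assumed default: 5 minutes)."""
    while True:
        warm_once(data_dir)
        time.sleep(interval_seconds)
```

Calling run_forever() with the corpus data directory from a background
job would keep touching the files. Whether the pages actually stay in
RAM still depends on how much memory the server has, as noted above.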

I presume your interface is intelligent enough not to download all
matches of a query from CQP, but just to find out how many there are
and then download only the 20 or 50 to be displayed on the first
results page?
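
A rough sketch of that paging logic in Python. The helper names are
hypothetical, and the generated command strings assume that CQP's
"size" command reports the number of matches in a named query result
and that "cat" accepts a first and last match number; check both
against the CQP documentation for your version.

```python
def page_range(total_matches, page, page_size=50):
    """Return (first, last) zero-based match numbers for a results page.

    Returns None if the page lies outside the result set.
    """
    if page < 0:
        return None
    first = page * page_size
    if first >= total_matches:
        return None
    last = min(first + page_size, total_matches) - 1
    return first, last


def count_commands(query, name="A"):
    """Commands that run the query and ask only for the match count."""
    return [f"{name} = {query};", f"size {name};"]


def fetch_commands(total_matches, page, page_size=50, name="A"):
    """Command that prints just the matches needed for one results page."""
    bounds = page_range(total_matches, page, page_size)
    if bounds is None:
        return []
    first, last = bounds
    return [f"cat {name} {first} {last};"]
```

The interface would send the output of count_commands() first, read the
match count from CQP, and only then request the first page with
fetch_commands(count, 0).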

Best wishes & sorry I haven't got a better answer,
Stefan