[CWB] [ cwb-Bugs-2893764 ] CQPweb: number of files in query is not cached

SourceForge.net noreply at sourceforge.net
Sat Nov 28 10:43:55 CET 2009


Bugs item #2893764, was opened at 2009-11-07 09:53
Message generated for change (Comment added) made by andrewhardie
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=722303&aid=2893764&group_id=131809

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: CQPweb
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 8
Private: No
Submitted By: Andrew Hardie (andrewhardie)
Assigned to: Andrew Hardie (andrewhardie)
Summary: CQPweb: number of files in query is not cached

Initial Comment:
KWIC display is _painfully_ slow for large result sets, especially when there are more than 1 million matches.

The culprit is the following line in lib/concordance.inc.php:

	/* get a list of texts with frequencies && count 'em */
	$num_of_files = count( $cqp->execute("group $qname match text_id") );

So whenever a page of query hits is displayed, you use CQP's "group"
command to recalculate the number of different texts containing
matches. This can be very expensive, so the information _must_ be
cached somewhere in the database.

Solution: add a "number of files" field to the saved_queries table, and read it instead of running CQP's group command.

That way, the group command will only be run when a new query is cached for the first time.

(This must also cover the creation of postprocessed (thinned) files; this can probably be generalised in concordance-post.inc.php.)
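A minimal sketch of the proposed caching, assuming a hypothetical in-memory array in place of the saved_queries table and a plain list of per-match text_ids in place of the output of CQP's "group $qname match text_id". The function names cache_new_query and get_num_files and the num_files key are illustrative only, not CQPweb's actual code or schema. The expensive distinct-text count happens once at cache time; each page of the KWIC display then reads the stored value.

```php
<?php
// Hypothetical stand-in for the saved_queries table; the 'num_files' key
// plays the role of the proposed new "number of files" field.
$saved_queries = [];

/**
 * Cache a new query. $match_text_ids is a hypothetical stand-in for the
 * text_id of every match; in CQPweb the distinct count would come from
 * CQP's "group $qname match text_id" command.
 */
function cache_new_query(array &$saved_queries, string $qname, array $match_text_ids): void
{
    // The expensive step runs exactly once, when the query is first cached.
    $saved_queries[$qname] = [
        'hits'      => count($match_text_ids),
        'num_files' => count(array_unique($match_text_ids)),
    ];
}

/** Each page of the concordance reads the cached value instead of re-grouping. */
function get_num_files(array $saved_queries, string $qname): int
{
    return $saved_queries[$qname]['num_files'];
}

cache_new_query($saved_queries, 'q1', ['A01', 'A01', 'B02', 'C03', 'B02']);
echo get_num_files($saved_queries, 'q1'), "\n"; // 3
```

The design trades one extra stored field per cached query for removing a full grouping pass from every page view.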

----------------------------------------------------------------------

>Comment By: Andrew Hardie (andrewhardie)
Date: 2009-11-28 09:43

Message:
Fixed in 2.08.

It turns out that, following BNCweb, it is the number of files in the
ORIGINAL (not the postprocessed) query that must be recorded and rendered
in the solution heading.

So this is calculated once, when the query is first cached, and is
never touched afterwards.
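A minimal sketch of the behaviour described in the fix, assuming that a thinned query is stored as a new record derived from the original. make_query_record, thin_query and the record layout are hypothetical illustrations, not CQPweb's actual code. The thinned record copies num_files from the original instead of recomputing it, so the heading keeps reporting the original query's number of files even when the thinned hits fall in fewer texts.

```php
<?php
// Hypothetical record layout: 'matches' lists the text_id of each hit,
// 'num_files' is the cached count of distinct texts.
function make_query_record(array $match_text_ids): array
{
    return [
        'matches'   => $match_text_ids,
        // Computed once, when the ORIGINAL query is first cached.
        'num_files' => count(array_unique($match_text_ids)),
    ];
}

// Thin a query by keeping every $step-th hit. The new record inherits
// num_files from the original rather than recounting its own texts.
function thin_query(array $original, int $step): array
{
    $kept = [];
    foreach ($original['matches'] as $i => $text_id) {
        if ($i % $step === 0) {
            $kept[] = $text_id;
        }
    }
    return [
        'matches'   => $kept,
        'num_files' => $original['num_files'], // copied, never recalculated
    ];
}

$orig = make_query_record(['A01', 'B02', 'C03', 'A01', 'D04', 'B02']);
$thinned = thin_query($orig, 2);
printf("%d hits in %d files\n", count($thinned['matches']), $thinned['num_files']);
// 3 hits in 4 files
```

Here the thinned hits come from only three distinct texts, but the reported count stays at the original four.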

----------------------------------------------------------------------


