[CWB] [ cwb-Bugs-2893764 ] CQPweb: number of files in query is not cached

SourceForge.net noreply at sourceforge.net
Sat Nov 7 10:53:33 CET 2009


Bugs item #2893764, was opened at 2009-11-07 09:53
Message generated for change (Tracker Item Submitted) made by andrewhardie
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=722303&aid=2893764&group_id=131809

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: CQPweb
Group: None
Status: Open
Resolution: None
Priority: 8
Private: No
Submitted By: Andrew Hardie (andrewhardie)
Assigned to: Andrew Hardie (andrewhardie)
Summary: CQPweb: number of files in query is not cached

Initial Comment:
KWIC display is _painfully_ slow for large result sets, especially when there are more than 1 million matches.

The culprit is the following line in lib/concordance.inc.php:

	/* get a list of texts with frequencies && count 'em */
	$num_of_files = count( $cqp->execute("group $qname match text_id") );

So whenever a page of query hits is displayed, you use CQP's "group"  
command to re-calculate the number of different texts containing  
matches.  This can be very expensive, so the information _must_ be  
cached somewhere in the database.

Solution: add a "number of files" field to table saved_queries, and read this instead of running CQP's group command.

That way, the group command will only be run when a new query is cached for the first time.

(The count must also be stored when postprocessed (thinned) files are created; this can probably be generalised in concordance-post.inc.php.)
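
A minimal sketch of the proposed caching, to make the idea concrete. It is not the actual CQPweb code: the FakeCqp class, the get_num_of_files() helper and the 'num_of_files' key are hypothetical names, and an in-memory array stands in for the saved_queries table. Only the "group $qname match text_id" call is taken from the line quoted above.

```php
<?php
/* Hypothetical stand-in for the CQP connection object; it only imitates
   the execute() call quoted from lib/concordance.inc.php. */
class FakeCqp
{
    public $group_calls = 0;   /* how often the expensive command was run */
    private $hits;

    public function __construct(array $text_ids_of_hits)
    {
        $this->hits = $text_ids_of_hits;
    }

    /* Imitates "group $qname match text_id": one line per distinct text. */
    public function execute($command)
    {
        $this->group_calls++;
        $lines = array();
        foreach (array_count_values($this->hits) as $text_id => $freq)
            $lines[] = "$text_id\t$freq";
        return $lines;
    }
}

/* Returns the number of texts containing matches for $qname.
   $saved_queries is an in-memory stand-in for the saved_queries table;
   'num_of_files' is an assumed field name. The group command is only
   run when no cached value exists yet. */
function get_num_of_files($cqp, $qname, array &$saved_queries)
{
    if (isset($saved_queries[$qname]['num_of_files']))
        return $saved_queries[$qname]['num_of_files'];

    $num_of_files = count($cqp->execute("group $qname match text_id"));
    $saved_queries[$qname]['num_of_files'] = $num_of_files;
    return $num_of_files;
}

/* Five hits spread over three texts. */
$cqp = new FakeCqp(array('A01', 'A01', 'B02', 'C03', 'C03'));
$saved_queries = array();

$first  = get_num_of_files($cqp, 'Q1', $saved_queries);  /* cache miss: runs group */
$second = get_num_of_files($cqp, 'Q1', $saved_queries);  /* cache hit: no CQP call */
```

In the real code the array lookup and assignment would become a SELECT and an UPDATE on saved_queries, so that paging through a concordance reads the stored value instead of calling CQP again.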

----------------------------------------------------------------------
