[CWB] query optimization

Serge Sharoff S.Sharoff at leeds.ac.uk
Wed Aug 9 16:31:34 CEST 2006


Not sure if Stefan is available; of course, he knows the CWB intricacies much better than I do.  But from what I learned during our May meeting in Forli, there is no possibility of correcting this in the near future, as query evaluation is essentially sequential, with the query converted into an FSA.  At the same time, because of some bug/feature in CQP, the index of all occurrences of the first word in the query is computed twice, making "the" "world" a monster in terms of its processing time.  One way to optimise this is to use the MU syntax:
MU(meet "the" "world" 0 1)
(32 sec for the standard query vs. just 2.3 sec for the MU version on the BNC)
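
To make the frequency point concrete, here is a toy Python sketch. It is only an illustration of the general idea of driving a two-word lookup from the rarer word's position list; it is not how CQP or the MU operator is implemented, and all names in it (build_index, bigram_sequential, bigram_rarest_first) are made up for this example.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(tokens):
    """Map each word form to the sorted list of corpus positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return index


def bigram_sequential(tokens, index, first, second):
    """Visit every occurrence of `first` and test the token that follows it.

    The cost grows with the frequency of `first`, however rare `second` is.
    """
    return [
        p
        for p in index.get(first, [])
        if p + 1 < len(tokens) and tokens[p + 1] == second
    ]


def bigram_rarest_first(index, first, second):
    """Iterate over the rarer word's positions and binary-search the other list.

    Hypothetical strategy for illustration only (not taken from CQP).
    """
    a, b = index.get(first, []), index.get(second, [])
    hits = []
    if len(a) <= len(b):
        # `first` is rarer: look for `second` at position p + 1.
        for p in a:
            i = bisect_left(b, p + 1)
            if i < len(b) and b[i] == p + 1:
                hits.append(p)
    else:
        # `second` is rarer: look for `first` at position q - 1.
        for q in b:
            i = bisect_left(a, q - 1)
            if i < len(a) and a[i] == q - 1:
                hits.append(q - 1)
    return hits


tokens = "the end of the world is the end of the line".split()
index = build_index(tokens)
print(bigram_sequential(tokens, index, "the", "world"))
print(bigram_rarest_first(index, "the", "world"))
```

Both functions return the same start positions; the second one does an amount of work that depends on the rarer of the two words, which is why a common first word need not be expensive in principle.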
S
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On
> Behalf Of Lars Nygaard
> Sent: Wednesday, August 09, 2006 2:06 PM
> To: cwb at sslmit.unibo.it
> Subject: [CWB] query optimization
> 
> Hi again,
> 
> Reading http://corpus.leeds.ac.uk/help.html reminded me that cqp does
> not seem to do query optimization based on the frequency of the various
> query tokens. For instance, in the BNC
> 
>   > "end" "of";
> 
> and
> 
>   > "the" "world";
> 
> have approximately the same number of hits, but the second is much slower
> (since it has a common word first).
> 
> Shouldn't this be a relatively easy thing to fix?
> 
> cheers,
> lars
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb

