[CWB] [cwb:feature-requests] #32 a slow CQP query...

Andrew Hardie andrewhardie at users.sf.net
Sat Jul 23 17:21:15 CEST 2016


A couple of years ago we discussed the possibility of ad hoc optimisation for certain sorts of token-level regex query. That is, the query

[hw!="be" & hw!="have"] [pos="PNP"] [word="done"]

could be treated as follows:

(1) compile the DFA from the regex as normal,
(2) analyse the DFA to identify one or more frequent "easy cases" that do not require the full DFA
(here, the "easy case" being a phrase with no optionality),
(3) if an easy case is spotted, apply an optimised algorithm
(here, the optimisation would be to start with the token with the lowest number of matches, the third in this example, and work from there, rather than running the DFA).
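
To make step (3) concrete, here is a rough Python sketch of the "start from the rarest token" idea for a fixed-length phrase. It is not CQP code and makes several assumptions: the easy case has already been detected (steps 1 and 2 are skipped), attribute values are compared as literal strings rather than regexes, and the toy corpus, the `build_index` helper and the slot representation are all invented for illustration.

```python
from collections import defaultdict

def build_index(corpus):
    """Map (attribute, value) -> ascending list of corpus positions."""
    index = defaultdict(list)
    for pos, token in enumerate(corpus):
        for attr, value in token.items():
            index[(attr, value)].append(pos)
    return index

def slot_matches(token, slot):
    """A slot is a list of (attribute, op, value) constraints; all must hold."""
    for attr, op, value in slot:
        equal = token.get(attr) == value
        if (op == "=") != equal:   # "=" needs equality, "!=" needs inequality
            return False
    return True

def candidate_positions(slot, index, corpus_size):
    """Positions that might match a slot, using the index where it helps."""
    positive = [index.get((a, v), []) for a, op, v in slot if op == "="]
    if positive:
        return min(positive, key=len)   # shortest posting list
    return range(corpus_size)           # only negations: every position

def match_fixed_phrase(corpus, index, slots):
    """Anchor on the slot with the fewest candidates, then check its neighbours."""
    candidates = [candidate_positions(s, index, len(corpus)) for s in slots]
    anchor = min(range(len(slots)), key=lambda i: len(candidates[i]))
    hits = []
    for pos in candidates[anchor]:
        start = pos - anchor
        if start < 0 or start + len(slots) > len(corpus):
            continue
        if all(slot_matches(corpus[start + i], s) for i, s in enumerate(slots)):
            hits.append((start, start + len(slots) - 1))
    return hits

# Toy corpus (hypothetical data): "what they done and have it done she"
corpus = [
    {"word": "what", "hw": "what", "pos": "DTQ"},
    {"word": "they", "hw": "they", "pos": "PNP"},
    {"word": "done", "hw": "do",   "pos": "VDN"},
    {"word": "and",  "hw": "and",  "pos": "CJC"},
    {"word": "have", "hw": "have", "pos": "VHB"},
    {"word": "it",   "hw": "it",   "pos": "PNP"},
    {"word": "done", "hw": "do",   "pos": "VDN"},
    {"word": "she",  "hw": "she",  "pos": "PNP"},
]

# [hw!="be" & hw!="have"] [pos="PNP"] [word="done"]
slots = [
    [("hw", "!=", "be"), ("hw", "!=", "have")],
    [("pos", "=", "PNP")],
    [("word", "=", "done")],
]

hits = match_fixed_phrase(corpus, build_index(corpus), slots)
print(hits)
```

In this toy data the third slot has the shortest candidate list, so only the two positions of "done" are inspected, and the expensive first slot is evaluated just once per candidate instead of once per corpus position.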

This might still be worth a try somewhere down the line. But it's certainly not any kind of priority.



---

** [feature-requests:#32] a slow CQP query...**

**Status:** open
**Group:** TODO-4.0
**Labels:** CQP 
**Created:** Tue Nov 17, 2009 12:39 AM UTC by Andrew Hardie
**Last Updated:** Wed Jul 20, 2016 11:51 AM UTC
**Owner:** nobody


The following query seemed to run rather slowly (on the BNC: someone wanted instances of "done" used for past tense.)

[hw!="be" & hw!="have"] [pos="PNP"] [word="done"]

My guess is that the slowness is because of the gargantuan number of matches for the first element in the string. (Stefan might correct me on this!) Which leads me to wonder, is there any way of optimising this kind of thing and speeding it up? Or does this happen already?


