[CWB] [ cwb-Feature Requests-2898799 ] a slow CQP query...

SourceForge.net noreply at sourceforge.net
Tue Nov 17 19:00:57 CET 2009


Feature Requests item #2898799, was opened at 2009-11-17 01:39
Message generated for change (Comment added) made by schtepf
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=722306&aid=2898799&group_id=131809

Category: CQP
Group: None
Status: Open
Priority: 1
Private: No
Submitted By: Andrew Hardie (andrewhardie)
Assigned to: Nobody/Anonymous (nobody)
Summary: a slow CQP query...

Initial Comment:
The following query seemed to run rather slowly on the BNC (someone wanted instances of "done" used as a past tense):

[hw!="be" & hw!="have"] [pos="PNP"] [word="done"]

My guess is that the slowness is due to the gargantuan number of matches for the first element of the query. (Stefan might correct me on this!) Which leads me to wonder: is there any way of optimising this kind of query to speed it up? Or does this happen already?


----------------------------------------------------------------------

>Comment By: Stefan Evert (schtepf)
Date: 2009-11-17 19:00

Message:
Exactly.  CQP uses a strict left-to-right matching strategy where index
lookups are only performed for the first token.  Unfortunately, CQP doesn't
have a good built-in query optimiser, and it isn't at all fun to develop
one in C. :-(

In command-line CQP, you can use a multi-step query for such cases, which
runs much faster (although [pos="PNP"] still has a lot of matches):

  A = [pos="PNP"] [word="done"];
  set A target nearest [hw != "be|have"] within left 1 word;
  delete A without target;
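
To illustrate why anchoring on the rarer token helps, here is a rough toy model in Python. It is only a sketch under stated assumptions, not CQP's actual implementation: the Token type, the tiny corpus, and the functions left_to_right and anchored are all hypothetical, and the "candidate" count is just a crude proxy for the work done after the initial index lookup. The anchored variant keeps only hits whose left neighbour is neither "be" nor "have", as the original query requires.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    hw: str   # headword (lemma)
    pos: str  # part-of-speech tag


# Hypothetical miniature corpus; the tags only loosely imitate BNC tags.
CORPUS = [Token(w, h, p) for w, h, p in [
    ("then", "then", "AV0"), ("I", "i", "PNP"), ("done", "do", "VVN"),
    ("it", "it", "PNP"), ("yesterday", "yesterday", "AV0"), (".", ".", "PUN"),
    ("have", "have", "VHB"), ("you", "you", "PNP"), ("done", "do", "VVN"),
    ("it", "it", "PNP"), ("?", "?", "PUN"),
    ("they", "they", "PNP"), ("done", "do", "VVN"),
    ("nothing", "nothing", "PNI"), (".", ".", "PUN"),
]]

EXCLUDED = {"be", "have"}


def left_to_right(corpus):
    """Start from every token that satisfies the first pattern element."""
    candidates = [i for i, t in enumerate(corpus) if t.hw not in EXCLUDED]
    matches = [
        i for i in candidates
        if i + 2 < len(corpus)
        and corpus[i + 1].pos == "PNP"
        and corpus[i + 2].word == "done"
    ]
    return matches, len(candidates)


def anchored(corpus):
    """Start from the pronoun, then check the left neighbour afterwards."""
    candidates = [i for i, t in enumerate(corpus) if t.pos == "PNP"]
    hits = [
        i for i in candidates
        if i + 1 < len(corpus) and corpus[i + 1].word == "done"
    ]
    # Keep a hit only if a left neighbour exists and is not be/have;
    # report the neighbour's position so both functions are comparable.
    matches = [i - 1 for i in hits if i >= 1 and corpus[i - 1].hw not in EXCLUDED]
    return matches, len(candidates)
```

Calling left_to_right(CORPUS) and anchored(CORPUS) returns the same match positions, but the second call starts from far fewer candidates. On a corpus the size of the BNC the gap would be much larger, since almost every token passes the negated headword test.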

We thought about having special-case optimisations for this construction
in BNCweb's simple query language, but handling the multi-step queries was
just too complex to be feasible.

----------------------------------------------------------------------