[CWB] Restricting search on text/@id
Stefan Evert
stefan.evert at uos.de
Sun May 20 20:09:14 CEST 2007
Hi Tomaz!
> Could anybody tell me how to restrict a search to a particular document?
> On the web I found http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial/html/node26.html
> but am not sure it really gives me the answer.
Yes, that's the basic idea. You need to encode the relevant document
information (I suppose it's text/@id in your case) in an appropriate
way, i.e. as an XML tag attribute:
<text id="A00">
...
</text>
In the "cwb-encode" call, this attribute should be declared with "-S
text:0+id" or a similar flag, which creates the structural attribute
"text_id" used in the queries below.
Then you can easily restrict a CQP search to this document:
A = ...query... :: match.text_id = "A00";
As Lars pointed out, this is not very efficient, since CQP will find all
the matches first and then delete the ones that are not from the
correct document. The most efficient solution in CQP is to run a
subquery restricted to this document (it's not quite as efficient as it
could be, and if you regularly need to scan a single document from a
very large corpus, there may be better solutions than CQP):
MyDoc = <text_id "A00"> [] expand to text;
MyDoc;
A = ...query...;
This subquery approach will weed out the wrong documents immediately
after looking up possible start positions in the index and before the
possibly complex query is evaluated.
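To illustrate why the order matters, here is a toy Python sketch. It only models the two strategies and is not how CQP implements them internally. The corpus, the <text> regions and the is_bank query are made up for illustration, and the query matches single tokens only (real CQP queries can match multi-token sequences). The first function evaluates the query at every corpus position and only then checks text_id, like the "::" constraint; the second skips regions of the wrong document before the query is evaluated, like the subcorpus approach.

```python
from bisect import bisect_right
from typing import Callable, List, Optional, Tuple

# (start, end inclusive, id value), modelling <text id="..."> regions
Region = Tuple[int, int, str]
Query = Callable[[List[str], int], bool]

# Hypothetical toy corpus: one token per corpus position.
TOKENS: List[str] = ["the", "bank", "closed", "a", "river", "bank", "money", "bank"]
# Hypothetical <text> regions, sorted by start position.
TEXTS: List[Region] = [(0, 2, "A00"), (3, 5, "A01"), (6, 7, "A02")]


def is_bank(tokens: List[str], pos: int) -> bool:
    """Hypothetical single-token query: matches the word "bank"."""
    return tokens[pos] == "bank"


def text_id_at(pos: int, regions: List[Region]) -> Optional[str]:
    """Return the id of the <text> region containing pos, or None."""
    starts = [r[0] for r in regions]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and regions[i][0] <= pos <= regions[i][1]:
        return regions[i][2]
    return None


def filter_after_matching(query: Query, tokens: List[str],
                          regions: List[Region], doc_id: str):
    """Model of 'A = query :: match.text_id = "A00"': match everywhere, then discard."""
    evaluated, hits = 0, []
    for pos in range(len(tokens)):
        evaluated += 1  # the query runs at every corpus position
        if query(tokens, pos) and text_id_at(pos, regions) == doc_id:
            hits.append(pos)
    return hits, evaluated


def restrict_then_match(query: Query, tokens: List[str],
                        regions: List[Region], doc_id: str):
    """Model of the subcorpus approach: pick the document first, then query it."""
    evaluated, hits = 0, []
    for start, end, rid in regions:
        if rid != doc_id:
            continue  # wrong document: skipped before the query is evaluated
        for pos in range(start, end + 1):
            evaluated += 1
            if query(tokens, pos):
                hits.append(pos)
    return hits, evaluated


if __name__ == "__main__":
    print(filter_after_matching(is_bank, TOKENS, TEXTS, "A00"))  # ([1], 8)
    print(restrict_then_match(is_bank, TOKENS, TEXTS, "A00"))    # ([1], 3)
```

Both strategies return the same match, but the first one evaluates the query at all 8 positions while the second one only looks at the 3 positions of document A00. On a large corpus with a complex query, that difference is what makes the subquery approach faster.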
Best wishes,
Stefan
--
"Ecchi nanoha ikenai to omoimasu." ("I think naughty things are not allowed.")
stefan.evert at uos.de
purl.org/stefan.evert