[CWB] Restricting search on text/@id

Stefan Evert stefan.evert at uos.de
Sun May 20 20:09:14 CEST 2007


Hi Tomaz!

> could anybody tell me how to restrict search to a particular document?
> On the web I found http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial/html/node26.html
> but am not sure it really gives me the answer..

Yes, that's the basic idea.  You need to encode the relevant document  
information (I suppose it's text/@id in your case) in an appropriate  
way, i.e. as an XML tag attribute:

<text id="A00">
...
</text>

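If you generate the input file yourself, here is a minimal Python sketch that wraps each document in a `<text id="...">` element, using the one-token-per-line (verticalized) layout that cwb-encode reads. The helper name `to_vrt` and the naive whitespace tokenization are illustrative assumptions, not part of CWB; a real corpus would usually carry extra tab-separated columns (e.g. POS tags) on each token line.

```python
from xml.sax.saxutils import quoteattr


def to_vrt(docs: dict[str, str]) -> str:
    """Hypothetical helper: return verticalized text with one token per
    line, each document wrapped in a <text id="..."> ... </text> region.
    Tokenization is a naive whitespace split (assumption for illustration).
    """
    lines = []
    for doc_id, body in docs.items():
        # quoteattr escapes &, <, > and adds the surrounding quotes
        lines.append(f"<text id={quoteattr(doc_id)}>")
        lines.extend(body.split())
        lines.append("</text>")
    return "\n".join(lines) + "\n"


print(to_vrt({"A00": "the cat sat", "B01": "a dog barked"}), end="")
```
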
In the "cwb-encode" call, this attribute should be declared with "-S
text:0+id" or a similar flag; its values then become available in CQP
as the structural attribute "text_id".

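As a rough mental model of what that declaration produces (an assumption for illustration, not a description of cwb-encode's actual index files): each `<text>` element covers a range of corpus positions, and the value of its `id` attribute is attached to that range. The toy function `text_regions` below (a hypothetical name) computes such (start, end, id) triples from a verticalized file with one token per line.

```python
import re

# Toy pattern for an opening tag such as <text id="A00">; not a full XML parser
OPEN_TAG = re.compile(r'<text\s+id="([^"]*)"\s*>')


def text_regions(vrt: str) -> list[tuple[int, int, str]]:
    """Toy model: return (start, end, id) for each <text> region, where
    start and end are inclusive corpus positions (token indices from 0)."""
    regions = []
    cpos = 0                  # corpus position of the next token
    start, value = None, None
    for line in vrt.splitlines():
        stripped = line.strip()
        m = OPEN_TAG.fullmatch(stripped)
        if m:
            start, value = cpos, m.group(1)
        elif stripped == "</text>":
            regions.append((start, cpos - 1, value))
        elif stripped:
            cpos += 1         # every other non-empty line is one token
    return regions


vrt = '<text id="A00">\nthe\ncat\n</text>\n<text id="B01">\na\ndog\nbarked\n</text>\n'
print(text_regions(vrt))
```
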
Then you can easily restrict a CQP search to this document:

   A = ...query... :: match.text_id = "A00"

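The `:: match.text_id = "A00"` part is a global constraint checked on each match after the query has been evaluated. A toy Python model of that post-filtering step, with hypothetical names and hard-coded data: every match position is mapped to its enclosing `<text>` region by binary search over the region start positions, and matches from other documents are dropped.

```python
from bisect import bisect_right
from typing import Optional

# Toy corpus (hypothetical data): tokens and their (start, end, text_id) regions
TOKENS = ["the", "cat", "sat", "the", "dog", "the", "cat", "slept"]
REGIONS = [(0, 2, "A00"), (3, 4, "B01"), (5, 7, "C02")]
STARTS = [start for start, _, _ in REGIONS]


def text_id_at(cpos: int) -> Optional[str]:
    """Return the id of the <text> region containing cpos, or None."""
    i = bisect_right(STARTS, cpos) - 1
    if i >= 0 and cpos <= REGIONS[i][1]:
        return REGIONS[i][2]
    return None


def query_then_filter(word: str, doc_id: str) -> list[int]:
    # 1. evaluate the (toy) query over the whole corpus ...
    matches = [cpos for cpos, tok in enumerate(TOKENS) if tok == word]
    # 2. ... then discard matches that lie outside the requested document
    return [cpos for cpos in matches if text_id_at(cpos) == doc_id]


print(query_then_filter("the", "C02"))
```
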
As Lars pointed out, this is not very efficient, as CQP will find all
the matches first and then delete the ones that are not from the
correct document.  The most efficient solution in CQP is to run a
subquery restricted to this document (it's not quite as efficient as it
could be, and if you regularly need to scan a single document from a
very large corpus, there may be better solutions than CQP):

   MyDoc = <text_id "A00"> [] expand to text;
   MyDoc;
   A = ...query...;

This subquery approach will weed out the wrong documents immediately  
after looking up possible start positions in the index and before the  
possibly complex query is evaluated.

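The difference in work can be illustrated with a toy model (hypothetical names and data, and a deliberate simplification of what CQP does internally): the first strategy tests every token in the corpus and filters afterwards, while the second looks up the boundaries of the requested `<text>` region first and tests only the tokens inside it. Both return the same matches; only the number of token tests differs.

```python
# Toy corpus (hypothetical data): tokens and their (start, end, text_id) regions
TOKENS = ["the", "cat", "sat", "the", "dog", "the", "cat", "slept"]
REGIONS = [(0, 2, "A00"), (3, 4, "B01"), (5, 7, "C02")]


def search_all_then_filter(word, doc_id):
    """Analogue of a global constraint: scan everything, filter afterwards."""
    tests, hits = 0, []
    for cpos, tok in enumerate(TOKENS):
        tests += 1
        if tok == word:
            hits.append(cpos)
    keep = [c for c in hits
            if any(s <= c <= e and rid == doc_id for s, e, rid in REGIONS)]
    return keep, tests


def search_in_subcorpus(word, doc_id):
    """Analogue of querying an activated subcorpus: restrict the scan first."""
    tests, hits = 0, []
    for start, end, rid in REGIONS:
        if rid != doc_id:
            continue  # wrong document: skipped before any token is tested
        for cpos in range(start, end + 1):
            tests += 1
            if TOKENS[cpos] == word:
                hits.append(cpos)
    return hits, tests


print(search_all_then_filter("the", "A00"))  # ([0], 8)
print(search_in_subcorpus("the", "A00"))     # ([0], 3)
```
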
Best wishes,
Stefan


--
"Ecchi nanoha ikenai to omoimasu."
stefan.evert at uos.de
purl.org/stefan.evert