[CWB] Tei header tags in CWB queries

Serge HEIDEN Slh at ens-lsh.fr
Mon Oct 23 11:32:14 CEST 2006


Hello,

On Sunday, October 15, 2006 5:49 PM [GMT+1=CET],
Stefania Spina <sspina at unistrapg.it> wrote:

>> is there a way with CWB to allow queries taking into
>> account the markup included in the Tei header?

Yes, there are two ways to do this in the CWB spirit:
from the occurrence point of view or from the structural
point of view.

* Occurrence
Attach all the information you want from the header to
each occurrence (token) of the corpus. In XML-TEI terms, this is
done by making the W elements inherit specific element or
attribute values from the relevant elements of the header.
An XSLT stylesheet is one tool that could do that.
Once this is done, you declare the W attributes you are
interested in to the CWB corpus indexing process.
Then, in the queries, you simply express the appropriate attribute
constraints on the occurrences you want.
See for example http://www.tekstlab.uio.no/Bosnian/Corpus.html
which uses this technique to offer an "ori" attribute in queries,
so that only occurrences from specific texts are matched.
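
As a rough illustration of the occurrence approach, here is a minimal
Python sketch (instead of XSLT) that copies one header value onto every
W element and prints one token per line with the value in a second,
tab-separated column. The sample document, the choice of bibl/@n as the
source of the value, and the column name "ori" are assumptions made for
the example; a real TEI file will normally use a namespace and a
different header layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like sample (no namespace).
SAMPLE = """<TEI>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample text</title></titleStmt>
      <sourceDesc><bibl n="novel01"/></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <w>Hello</w>
      <w>world</w>
    </body>
  </text>
</TEI>"""


def verticalize(xml_string):
    """Return one 'word<TAB>ori' line per <w> element.

    The 'ori' value is read once from the header (here: bibl/@n, an
    assumption) and repeated on every token.
    """
    root = ET.fromstring(xml_string)
    bibl = root.find("./teiHeader/fileDesc/sourceDesc/bibl")
    ori = bibl.get("n") if bibl is not None else "unknown"
    return [f"{(w.text or '').strip()}\t{ori}" for w in root.iter("w")]


lines = verticalize(SAMPLE)
print("\n".join(lines))
```

The second column would then be declared as a positional attribute when
the corpus is indexed, and a query could constrain it on each token,
along the lines of [ori="novel01"].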

* Structure
Attach the information you want from the header to a
specific XML element that dominates all the occurrences
that the information applies to. For spoken data transcriptions,
in XML-TEI terms this could be the U elements.
Once this is done, you declare the structural elements that
hold the metadata to the CWB indexing process.
Then, in the queries, you can express constraints on the
attribute values of those structural elements to
select the material you want.
The ability to constrain structural element attribute values in
queries comes with the latest version of CWB. The mechanism
is rather crude: you have to write a suitable regular expression
over all the attribute-value pairs of the structural element (see the manual).
It is crude, but it works.
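
To make the structural approach a little more concrete, here is a small
Python sketch that wraps a run of tokens in a U element carrying
metadata as attributes, and then selects the element with a regular
expression over its whole attribute-value string. It only imitates the
idea in Python; it is not CQP syntax. The attribute names (who, sex),
their values, and the helper names are hypothetical.

```python
import re
from xml.sax.saxutils import quoteattr


def open_tag(name, metadata):
    """Build a start tag whose attributes carry the header metadata."""
    attrs = " ".join(f"{k}={quoteattr(v)}" for k, v in sorted(metadata.items()))
    return f"<{name} {attrs}>"


def region(name, metadata, tokens):
    """Return a start tag, one token per line, and the matching end tag."""
    return [open_tag(name, metadata), *tokens, f"</{name}>"]


def attr_string(tag):
    """Return the raw attribute-value string of a start tag."""
    return tag[1:-1].split(" ", 1)[1]


def selects(tag, pattern):
    """True if the regular expression matches the whole attribute string."""
    return re.fullmatch(pattern, attr_string(tag)) is not None


# Hypothetical utterance with speaker metadata copied from the header.
vertical = region("u", {"who": "spk1", "sex": "f"}, ["bonjour", "madame"])
print("\n".join(vertical))
print(selects(vertical[0], r'.*who="spk1".*'))
```

Because the expression has to match the full string of attribute-value
pairs, it needs a leading and trailing .* around the pair of interest,
which is where the crudeness shows.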

Best,

    [Serge]

_____________________________________________________________
Serge Heiden, slh at ens-lsh.fr, https://weblex.ens-lsh.fr
ENS-LSH/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883



