[CWB] Display options for structural attributes

Lukas Michelbacher michells at ims.uni-stuttgart.de
Fri Sep 17 16:59:38 CEST 2010


>> I was wondering if there is an easy way to output S-attributes for each
>> match.

> as far as I know, using "show +story_num" would do exactly what
> you describe, considering the cwb_encode line you give.
> e.g.:
> no corpus]> TUEBA4 ;
> TUEBA4> show +text_id ;
> TUEBA4> "Veruntreute" ;
>        0:                           <<text_id T990507.2>Veruntreute> die AWO Spendengeld ? St

This is what I meant by S-attributes only being shown where they actually 
appear (in the file that was encoded with cwb-encode).  "Veruntreute" 
happens to occur at the beginning of the corpus so in your case you can 
see the id right next to "Veruntreute".  For a word that occurs 1000 words 
into the text with id T990507.2, you would need a huge context to still 
get the info.  It would, however, not be part of the match itself, only of 
the context.

What I want is different.  For every token in the corpus, I would like to have
access to all the S-attributes (i.e. XML tags and their attr/value pairs)
within which it occurs.  For every token that occurs within

<doc id="n">
.
.
.
</doc>

I'd like to have id=n connected to that token.  The information is encoded
implicitly but to get it I'd have to use "show +doc_id" and "expand to doc"
for every query.  Or I could add another column "n" to every line before
encoding but that is what I meant by redundant information.
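For illustration, the "extra column" workaround could be a small preprocessing pass over the one-token-per-line input to cwb-encode, run before encoding. The sketch below makes several assumptions. XML tags sit on their own lines, token columns are tab-separated, and `<doc>` regions are not nested. The function name `add_doc_id_column` and the `"-"` placeholder for tokens outside any `<doc>` are hypothetical. The new column would then have to be declared as one more P-attribute at encoding time.

```python
import re
from typing import Iterable, Iterator

# Opening <doc ...> tag; captures the value of its id attribute.
DOC_OPEN = re.compile(r'<doc\b[^>]*\bid="([^"]*)"[^>]*>')
# Closing </doc> tag.
DOC_CLOSE = re.compile(r'</doc\s*>')


def add_doc_id_column(lines: Iterable[str], outside: str = "-") -> Iterator[str]:
    """Hypothetical helper: append the id of the enclosing <doc> region
    as an extra tab-separated column to every token line.

    Assumes tags are on their own lines and <doc> regions are not nested.
    Tokens outside any <doc> get the placeholder value `outside`.
    """
    current_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        stripped = line.strip()
        if stripped.startswith("<"):
            # Structural line: track <doc> boundaries, emit the tag unchanged.
            opening = DOC_OPEN.match(stripped)
            if opening:
                current_id = opening.group(1)
            elif DOC_CLOSE.match(stripped):
                current_id = None
            yield line
        elif stripped:
            # Token line: duplicate the region's id onto the token.
            value = current_id if current_id is not None else outside
            yield f"{line}\t{value}"
        else:
            # Blank lines are passed through untouched.
            yield line
```

This is exactly the redundancy described above: the id is stored once per token instead of once per region.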

To cut a long story short: I want S-attribute information that can be accessed
like P-attribute information ;).
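Conceptually, a structural attribute is a list of regions (start and end token positions, each with a value). So a per-token lookup "like a P-attribute" amounts to finding the region that contains a given corpus position. The following is a minimal in-memory sketch of that idea under stated assumptions. It does not reflect CWB's actual file format or API. `RegionAttribute` and `value_at` are hypothetical names, regions are assumed sorted and non-overlapping with inclusive ends, and the second region's id in the usage example is made up.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


class RegionAttribute:
    """Hypothetical model of an S-attribute with values: non-overlapping
    regions (start, end, value) over token positions, end inclusive."""

    def __init__(self, regions: List[Tuple[int, int, str]]):
        self.regions = sorted(regions)
        # Start positions, kept separately so they can be binary-searched.
        self.starts = [start for start, _, _ in self.regions]

    def value_at(self, cpos: int) -> Optional[str]:
        """Return the value of the region containing token position cpos,
        or None if cpos lies outside every region."""
        # Index of the last region starting at or before cpos.
        i = bisect_right(self.starts, cpos) - 1
        if i >= 0:
            start, end, value = self.regions[i]
            if start <= cpos <= end:
                return value
        return None


# Usage: a token 1000 positions into the text still gets its id directly,
# without any context around the match.
text_id = RegionAttribute([(0, 1499, "T990507.2"), (1500, 2999, "T990507.3")])
print(text_id.value_at(1000))
```

The lookup costs a binary search per token, and the value is stored only once per region rather than copied onto every token.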

Regards,

Lukas

