[CWB] Display options for structural attributes
Lukas Michelbacher
michells at ims.uni-stuttgart.de
Fri Sep 17 16:59:38 CEST 2010
>> I was wondering if there is an easy way to output S-attributes for each
>> match.
> as far as I know, using "show +story_num" would do exactly what
> you describe, considering the cwb_encode line you give.
> e.g.:
> [no corpus]> TUEBA4 ;
> TUEBA4> show +text_id ;
> TUEBA4> "Veruntreute" ;
> 0: <<text_id T990507.2>Veruntreute>
> die AWO Spendengeld ? St
This is what I meant by S-attributes only being shown where they actually
appear (in the file that was encoded with cwb-encode). "Veruntreute"
happens to occur at the beginning of the corpus so in your case you can
see the id right next to "Veruntreute". For a word that occurs 1000 words
into the text with id T990507.2, you would need a huge context to still
get the info. It would, however, not be part of the match itself, only of
the context.
What I want is different. For every token in the corpus, I would like to have
access to all the S-attributes (i.e. XML tags and their attr/value pairs)
within which it occurs. For every token that occurs within
<doc id="n">
.
.
.
</doc>
I'd like to have id=n connected to that token. The information is encoded
implicitly but to get it I'd have to use "show +doc_id" and "expand to doc"
for every query. Or I could add another column "n" to every line before
encoding but that is what I meant by redundant information.
To cut a long story short: I want S-attribute information that can be accessed
like P-attribute information ;).
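
As a rough illustration of the kind of lookup meant here, the following is a
minimal sketch in plain Python. It is not CWB or CQP code: the region table
DOC_REGIONS, its values, and the function name doc_id_at are all invented for
the example. It assumes the <doc> regions are stored once as
(first position, last position, id) triples, sorted and non-overlapping, so
the id of any token can be found from its corpus position without adding a
redundant column to every line.

```python
from bisect import bisect_right

# Hypothetical region table for a <doc id="..."> attribute:
# (first corpus position, last corpus position, id value).
# Assumed to be sorted by start position and non-overlapping.
DOC_REGIONS = [
    (0, 4, "1"),
    (5, 9, "2"),
    (12, 14, "3"),  # positions 10 and 11 lie outside any <doc> region
]

# Start positions only, used for the binary search.
_STARTS = [start for start, _end, _value in DOC_REGIONS]


def doc_id_at(position):
    """Return the id of the <doc> region containing `position`, or None."""
    # Index of the last region whose start is <= position.
    i = bisect_right(_STARTS, position) - 1
    if i < 0:
        return None
    _start, end, value = DOC_REGIONS[i]
    # The token is inside the region only if it does not lie past its end.
    return value if position <= end else None


if __name__ == "__main__":
    for pos in (0, 7, 10, 14):
        print(pos, doc_id_at(pos))
```

With a lookup like this, a token 1000 words into a text gets its id just as
cheaply as the first token, and the id is stored only once per region.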
Regards,
Lukas