[CWB] Display options for structural attributes

Yannick Versley yversley at gmail.com
Fri Sep 17 15:11:39 CEST 2010


Lukas,

as far as I know, "show +story_num" should do exactly what you
describe, given the cwb-encode options you list (-V story:0+num+title
declares a story_num attribute). For example:
[no corpus]> TUEBA4 ;
TUEBA4> show +text_id ;
TUEBA4> "Veruntreute" ;
        0:                           <<text_id T990507.2>Veruntreute> die AWO Spendengeld ? St

(Maybe I misunderstood you and you meant the kind of postprocessing where you
would need half a dozen lines of Perl and CWB::CL. Sorry in that case.)
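
For illustration, here is a rough sketch of that kind of postprocessing. It
is not the Perl/CWB::CL route: as an assumption made only for this example,
it reads the verticalized input text (the file you pass to cwb-encode)
instead of the encoded corpus, and it is written in Python. The function
name hits_with_story_num is made up, and the code assumes that every line
starting with "<" is markup and that token lines hold word, pos and lemma
separated by whitespace.

```python
import re

# Matches an opening <story ...> tag and captures the value of num="...".
STORY_OPEN = re.compile(r'<story\b[^>]*\bnum="([^"]*)"')


def hits_with_story_num(vertical_lines, word):
    """Return 'cpos: word/pos/lemma/num' strings for each token equal to `word`.

    Hypothetical helper: it scans verticalized text, not a CWB-encoded corpus.
    """
    hits = []
    cpos = 0            # corpus position, counted over token lines only
    story_num = None    # num value of the enclosing <story>, if any
    for raw in vertical_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<"):
            # Assumption: a line starting with "<" is always markup.
            opened = STORY_OPEN.match(line)
            if opened:
                story_num = opened.group(1)
            elif line.startswith("</story"):
                story_num = None
            continue
        fields = line.split()           # word, pos, lemma
        if fields[0] == word:
            annotated = "/".join(fields)
            hits.append(f"{cpos}: {annotated}/{story_num}")
        cpos += 1
    return hits
```

Called with the lines of the example corpus quoted below and the word "A",
this should return "2: A/DT/a/1" and "11: A/DT/a/2", i.e. the output asked
for. Matching is by exact string comparison on the word form; a real CQP
query would of course be more flexible.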

Best,
Yannick

On Fri, Sep 17, 2010 at 2:37 PM, Lukas Michelbacher
<michells at ims.uni-stuttgart.de> wrote:
> Hello,
>
> I was wondering if there is an easy way to output S-attributes for each
> match.
>
> As far as I know [1], when you display S-attributes, they are displayed in
> the position in which they actually appear in the corpus [2].
>
> I'd like to be able to say something like "show +story:num" and then get the
> value of the num attribute of the story tag for each hit. This could be
> useful for computing tf-idf weights, for example. E.g. the query
>
>> "A"
>
> would yield the result
>
> 2: A/DT/a/1
> 11: A/DT/a/2
>
> Otherwise, I'd have to encode the story number as a P-attribute for each
> token, which would store redundant information and require more annoying
> preprocessing ;).
>
> Regards,
>
> Lukas
>
> --
> Dipl.-Ling. Lukas Michelbacher
> Institute for Natural Language Processing
> University of Stuttgart
>
> phone: +49 (0)711-685-84587
> fax  : +49 (0)711-685-81366
> email: michells at ims.uni-stuttgart.de
>
> [1]
>
> my knowledge is based on
> http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CWBTutorial/cwb-tutorial.pdf
>
> [2]
>
> This is my example corpus:
>
> <!-- A Thrilling Experience -->
> <story num="1" title="A Thrilling Experience">
> <p>
> <s>
> Tick    NN      tick
> .       SENT    .
> </s>
> <s>
> A       DT      a
> clock   NN      clock
> .       SENT    .
> </s>
> <s>
> Tick    VB      tick
> ,       ,       ,
> tick    VB      tick
> .       SENT    .
> </s>
> </p>
> </story>
>
> <story num="2" title="A Thrilling Experience 2">
> <p>
> <s>
> Tock    NN      tock
> .       SENT    .
> </s>
> <s>
> A       DT      a
> click   NN      click
> .       SENT    .
> </s>
> <s>
> Tock    VB      tock
> ,       ,       ,
> tock    VB      tock
> .       SENT    .
> </s>
> </p>
> </story>
>
> I encoded it with CWB-2.2.b99-RC1 using the following options:
>
> -D -B -s -x -s -P pos -P lemma -S s:0 -S p:0 -V story:0+num+title

