[CWB] Display options for structural attributes
Yannick Versley
yversley at gmail.com
Fri Sep 17 15:11:39 CEST 2010
Lukas,
as far as I know, "show +story_num" should do exactly what you describe,
given the cwb-encode line you quote. For example:
[no corpus]> TUEBA4 ;
TUEBA4> show +text_id ;
TUEBA4> "Veruntreute" ;
0: <<text_id T990507.2>Veruntreute> die AWO Spendengeld ? St
(Maybe I misunderstood you and you meant the kind of postprocessing where you
would need half a dozen lines of Perl and CWB::CL. Sorry in that case.)
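
For the postprocessing route, here is a minimal sketch in Python (not Perl,
and not using CWB::CL). It does not query an encoded corpus at all: it reads
the verticalized input from the quoted message directly and assumes that tag
lines start with "<" and that no token does. The names SAMPLE, STORY_OPEN and
annotate_matches are made up for this illustration.

```python
import re

# Copy of the example corpus from the quoted message (one token per line,
# columns: word, POS, lemma; XML tags on their own lines).
SAMPLE = """\
<!-- A Thrilling Experience -->
<story num="1" title="A Thrilling Experience">
<p>
<s>
Tick NN tick
. SENT .
</s>
<s>
A DT a
clock NN clock
. SENT .
</s>
<s>
Tick VB tick
, , ,
tick VB tick
. SENT .
</s>
</p>
</story>

<story num="2" title="A Thrilling Experience 2">
<p>
<s>
Tock NN tock
. SENT .
</s>
<s>
A DT a
click NN click
. SENT .
</s>
<s>
Tock VB tock
, , ,
tock VB tock
. SENT .
</s>
</p>
</story>
"""

# Matches an opening <story ...> tag and captures the value of num="...".
STORY_OPEN = re.compile(r'<story\b[^>]*\bnum="([^"]*)"')


def annotate_matches(vertical_text, word):
    """Return 'cpos: word/pos/lemma/num' for every token equal to `word`."""
    hits = []
    cpos = 0           # corpus position: only token lines are counted
    story_num = None   # num of the enclosing <story> region, if any
    for line in vertical_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("<"):
            # Assumption: any line starting with "<" is markup, not a token.
            m = STORY_OPEN.match(line)
            if m:
                story_num = m.group(1)
            elif line.startswith("</story"):
                story_num = None
            continue
        fields = line.split()
        if fields[0] == word:
            hits.append(f"{cpos}: {'/'.join(fields)}/{story_num}")
        cpos += 1
    return hits


if __name__ == "__main__":
    for hit in annotate_matches(SAMPLE, "A"):
        print(hit)
```

On the sample corpus this prints the two lines from the quoted message,
"2: A/DT/a/1" and "11: A/DT/a/2".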
Best,
Yannick
On Fri, Sep 17, 2010 at 2:37 PM, Lukas Michelbacher
<michells at ims.uni-stuttgart.de> wrote:
> Hello,
>
> I was wondering if there is an easy way to output S-attributes for each
> match.
>
> As far as I know [1], when you display S-attributes, they are displayed in
> the position in which they actually appear in the corpus [2].
>
> I'd like to be able to say something like "show +story:num" and then get the
> value of the num attribute of the story tag for each hit. This could be useful
> for computing tf-idf weights, for example. E.g. the query
>
>> "A"
>
> would yield the result
>
> 2: A/DT/a/1
> 11: A/DT/a/2
>
> Otherwise, I'd have to encode the story number as a P-attribute for each
> token, which would store redundant information and require more annoying
> preprocessing ;).
>
> Regards,
>
> Lukas
>
> --
> Dipl.-Ling. Lukas Michelbacher
> Institute for Natural Language Processing
> University of Stuttgart
>
> phone: +49 (0)711-685-84587
> fax : +49 (0)711-685-81366
> email: michells at ims.uni-stuttgart.de
>
> [1]
>
> my knowledge is based on
> http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CWBTutorial/cwb-tutorial.pdf
>
> [2]
>
> This is my example corpus:
>
> <!-- A Thrilling Experience -->
> <story num="1" title="A Thrilling Experience">
> <p>
> <s>
> Tick NN tick
> . SENT .
> </s>
> <s>
> A DT a
> clock NN clock
> . SENT .
> </s>
> <s>
> Tick VB tick
> , , ,
> tick VB tick
> . SENT .
> </s>
> </p>
> </story>
>
> <story num="2" title="A Thrilling Experience 2">
> <p>
> <s>
> Tock NN tock
> . SENT .
> </s>
> <s>
> A DT a
> click NN click
> . SENT .
> </s>
> <s>
> Tock VB tock
> , , ,
> tock VB tock
> . SENT .
> </s>
> </p>
> </story>
>
> I encoded it with CWB-2.2.b99-RC1 using the following options:
>
> -D -B -s -x -s -P pos -P lemma -S s:0 -S p:0 -V story:0+num+title