[CWB] Empty XML tags not showing in CQP

Javier Pueyo javier.pueyo at gmail.com
Tue Jun 12 21:18:33 CEST 2018


Thank you Stefan,

When I read your second suggestion I had a light-bulb moment and decided to
make "inf" an attribute of the "turn" tag. It is much better structurally,
and now I can show "turn_inf" in CQP and hopefully visualize it in
CQPweb:

-S turn+id+inf

CORPUS> show +turn_inf;
CORPUS> <turn> [word=".*"];

0:      <<turn_inf E>okay> ya son las horas ah

Unfortunately, I cannot apply the same solution to the other empty tags I use
in my corpus (<silence/>, <noise/>, etc.), since they may appear in any
position within a given speech turn.
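
For those position-independent tags, the p-attribute approach from BNCweb
that Stefan describes could perhaps be sketched as follows. This is only a
minimal sketch in Python, assuming a vertical input file (one token per line,
each tag on its own line). The function name fold_empty_tags and the extra
column it adds are my own illustration, not part of CWB or BNCweb.

```python
import re

# Matches an empty element on one line: <name/>, <name attr="x"/> or
# <name attr="x"></name>. Ordinary start and end tags do not match.
EMPTY_TAG = re.compile(r'^<(\w+)(?:\s[^<>]*)?(?:/>|></\1>)$')


def fold_empty_tags(lines, empty_names):
    """Move empty XML tags into an extra column on the next token line.

    Hypothetical helper: the new column uses the feature-set notation
    |a|b| (a single | when no empty tag precedes the token).
    Simplifications: XML attributes of the empty tags are dropped, token
    lines starting with "<" are treated as tags, and tags with no
    following token in the same region carry over to the next token.
    """
    pending = set()
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = EMPTY_TAG.match(line.strip())
        if match and match.group(1) in empty_names:
            pending.add(match.group(1))   # remember it for the next token
            continue
        if line.startswith("<") or not line.strip():
            out.append(line)              # structural tag or blank line
            continue
        feature_set = "|" + "".join(name + "|" for name in sorted(pending))
        out.append(line + "\t" + feature_set)
        pending.clear()
    return out


if __name__ == "__main__":
    sample = ['<turn id="1">', "okay", "<silence/>", "<noise/>", "ya", "</turn>"]
    for row in fold_empty_tags(sample, {"silence", "noise"}):
        print(row)
```

The extra column would then have to be declared as an additional
p-attribute when encoding (as far as I know, a feature-set attribute is
declared with a trailing slash, e.g. "-P pretags/", where "pretags" is just a
name I made up). As Stefan notes, this would help with searching but not
with display in CQPweb.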

But being able to visualize "inf" was really important. Thanks for the clue!

Javier



2018-06-12 12:04 GMT-04:00 Stefan Evert <stefanML at collocations.de>:

>
>
> > Is there any way I can make CQP show empty XML tags in query results
> > (so I can visualize them in CQPweb)?
> >
> > I have tags like these:
> >
> > <inf id="E1"></inf>
> >
> > (because it seems that cwb-encode doesn't like the shorter form
> > <inf id="E1"/>)
>
> That's because CWB doesn't support empty XML elements.  It accepts the
> long form with explicit start and end tags, but will simply ignore the
> empty regions.
>
> In BNCweb, we store empty XML tags in a p-attribute that collects all tags
> that occur immediately before the respective token; a second p-attribute
> contains a pre-processed (feature set) representation for easier searching.
>
> Unfortunately, CQPweb can't visualize arbitrary p-attributes yet, so this
> would help for searching but not display.  I think the only workaround
> would be to move the end tags so that the XML element wraps the following
> token:
>
> <turn>
> <inf id="E">
> okay
> </inf>
> ya
> son
> las
> horas
> ah
> </turn>
>
> and then define a suitable visualization for the start tag <inf> in
> CQPweb, but not for the end tag </inf>.
>
> Best,
> Stefan