[CWB] Empty XML tags not showing in CQP

Hardie, Andrew a.hardie at lancaster.ac.uk
Tue Jun 12 23:27:22 CEST 2018


Hi Javier,

It’s worth noting that in CQPweb the visualisations of start and end tags are independent of one another. You can display just a start tag. (Or have different visualisations for the start and the end, but that’s less relevant here.)

So, you can simulate empty elements for pauses etc. using start tags (let cwb-encode fill in the end points wherever it likes; it doesn’t matter, as you won’t use that data) and then create a visualisation for the start tag. Stefan mentioned this but I thought it was worth underlining…

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Javier Pueyo
Sent: 12 June 2018 20:19
To: CWBdev Mailing List <cwb at sslmit.unibo.it>
Subject: Re: [CWB] Empty XML tags not showing in CQP

Thank you Stefan,

When I read your second suggestion I had a light-bulb moment and decided to make "inf" an attribute of the "turn" tag (it is much better structurally, and now I can show "turn_inf" in CQP and hopefully visualize it in CQPweb):

-S turn+id+inf

CORPUS> show +turn_inf;
CORPUS> <turn> [word=".*"];

0:      <<turn_inf E>okay> ya son las horas ah

Unfortunately, I cannot apply the same solution to other empty tags I use in my corpus (like <silence/>, <noise/>, etc), since they may appear in any position within a given speech turn.

But being able to visualize "inf" was really important. Thanks for the clue!

Javier



2018-06-12 12:04 GMT-04:00 Stefan Evert <stefanML at collocations.de>:


> Is there any way I can make CQP show empty XML tags in query results (so I can visualize them in CQPweb)?
>
> I have tags like these:
>
> <inf id="E1"></inf>
>
> (because it seems that cwb-encode doesn't like the shorter form <inf id="E1"/>)

That's because CWB doesn't support empty XML elements.  It accepts the long form with explicit start and end tags, but will simply ignore the empty regions.

In BNCweb, we store empty XML tags in a p-attribute that collects all tags that occur immediately before the respective token; a second p-attribute contains a pre-processed (feature set) representation for easier searching.
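
As a rough illustration of the p-attribute approach just described, here is a minimal Python sketch of a preprocessing step that could run on the verticalized text before cwb-encode. It is not the BNCweb code: the function name fold_empty_tags, the "_" placeholder and the column layout are assumptions made for this example. It assumes one token or XML tag per line, folds each empty tag into two extra tab-separated columns on the next token (the tags verbatim, then the tag names in CWB's feature-set notation, where values are delimited by "|" and a lone "|" is the empty set).

```python
import re

# Matches a self-closing tag such as <silence/> or <inf id="E1"/>, and the
# long form <inf id="E1"></inf> when it is written on a single line.
EMPTY_TAG = re.compile(r'^<(\w+)([^<>]*?)\s*(?:/>|></\1>)$')


def fold_empty_tags(lines):
    """Move empty tags into two extra columns on the token that follows them.

    Hypothetical helper for illustration only. Any other line starting
    with "<" is treated as an ordinary structural tag and passed through.
    """
    pending = []  # (name, attributes) of empty tags seen since the last token
    out = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = EMPTY_TAG.match(line)
        if match:
            pending.append((match.group(1), match.group(2).strip()))
            continue
        if line.startswith("<"):
            out.append(line)
            continue
        # Token line: column 2 keeps the tags verbatim ("_" if there are none),
        # column 3 holds the sorted tag names as a feature set.
        verbatim = " ".join(
            f"<{name} {attrs}/>" if attrs else f"<{name}/>"
            for name, attrs in pending
        ) or "_"
        names = sorted({name for name, _ in pending})
        featset = "|" + "".join(f"{name}|" for name in names)
        out.append(f"{line}\t{verbatim}\t{featset}")
        pending = []
    return out
```

For the three input lines <silence/>, <noise/> and son, the function returns a single line with the tab-separated columns son, <silence/> <noise/> and |noise|silence|. Empty tags that are not followed by any token (for instance at the very end of the input) are silently dropped in this sketch, so a real pipeline would need to decide how to handle them.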

Unfortunately, CQPweb can't visualize arbitrary p-attributes yet, so this would help for searching but not for display.  I think the only workaround would be to move the end tags so that the XML element wraps the following token:

<turn>
<inf id="E">
okay
</inf>
ya
son
las
horas
ah
</turn>

and then define a suitable visualization for the start tag <inf> in CQPweb, but not for the end tag </inf>.
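
If the corpus contains many such tags, the restructuring shown above could be automated before encoding. The following Python sketch is one possible way to do it, under the assumption that the input is verticalized text with one token or XML tag per line; the function name wrap_next_token is made up for this example and is not part of CWB.

```python
import re

# Matches a self-closing tag such as <silence/> or <inf id="E"/>, and the
# long form <inf id="E"></inf> when it is written on a single line.
EMPTY_TAG = re.compile(r'^<(\w+)([^<>]*?)\s*(?:/>|></\1>)$')


def wrap_next_token(lines):
    """Rewrite each empty tag so that it wraps the token that follows it.

    Hypothetical helper for illustration only. Any other line starting
    with "<" is treated as an ordinary structural tag and passed through.
    """
    pending = []  # (name, attribute string) of empty tags awaiting a token
    out = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = EMPTY_TAG.match(line)
        if match:
            pending.append((match.group(1), match.group(2).rstrip()))
        elif line.startswith("<"):
            out.append(line)
        else:
            # Open the pending elements, emit the token, then close them
            # in reverse order so the result stays well nested.
            out.extend(f"<{name}{attrs}>" for name, attrs in pending)
            out.append(line)
            out.extend(f"</{name}>" for name, _ in reversed(pending))
            pending = []
    return out
```

Given the turn from the example with <inf id="E"></inf> placed before "okay", this returns exactly the ten lines shown above. Two caveats of the sketch: an empty tag directly before a closing tag such as </turn> ends up wrapping the first token of the next region, and two identical empty tags before the same token produce nested elements of the same name, which may need extra care when encoding.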

Best,
Stefan
