[CWB] Spanish TreeTagger

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon May 1 07:21:57 CEST 2017


The other potentially useful thing about this way of doing it is that you can give the MWU one analysis as a whole (externally) and different analyses to its component tokens (internally), if you wish. For instance, using an English example with POS tags:

…
<mwu pos="Adv">
of	Prep
course	Noun
</mwu>
…

While most users won’t realise that the mwu_pos s-attribute is there, not knowing about it doesn’t interfere with their normal use of the pos p-attribute.
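To make the two layers concrete, here is a minimal Python sketch (not part of the CWB tools) that writes tokens out in this layout. Each token keeps its own `pos` column, and a multiword unit is wrapped in an `<mwu pos="...">` region carrying the unit-level tag. The function name `to_vrt` and the `(start, end, pos)` span format are hypothetical choices for illustration, and spans are assumed not to overlap.

```python
from xml.sax.saxutils import quoteattr


def to_vrt(tokens, mwus):
    """Render tokens as two-column VRT (word TAB pos), wrapping each
    multiword unit in an <mwu pos="..."> region.

    tokens: list of (word, pos) pairs, in corpus order
    mwus:   list of (start, end, pos) spans over token indices, end exclusive;
            spans are assumed not to overlap or nest (hypothetical format)
    """
    opens = {start: (end, pos) for start, end, pos in mwus}
    lines = []
    close_after = None  # index just past the last token of the open MWU
    for i, (word, pos) in enumerate(tokens):
        if i in opens:
            close_after, mwu_pos = opens[i]
            # The unit-level tag becomes the value of the mwu_pos s-attribute.
            lines.append(f"<mwu pos={quoteattr(mwu_pos)}>")
        # Each component stays an ordinary token with its own pos value.
        lines.append(f"{word}\t{pos}")
        if close_after == i + 1:
            lines.append("</mwu>")
            close_after = None
    return "\n".join(lines)


tokens = [("of", "Prep"), ("course", "Noun"), ("it", "Pron")]
print(to_vrt(tokens, [(0, 2, "Adv")]))
```

A file in this shape would then presumably be encoded with something like `cwb-encode ... -P pos -S mwu:0+pos`; declaring `pos` as an annotation of the `mwu` s-attribute is what yields the `mwu_pos` attribute, while the token-level tags stay in the ordinary `pos` p-attribute.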

best

Andrew.

From: cwb-bounces at sslmit.unibo.it On Behalf Of Meier-Vieracker, Simon
Sent: 01 May 2017 05:56
To: ssadowsky at gmail.com; Open source development of the Corpus WorkBench
Subject: Re: [CWB] Spanish TreeTagger

As indicated by Andrew:


<s>
...
palabra
...
<expression>
por
el
contrario
</expression>
....
palabra
....
</s>
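One way to see what this layout preserves is to read it back. The Python sketch below is not a CWB tool; the function `find_expressions` and the Spanish example sentence are made up for illustration. It recovers each `<expression>` region as a unit while still counting its components as ordinary tokens, which is the point of keeping them in the token stream. It assumes regions are not nested and that any line starting with `<` is markup.

```python
import re

# Hypothetical example in the layout above (one token per line).
EXAMPLE_VRT = """<s>
No
llovió
;
<expression>
por
el
contrario
</expression>
,
hizo
sol
</s>"""


def find_expressions(vrt_text, tag="expression"):
    """Return (expressions, token_count) for a VRT string.

    expressions: the words inside each <tag>...</tag> region, space-joined
    token_count: number of token lines, which the regions do not change
    """
    open_re = re.compile(rf"<{re.escape(tag)}[\s>]")
    close_tag = f"</{tag}>"
    expressions, current, token_count = [], None, 0
    for raw in vrt_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if open_re.match(line):
            current = []
        elif line == close_tag:
            if current is not None:
                expressions.append(" ".join(current))
            current = None
        elif line.startswith("<"):
            continue  # other structural markup such as <s> or </s>
        else:
            token_count += 1
            word = line.split("\t")[0]  # first column is the word form
            if current is not None:
                current.append(word)
    return expressions, token_count


print(find_expressions(EXAMPLE_VRT))
```

The same code works unchanged if the token lines carry extra tab-separated columns such as a POS tag, since only the first column is read.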

Best, Simon



On 01.05.2017 at 06:47, Scott Sadowsky <ssadowsky at gmail.com> wrote:
On Sun, Apr 30, 2017 at 8:14 PM, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:

I personally prefer to stick to keeping multiword annotation as s-attributes with the components as separate entries in the token stream – as in the BNC World Edition, in fact – which supports users who want to know about them without creating confusion for users who don’t.

What would such an entry look like, say in VRT form? Sounds like a nice solution.

Cheers,
Scott
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb