[CWB] Field Word Data (ELAN)

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu Dec 8 06:50:01 CET 2016


Hi Ruprecht,

>> How did you approach the representation of these levels in the CWB format

As p-attributes for layers whose tokenisation matches the word tier. As s-attributes for those that don't.

E.g., starting with a tiered example like:

Pirat-a    barb-am   hab-et
pirate-NOM beard-ACC have-3SG
"The pirate has a beard"

(in whatever underlying format...)

... I flip it from horizontal to vertical, giving the following CWB input file (columns separated by tabs, as usual):

<s trans="The pirate has a beard">
pirat-a  pirate-NOM
barb-am  beard-ACC
hab-et   have-3SG
</s>

If there are multiple layers of glossing, then I just add more p-attributes.
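
The flip itself is mechanical. A minimal sketch in Python, assuming each tier arrives as one whitespace-separated string and that every tier has the same number of tokens as the word tier (the function name igt_to_vertical is made up for illustration; it is not part of CWB or of any converter I actually use):

```python
from xml.sax.saxutils import quoteattr


def igt_to_vertical(tiers, translation):
    """Flip aligned tiers (one string per tier) into CWB-style vertical text."""
    columns = [tier.split() for tier in tiers]
    n_tokens = len(columns[0])
    # Only tiers whose tokenisation matches the word tier can be p-attributes.
    if any(len(col) != n_tokens for col in columns):
        raise ValueError("tier tokenisation does not match the word tier")
    # quoteattr escapes &, < and > and wraps the value in quotes.
    lines = ["<s trans=%s>" % quoteattr(translation)]
    # zip(*columns) turns one list per tier into one tuple per token.
    lines.extend("\t".join(token) for token in zip(*columns))
    lines.append("</s>")
    return "\n".join(lines)


example = igt_to_vertical(
    ["pirat-a barb-am hab-et", "pirate-NOM beard-ACC have-3SG"],
    "The pirate has a beard",
)
print(example)
```

A further gloss layer is just one more string in the list, which becomes one more tab-separated column. A tier that raises the ValueError here is one that would have to be handled as an s-attribute instead.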

In CQPweb I set the morpheme gloss as the primary annotation, so that it can be searched like a tag in the Simple Query language (CEQL). E.g., _*-NOM finds all nominatives.
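
To show roughly what such a wildcard pattern selects, here is a small emulation in Python. It uses fnmatch, whose * also matches any run of characters; this is only an illustration on a toy token list, not how CEQL or CQP evaluate queries, and match_gloss is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# (word, gloss) pairs as they would sit in the two p-attribute columns.
tokens = [
    ("pirat-a", "pirate-NOM"),
    ("barb-am", "beard-ACC"),
    ("hab-et", "have-3SG"),
]


def match_gloss(tokens, pattern):
    """Return the words whose gloss matches a wildcard pattern such as '*-NOM'."""
    return [word for word, gloss in tokens if fnmatchcase(gloss, pattern)]


nominatives = match_gloss(tokens, "*-NOM")
print(nominatives)
```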

>> how did you end up displaying the output?

I have added a special "field mode" to CQPweb for corpora like this. It switches the concordance display to a mode which re-builds the familiar 3-line-example format.
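
The idea behind that display can be sketched in a few lines of Python. This is not CQPweb's code, just an illustration of the reverse flip under the assumption that one <s> region with a double-quoted trans attribute is passed in; vertical_to_interlinear is a made-up name.

```python
def vertical_to_interlinear(vertical):
    """Rebuild an aligned interlinear example from one <s> region of vertical text."""
    translation = ""
    rows = []
    for line in vertical.splitlines():
        if line.startswith("<s "):
            # Assumes a single double-quoted trans="..." attribute.
            translation = line.split('"')[1]
        elif line and not line.startswith("<"):
            rows.append(line.split("\t"))
    # Pad each token to the width of its longest cell so the tiers line up.
    widths = [max(len(cell) for cell in row) for row in rows]
    tiers = [
        " ".join(cell.ljust(width) for cell, width in zip(tier, widths)).rstrip()
        for tier in zip(*rows)
    ]
    return "\n".join(tiers + ['"%s"' % translation])


sample = (
    '<s trans="The pirate has a beard">\n'
    "pirat-a\tpirate-NOM\n"
    "barb-am\tbeard-ACC\n"
    "hab-et\thave-3SG\n"
    "</s>"
)
print(vertical_to_interlinear(sample))
```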

See attached screenshot (from a small Bodo corpus).

Field mode is, alas, not as well documented in the sysadmin manual as it ought to be... and it's not fully implemented for the extended-context view.

best

Andrew.


-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Ruprecht von Waldenfels
Sent: 07 December 2016 12:27
To: Open source development of the Corpus WorkBench
Subject: [CWB] Field Word Data (ELAN)


Hi,

I was wondering about projects that use CWB to display field work data, 
i.e., text with (multiple levels of) morpheme-level glossing. Could you 
share your experiences? How did you approach the representation of these 
levels in the CWB format, how did you end up displaying the output?

I am planning to adapt our current spoken-data interface 
(parasolcorpus.org/Pushkino) to handle glossed data and will write a 
converter from the  ELAN format to handle this. I would greatly 
appreciate any comments on how this is best done, how to handle the 
display, and whether there are any projects that already do this.

Best wishes!
Ruprecht

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bodo.png
Type: image/png
Size: 26141 bytes
Desc: bodo.png
URL: <http://liste.sslmit.unibo.it/pipermail/cwb/attachments/20161208/4852750e/attachment-0001.png>