[CWB] Field Word Data (ELAN)

Ruprecht von Waldenfels ruprecht.waldenfels at gmx.net
Mon Dec 12 13:59:05 CET 2016


Stefan, thanks,
I thought of this, too, but it would be a hack and would complicate CQP 
queries. So this is where the CWB data model reaches its limits...
Long live the Ziggurat data model!
Best wishes,
Ruprecht
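Stefan's suggestion, quoted below, treats each morpheme as a token and marks complete words with a <w> s-attribute. Here is a minimal Python sketch of how that vertical format could be generated. It assumes the glosses are already available as parallel, hyphen-segmented tiers, with one string per word per tier. How they are exported from ELAN is left open, and the function name morpheme_vertical is hypothetical.

```python
from xml.sax.saxutils import quoteattr


def morpheme_vertical(words, glosses, tags, translation):
    """Encode one glossed sentence with morphemes as tokens.

    words, glosses, tags: parallel lists with one hyphen-segmented string
    per word (hypothetical input shape). Returns CWB vertical-format lines:
    one morpheme per line (morpheme, gloss, tag separated by tabs), each
    word wrapped in <w orth="..."> and the sentence in <s trans="...">.
    """
    lines = [f"<s trans={quoteattr(translation)}>"]
    for word, gloss, tag in zip(words, glosses, tags):
        segs, gl, tg = word.split("-"), gloss.split("-"), tag.split("-")
        # every tier must split into the same number of morphemes
        if not len(segs) == len(gl) == len(tg):
            raise ValueError(f"misaligned glosses for {word!r}")
        lines.append(f"<w orth={quoteattr(word)}>")
        for morpheme, g, t in zip(segs, gl, tg):
            lines.append("\t".join((morpheme, g, t)))
        lines.append("</w>")
    lines.append("</s>")
    return lines


if __name__ == "__main__":
    print("\n".join(morpheme_vertical(
        ["pirat-a", "barb-am", "hab-et"],
        ["pirate-NOM", "beard-ACC", "have-3SG"],
        ["NOUN-NOM", "NOUN-ACC", "VERB-3SG"],
        "The pirate has a beard",
    )))
```

Applied to the pirate sentence, this reproduces the <s>/<w> structure in Stefan's example, with one morpheme per line. A word whose tiers split into different numbers of morphemes is rejected rather than silently truncated.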

Am 12.12.2016 um 11:17 schrieb Stefan Evert:
> Depending on how you want to use the corpus, it might also make sense to split the text into morphemes as tokens and use an s-attribute to identify complete words.  This will be unnatural if you write CQP queries directly and it wouldn't play well with CQPweb's sorting and collocation functions, but if you design your own Web interface, much of the complexity can be hidden.
>
> In your example, this encoding would look as follows
>
> <s trans="The pirate has a beard">
> <w orth="pirat-a">
> pirat	pirate	NOUN
> a	NOM	NOM
> </w>
> <w orth="barb-am">
> barb	beard	NOUN
> am	ACC	ACC
> </w>
> <w orth="hab-et">
> hab	have	VERB
> et	3SG	3SG
> </w>
> </s>
>
> In theory, the Ziggurat data model can deal with such multiple levels of tokenization much more naturally, but we don't envisage support at the CQP / CQPweb level (which would fundamentally change assumptions made by these tools).
>
> Best,
> Stefan
>
>
>> On 10 Dec 2016, at 11:00, Ruprecht von Waldenfels <ruprecht.waldenfels at gmx.net> wrote:
>>
>> I wonder how to deal with multiple lines of glossing that are dependent on each other, e.g.,
>>
>> Pirat-a     barb-am     hab-et
>> pirate-NOM  beard-ACC   have-3SG
>> NOUN-NOM    NOUN-ACC    VERB-3SG
>> "The pirate has a beard"
>>
>> This is a silly example, of course, but it highlights the problem: in an ideal world, I would like to be able to query for word forms that contain a morpheme glossed as the NOUN 'pirate', i.e., queries that make use of the alignment within the glosses. This could be done by adding a further p-attribute whose value is a set, e.g.,
>>
>> <s trans="The pirate has a beard">
>> pirat-a	pirate-NOM	NOUN-NOM	|pirat:pirate:NOUN|a:NOM:NOM|
>> barb-am	beard-ACC	NOUN-ACC	|barb:beard:NOUN|am:ACC:ACC|
>> hab-et	have-3SG	VERB-3SG	|hab:have:VERB|et:3SG:3SG|
>> </s>
>> This would allow me to easily search for, say, a morpheme 'et' that is a third-person singular marker without having to specify its position within the glossed word form. I realize the third gloss line adds little here, but it stands for the (real) possibility of multiple glosses that relate to each other.
>>
>> None of these solutions is very elegant, it seems to me: they merely make searches possible. But I cannot think of a better way.
>>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
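Ruprecht's alternative keeps one token per word and adds a set-valued p-attribute of morpheme:gloss:tag triples. In CQP, a column declared as a feature-set attribute would presumably be searched with the contains operator, for example [morph contains "et:3SG:.*"], where morph is a hypothetical attribute name. The Python sketch below covers two steps. It builds that column from parallel, hyphen-segmented gloss tiers, which is an assumed input shape. It also emulates the set-membership test so the encoding can be checked before indexing. The function names are hypothetical. Each set member is matched in full against the regex, on the assumption that this mirrors CQP's anchored regex matching.

```python
import re


def word_vertical(words, glosses, tags):
    """One token per word: word, gloss and tag columns plus a feature-set
    column |m1|m2|...| whose members are morpheme:gloss:tag triples."""
    rows = []
    for word, gloss, tag in zip(words, glosses, tags):
        segs, gl, tg = word.split("-"), gloss.split("-"), tag.split("-")
        if not len(segs) == len(gl) == len(tg):
            raise ValueError(f"misaligned glosses for {word!r}")
        members = [":".join(triple) for triple in zip(segs, gl, tg)]
        rows.append("\t".join((word, gloss, tag, "|" + "|".join(members) + "|")))
    return rows


def words_with_morpheme(rows, pattern):
    """Return the word forms whose feature set has a member that fully
    matches the regex, independent of the morpheme's position in the word."""
    rx = re.compile(pattern)
    hits = []
    for row in rows:
        word, *_, feature_set = row.split("\t")
        members = feature_set.strip("|").split("|")
        if any(rx.fullmatch(m) for m in members):
            hits.append(word)
    return hits


if __name__ == "__main__":
    rows = word_vertical(
        ["pirat-a", "barb-am", "hab-et"],
        ["pirate-NOM", "beard-ACC", "have-3SG"],
        ["NOUN-NOM", "NOUN-ACC", "VERB-3SG"],
    )
    print("\n".join(rows))
    # third-person singular 'et', wherever it occurs in the word
    print(words_with_morpheme(rows, r"et:3SG:.*"))  # ['hab-et']
    # any morpheme glossed 'pirate' and tagged NOUN
    print(words_with_morpheme(rows, r"[^:]*:pirate:NOUN"))  # ['pirat-a']
```

The cost of this design is visible in the regexes. Because the triples are packed into one string, every query has to spell out the positional order morpheme:gloss:tag.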
