<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi Igor, <br>
<br>
I have integrated this kind of annotation into CWB for the ParaSol
corpus (parasol.unibe.ch). The solution I used is similar and
straightforward; the main challenge, I think, is (a) providing
for an unspecified number of analyses and (b) making sure that the
different analyses don't get mixed up. <br>
<br>
For example, Russian "dam" is EITHER the 1.SG of the verb <i>dat'</i>
'give' OR the GEN.PL of the noun <i>dama</i> 'lady'; it is neither
the first person singular of the noun nor the genitive plural of
the verb. <br>
<br>
Therefore, I feel one must go for a rather complex annotation that
packs all analyses of a form into a single attribute value, which can
then be queried with a regular expression. The format is:<br>
<br>
FORM ANNOTATION<br>
dam 1:SG:PF-dat-::GEN:PL-dama-<br>
<br>
and then you can query for, say, Genitive by searching for <br>
<br>
[annot=".*:GEN:.*"]<br>
<br>
for <i>dama</i> 'lady'<br>
<br>
[annot=".*-dama-.*"]<br>
<br>
for the combination (genitives of <i>dama</i>)<br>
<br>
[annot=".*GEN[^-]*-dama-.*"]<br>
<br>
Because "[^-]*" cannot cross a hyphen, the GEN must stand in the same
analysis as "-dama-" and cannot belong to a different lemma. <br>
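<br>
The packing and the lemma-bound query can be tried out outside CWB with a small Python sketch. It assumes that every analysis is written as TAGS-lemma- (tags joined by ":", a closing hyphen after every lemma) and that analyses are joined by "::"; the helper names encode_analyses and matches are made up for this illustration and are not part of CWB.<br>

```python
import re

def encode_analyses(analyses):
    """Pack (tags, lemma) pairs into one annotation string."""
    # Each analysis becomes 'TAG:TAG:...-lemma-'; analyses are joined by '::'.
    return "::".join(":".join(tags) + "-" + lemma + "-" for tags, lemma in analyses)

def matches(pattern, annot):
    # CQP matches a regular expression against the whole attribute value;
    # re.fullmatch imitates that behaviour here.
    return re.fullmatch(pattern, annot) is not None

# The two analyses of Russian "dam" from the example above.
dam = encode_analyses([(["1", "SG", "PF"], "dat"), (["GEN", "PL"], "dama")])

print(dam)
print(matches(r".*GEN[^-]*-dama-.*", dam))  # genitive of 'dama': matches
print(matches(r".*GEN[^-]*-dat-.*", dam))   # genitive of 'dat': must not match
```

The second pattern fails because the only GEN in the string is followed by "-dama-", and "[^-]*" cannot skip over that hyphen to reach another lemma.<br>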
<br>
I believe this might constitute a complete solution, and it should
be possible to hide the complexity from the user by wrapping this
in a more convenient interface. <br>
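<br>
As a rough idea of what such a wrapper could look like, here is a minimal Python sketch that turns a lemma and/or a tag into the CQP expression. The function name cqp_query and the attribute name "annot" are assumptions taken from the examples above, not an existing CWB interface; re.escape follows Python's regex rules, which may differ in detail from what CQP expects for unusual characters.<br>

```python
import re

def cqp_query(lemma=None, tag=None, attribute="annot"):
    """Build a CQP token query for the packed annotation (hypothetical helper)."""
    if not lemma and not tag:
        raise ValueError("give at least a lemma or a tag")
    # Escape regex metacharacters so that user input is matched literally.
    lemma_part = "-" + re.escape(lemma) + "-" if lemma else ""
    tag_part = re.escape(tag) if tag else ""
    if lemma and tag:
        # [^-]* keeps the tag inside the same analysis as the lemma.
        pattern = ".*" + tag_part + "[^-]*" + lemma_part + ".*"
    else:
        pattern = ".*" + (tag_part or lemma_part) + ".*"
    return '[%s="%s"]' % (attribute, pattern)

print(cqp_query(lemma="dama"))
print(cqp_query(lemma="dama", tag="GEN"))
```

A tag on its own is matched here as a plain substring, so a real interface would probably want to anchor it on the ":" and "-" delimiters to avoid partial tag matches.<br>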
<br>
Best, <br>
Ruprecht<br>
<br>
On 10.07.2012 15:43, Serge Heiden wrote:<br>
</div>
<blockquote cite="mid:4FFC316E.8080703@ens-lyon.fr" type="cite">
Hi Igor,<br>
<br>
One way to do this in CWB would be to split your<br>
pos and lemma values into several positional attributes.<br>
For example:<br>
form lemma1 lemma2 pos1 pos2 agr_set1 agr_set2 sem_set1 sem_set2<br>
<br>
You would then write your queries so that they only combine<br>
attributes belonging to the same analysis (same number).<br>
Your example query would become:<br>
<span style="white-space: pre;">[lemma1=".*valuelemma.*" &
pos1=".*valuepos.*"]</span><br>
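<br>
Since the wanted analysis may sit in any of the numbered slots, the query has to repeat the same conditions once per slot. The Python sketch below generates such a query; it assumes a fixed number of slots and the attribute naming shown above (lemma1, pos1, ...), and slot_query is a made-up helper name, not a CWB function.<br>

```python
def slot_query(constraints, n_slots=2):
    """Build a CQP query that tests all constraints within one slot at a time."""
    alternatives = []
    for i in range(1, n_slots + 1):
        # Every condition in one alternative refers to the same slot number i,
        # so values from different analyses are never combined.
        conds = ['%s%d="%s"' % (name, i, value) for name, value in constraints]
        alternatives.append("(" + " & ".join(conds) + ")")
    return "[" + " | ".join(alternatives) + "]"

print(slot_query([("lemma", ".*valuelemma.*"), ("pos", ".*valuepos.*")]))
```

The number of slots has to cover the most ambiguous word in the corpus, which is the main cost of this layout.<br>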
<br>
What do you think?<br>
<br>
Best,<br>
Serge<br>
<br>
<br>
On 10/07/2012 15:20, Игорь Шалыминов wrote:<br>
<span style="white-space: pre;">> Hello!<br>
> <br>
> My name is Igor. I'm a developer of the Russian National Corpus search<br>
> engine, and I'm trying to get it working with CWB. The main problem I<br>
> have is the following: RNC texts are annotated ambiguously for the<br>
> most part, and each word has sets of lemmas, grammatical and semantic<br>
> features, just as in the GERMAN-LAW example in the tutorial. Suppose we<br>
> have a word:<br>
> <br>
> word   lemma            pos          agr                  sem<br>
> -----------------------------------------------------------------------------<br>
> form   |lemma1|lemma2|  |pos1|pos2|  |agr_set1|agr_set2|  |sem_set1|sem_set2|<br>
> </span><br>
<span style="white-space: pre;">> <br>
> And, if I type the query:<br>
> <br>
> [(lemma contains "lemma1") and (pos contains "pos2")]<br>
> <br>
> I will get that very word matched, and this will be a mistake in my<br>
> case, since there is only one strict correspondence: "lemma1 -> pos1<br>
> -> agr_set1 -> sem_set1", and the same for lemma2.<br>
> <br>
> So, my question: is there an out-of-the-box possibility of performing<br>
> such queries (i.e., controlling the positions of corresponding sets while<br>
> matching attribute sets with 'contains'), or does it have to be<br>
> implemented?<br>
> <br>
> -- Best Regards, Igor Shalyminov <br>
> _______________________________________________ CWB mailing
list <br>
> <a moz-do-not-send="true" class="moz-txt-link-abbreviated"
href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a> <br>
> <a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://devel.sslmit.unibo.it/mailman/listinfo/cwb">http://devel.sslmit.unibo.it/mailman/listinfo/cwb</a></span><br>
<br>
-- <br>
Dr. Serge Heiden, <a moz-do-not-send="true"
class="moz-txt-link-abbreviated" href="mailto:slh@ens-lyon.fr">slh@ens-lyon.fr</a>,
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://textometrie.ens-lyon.fr">http://textometrie.ens-lyon.fr</a><br>
ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique
Française<br>
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél.
+33(0)622003883<br>
<br>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
------------------------------------------------
Ruprecht v. Waldenfels, <a class="moz-txt-link-abbreviated" href="mailto:waldenfels@issl.unibe.ch">waldenfels@issl.unibe.ch</a>
Institut fuer slavische Sprachen und Literaturen
Universität Bern Laenggassstr. 49 CH 3005 Bern 9
Tel: +41 31 631 35 83 / Fax: +41 31 631 39 90
------------------------------------------------
</pre>
</body>
</html>