<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Hi Igor, <br>

      <br>

      I have integrated this kind of annotation into CWB for the ParaSol

      corpus (parasol.unibe.ch). The solution I used is similar and

      straightforward; the main challenge, I think, is (a) providing

      for an unspecified number of analyses and (b) making sure that the

      different analyses don't get mixed up. <br>

      <br>

      For example, Russian "dam" is EITHER 1.SG of the verb <i>dat'</i>

      'give' OR GEN.PL of the noun <i>dama</i> 'lady'; but it is neither

      the first person singular of the noun nor the genitive plural of

      the verb. <br>

      <br>

      Therefore, I feel one must go for a rather complex annotation that

      keeps each analysis together in a single attribute value, which can

      then be queried with a regular expression. The machinery is as follows:<br>

      <br>

      FORM ANNOTATION<br>

      dam&nbsp; 1:SG:PF-dat::GEN:PL-dama-<br>

      <br>

      You can then query for, say, the genitive by searching for <br>

      <br>

      [annot=".*:GEN:.*"]<br>

      <br>

      for <i>dama</i> 'lady'<br>

      <br>

      [annot=".*-dama-.*"]<br>

      <br>

      for the combination (genitives of <i>dama</i>)<br>

      <br>

      [annot=".*GEN[^-]*-dama-.*"]<br>

      <br>

      The "[^-]*" part cannot cross a hyphen, so GEN and -dama- must

      belong to the same analysis and not to a different lemma. <br>
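      <br>
      The patterns above can be checked outside CQP with a small Python
      sketch. It assumes that CQP matches a regular expression against the
      whole attribute value, which re.fullmatch imitates here; the helper
      name cqp_match is made up for illustration, and the annotation string
      is copied from the example above. <br>
      <br>
<pre>
```python
import re

# Annotation string for "dam", copied from the example in this message.
ANNOT = "1:SG:PF-dat::GEN:PL-dama-"


def cqp_match(pattern, value):
    """Imitate CQP attribute matching: the pattern must cover the whole value."""
    return re.fullmatch(pattern, value) is not None


# The three queries from the message: all of them should match.
print(cqp_match(r".*:GEN:.*", ANNOT))           # any genitive
print(cqp_match(r".*-dama-.*", ANNOT))          # lemma dama
print(cqp_match(r".*GEN[^-]*-dama-.*", ANNOT))  # genitive of dama

# Combinations that mix the two analyses: none of them should match.
print(cqp_match(r".*GEN[^-]*-dat.*", ANNOT))      # genitive of dat'
print(cqp_match(r".*1:SG[^-]*-dama-.*", ANNOT))   # 1.SG of dama
```
</pre>
      The last two patterns fail because "[^-]*" stops at the hyphen that
      introduces the lemma, so a feature can only pair with the lemma that
      directly follows it. <br>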

      <br>

      I believe this might constitute a complete solution, and it should

      be possible to hide the complexity from the user by wrapping this

      in a more convenient interface. <br>
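      <br>
      As a rough idea of what such a wrapper could look like, here is a
      minimal Python sketch that only builds the query string. The function
      name build_query and its parameters are hypothetical; it reproduces
      the three patterns shown above and assumes that feature and lemma are
      plain strings without hyphens or regex metacharacters. <br>
      <br>
<pre>
```python
def build_query(feature=None, lemma=None, attribute="annot"):
    """Build a CQP token query for the hyphen-delimited annotation scheme.

    Hypothetical helper: feature and lemma are assumed to contain no
    hyphens and no regular-expression metacharacters.
    """
    if feature and lemma:
        # "[^-]*" keeps the feature and the lemma inside one analysis.
        pattern = f".*{feature}[^-]*-{lemma}-.*"
    elif lemma:
        pattern = f".*-{lemma}-.*"
    elif feature:
        pattern = f".*:{feature}:.*"
    else:
        pattern = ".*"
    return f'[{attribute}="{pattern}"]'


print(build_query(feature="GEN"))
print(build_query(lemma="dama"))
print(build_query(feature="GEN", lemma="dama"))
```
</pre>
      The returned strings can be passed to CQP unchanged, so the user only
      ever supplies a feature and a lemma. <br>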

      <br>

      Best, <br>

      Ruprecht<br>

      <br>


      On 10.07.2012 15:43, Serge Heiden wrote:<br>

    </div>

    <blockquote cite="mid:4FFC316E.8080703@ens-lyon.fr" type="cite">


      Hi Igor,<br>

      <br>

      One way to do this in CWB would be to split your<br>

      pos and lemma values into several positional attributes.<br>

      For example, like this:<br>

      form&nbsp;&nbsp;&nbsp; lemma1&nbsp;&nbsp;&nbsp; lemma2&nbsp;&nbsp;&nbsp; pos1&nbsp;&nbsp;&nbsp; pos2&nbsp;&nbsp;&nbsp; agr_set1&nbsp;&nbsp;&nbsp;

      agr_set2&nbsp;&nbsp;&nbsp; sem_set1&nbsp;&nbsp;&nbsp; sem_set2<br>

      <br>

      And make sure each query combines only attributes with the same index<br>

      (lemma1 with pos1, lemma2 with pos2).<br>

      Your example query would become:<br>

      <span style="white-space: pre;">[lemma1=".*valuelemma.*" &amp;

        pos1=".*valuepos.*"]</span><br>

      <br>

      What do you think?<br>

      <br>

      Best,<br>

      Serge<br>

      <br>

      <br>

      On 10/07/2012 15:20, &#1048;&#1075;&#1086;&#1088;&#1100; &#1064;&#1072;&#1083;&#1099;&#1084;&#1080;&#1085;&#1086;&#1074; wrote:<br>

      <span style="white-space: pre;">&gt; Hello!<br>

        &gt; <br>

        &gt; My name is Igor, I'm a developer of the Russian National Corpus

        search<br>

        &gt; engine, and I'm trying to get it working with CWB. The main

        problem I<br>

        &gt; have is the following: RNC texts are annotated ambiguously

        for the<br>

        &gt; most part, and each word has got sets of lemmas, grammar

        and semantic<br>

        &gt; features, just as the GERMAN-LAW example in the tutorial.

        Suppose we<br>

        &gt; have a word:<br>

        &gt; <br>

        &gt; word lemma pos agr<br>

        &gt; sem <br>

        &gt;

------------------------------------------------------------------------------------------------------------------------<br>

        &gt;<br>

        &gt; </span><br>

      form&nbsp;&nbsp;&nbsp; |lemma1|lemma2|&nbsp;&nbsp;&nbsp; |pos1|pos2|&nbsp;&nbsp;&nbsp; |agr_set1|agr_set2|&nbsp;&nbsp;&nbsp;

      |sem_set1|sem_set2|<br>

      <span style="white-space: pre;">&gt; <br>

        &gt; And, if I type the query:<br>

        &gt; <br>

        &gt; [(lemma contains "lemma1") and (pos contains "pos2")]<br>

        &gt; <br>

        &gt; I will get that very word matched, and this will be a

        mistake in my<br>

        &gt; case since there is only one strict correspondence: "lemma1

        -&gt; pos1<br>

        &gt; -&gt; agr_set1 -&gt; sem_set1", and the same for lemma2.<br>

        &gt; <br>

        &gt; So, my question, is there an out of the box possibility of

        performing<br>

        &gt; such queries (i.e., controlling positions of corresponding

        sets while<br>

        &gt; matching attribute sets with 'contains'), or does it have to be<br>

        &gt; implemented?<br>

        &gt; <br>

        &gt; -- Best Regards, Igor Shalyminov <br>

        &gt; _______________________________________________ CWB mailing

        list <br>

        &gt; <a moz-do-not-send="true" class="moz-txt-link-abbreviated"

          href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a> <br>

        &gt; <a moz-do-not-send="true" class="moz-txt-link-freetext"

          href="http://devel.sslmit.unibo.it/mailman/listinfo/cwb">http://devel.sslmit.unibo.it/mailman/listinfo/cwb</a></span><br>

      <br>

      -- <br>

      Dr. Serge Heiden, <a moz-do-not-send="true"

        class="moz-txt-link-abbreviated" href="mailto:slh@ens-lyon.fr">slh@ens-lyon.fr</a>,

      <a moz-do-not-send="true" class="moz-txt-link-freetext"

        href="http://textometrie.ens-lyon.fr">http://textometrie.ens-lyon.fr</a><br>

      ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique

      Fran&ccedil;aise<br>

      15, parvis Ren&eacute; Descartes 69342 Lyon BP7000 Cedex, t&eacute;l.

      +33(0)622003883<br>

      <br>


    </blockquote>

    <br>

    <br>

    <pre class="moz-signature" cols="72">-- 

------------------------------------------------

Ruprecht v. Waldenfels, <a class="moz-txt-link-abbreviated" href="mailto:waldenfels@issl.unibe.ch">waldenfels@issl.unibe.ch</a>

Institut fuer slavische Sprachen und Literaturen

Universit&auml;t Bern Laenggassstr. 49 CH 3005 Bern 9

Tel: +41  31 631 35 83 /  Fax: +41 31  631 39 90

------------------------------------------------

</pre>

  </body>

</html>