<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Aptos;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        font-size:12.0pt;
        font-family:"Aptos",sans-serif;}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:"Verdana",sans-serif;
        color:#156082;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;
        mso-ligatures:none;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Hi Graham<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">This isn’t a restriction on the lemma format. It’s simply that CQP doesn’t, by default, understand things like | as meaning an alternative
 in its input data.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Thus, what gets indexed is the string “</span>eau|eaux<span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">”
 – so that’s what you have to search for.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span lang="FR" style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">In CQL<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="FR" style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal" style="text-indent:36.0pt"><span lang="FR" style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">[lemma=&quot;eau\|eaux&quot;]<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="FR" style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Note that the pipe has to be escaped because you are
<i>searching for</i> a literal pipe character, not separating alternatives within the query.<o:p></o:p></span></p>
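<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">To see the difference outside CQP, here is a minimal Python sketch. It assumes that Python's re module behaves like CQP's regular expressions for this simple pattern, and it uses re.fullmatch to mimic the way CQP tests a pattern against the whole attribute value. The helper name matches is just for illustration.<o:p></o:p></span></p>
<pre>
```python
import re

# The lemma value exactly as it was indexed: one literal string.
indexed = "eau|eaux"


def matches(pattern: str, value: str) -> bool:
    """Return True if the pattern matches the whole value."""
    return re.fullmatch(pattern, value) is not None


# Unescaped pipe: alternation between "eau" and "eaux".
# Neither alternative equals the indexed string, so there is no match.
unescaped = matches(r"eau|eaux", indexed)

# Escaped pipe: a literal "|" character, so the pattern spells out
# the indexed string and matches it.
escaped = matches(r"eau\|eaux", indexed)

print(unescaped, escaped)
```
</pre>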
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">In CEQL<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">{eau\|eaux}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">The pipe is escaped here for the same reason. Or, more concisely for this specific example:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">[lemma=&quot;eaux?&quot;]<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">{eau[x,]}<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">(or else just use a bunch of * at the start and end of every lemma query, though that will probably lose you precision in the query
 results)<o:p></o:p></span></p>
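<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">A rough Python sketch of that loss of precision. The lemma values other than the first are made up for illustration, and the sketch assumes that a CEQL * wildcard corresponds to the regular expression .* and that Python's re is close enough to CQP's regex engine for this pattern.<o:p></o:p></span></p>
<pre>
```python
import re

# Made-up lemma values for illustration; only the first comes from the thread.
lemmas = ["eau|eaux", "bureau", "chapeau", "beau"]

# A CEQL query like *eau* becomes the regular expression .*eau.*
pattern = re.compile(r".*eau.*")

# Keep every lemma value that the wildcard pattern matches in full.
hits = [value for value in lemmas if pattern.fullmatch(value)]

print(hits)
```
</pre>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Every value in the list is a hit, not just the one you were after.<o:p></o:p></span></p>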
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">HOWEVER, there is a way to get the lemma field to behave like I think you expect it to (though you would need to recode to add leading
 and trailing pipes to each lemma value), which is to create the p-attribute as a feature set. See encoding manual
<b>Sec 6</b>, and CQP manual <b>Sec 6.6</b>. Note that the special CQP functions for feature sets aren’t accessible via CEQL.
<o:p></o:p></span></p>
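<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">The recoding step could be sketched in Python along these lines. This is only a sketch under assumptions: the vrt file is tab-separated with the lemma in column three, any line without a tab is a structural tag or blank line and is passed through, and the function names to_feature_set and recode_line are hypothetical. Check the encoding manual for any further requirements on feature-set values before relying on it.<o:p></o:p></span></p>
<pre>
```python
def to_feature_set(lemma: str) -> str:
    """Wrap a pipe-separated lemma value in leading and trailing pipes."""
    core = lemma.strip("|")
    # Assumption: an empty value becomes a single pipe.
    if not core:
        return "|"
    return "|" + core + "|"


def recode_line(line: str, column: int = 2) -> str:
    """Recode the lemma column of one vrt line (column is a 0-based index)."""
    text = line.rstrip("\n")
    # Assumption: token lines are tab-separated; anything without a tab
    # (structural tags, blank lines) is returned unchanged.
    if "\t" not in text:
        return line
    fields = text.split("\t")
    if len(fields) > column:
        fields[column] = to_feature_set(fields[column])
    # Put back whatever line ending the input had.
    return "\t".join(fields) + line[len(text):]


print(recode_line("eau\tNc\teau|eaux"))
```
</pre>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Once the attribute is encoded as a feature set, a CQP query such as [lemma contains &quot;eau&quot;] should then find these tokens; see the CQP manual section above for the exact operators.<o:p></o:p></span></p>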
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Hope that helps<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Best<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Andrew.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm;font-size:pt">
<p class="MsoNormal"><b><span style="font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span style="font-family:&quot;Calibri&quot;,sans-serif"> cwb-bounces@sslmit.unibo.it &lt;cwb-bounces@sslmit.unibo.it&gt;
<b>On Behalf Of </b>Graham Ranger -- UAPV<br>
<b>Sent:</b> 31 May 2025 10:43<br>
<b>To:</b> Open source development of the Corpus WorkBench &lt;cwb@sslmit.unibo.it&gt;<br>
<b>Subject:</b> [CWB] Restrictions on lemma annotation<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<div>
<p class="MsoNormal">Hello,<br>
In a corpus I'm setting up, using treetagger with a parameter file for classical French, there are a number of alternative lemmata, i.e. things like:<br>
eau&nbsp;&nbsp;&nbsp; Nc&nbsp;&nbsp;&nbsp; eau|eaux [Nc: common noun]<br>
I'm not entirely sure why, since there is no ambiguity here, but as a result it is impossible to search for the lemma &quot;eau&quot;.<br>
Are there any solutions other than simply removing the pipe and what comes after it from column three of the vrt file, so that only the first choice of lemma can be queried?<br>
Many thanks in advance.<br>
Graham.<o:p></o:p></p>
</div>
</div>
</body>
</html>