<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head><body dir="auto"><div dir="auto">Hi Andrew,&nbsp;</div><div dir="auto">Many thanks for this, which is extremely helpful. I was indeed not escaping the pipe in the query.&nbsp;</div><div dir="auto">My first step, then, will be to generate a list of these alternative lemmata for users, with guidance on how to formulate queries in this specific case.</div><div dir="auto">I'll look into option two, but the platform is really aimed at CQPweb users, for whom I'd like to keep queries as simple as possible.</div><div dir="auto">Best,&nbsp;</div><div dir="auto">Graham.</div><div dir="auto"><br></div><div dir="auto"><br></div><div><br></div><div align="left" dir="auto" style="font-size:100%;color:#000000"><div>-------- Original message --------</div><div>From: "Hardie, Andrew" &lt;a.hardie@lancaster.ac.uk&gt; </div><div>Date: 2 June 2025  10:59  (GMT+01:00) </div><div>To: Open source development of the Corpus WorkBench &lt;cwb@sslmit.unibo.it&gt; </div><div>Subject: Re: [CWB] Restrictions on lemma annotation </div><div><br></div></div>
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Hi Graham</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">This isn’t a restriction on the lemma format. It’s simply that CQP doesn’t, by default, treat a character like | in its input data as a separator between alternatives.</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Thus, what gets indexed is the string “</span>eau|eaux<span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">”
 – so that’s what you have to search for.</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US" lang="FR">In CQL</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US" lang="FR">&nbsp;</span></p>
<p style="text-indent:36.0pt" class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US" lang="FR">[lemma="eau\|eaux"]</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US" lang="FR">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Note that the pipe has to be escaped because you are
<i>searching for</i> the pipe, not separating queryable alternatives.</span></p>
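<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">The effect of the backslash can be checked outside CQP. What follows is only a minimal sketch in Python: it assumes that Python's re.fullmatch is a fair stand-in for the way CQP matches a regular expression against a whole attribute value, and the function name cqp_like_match is made up for illustration.</span></p>

```python
import re


def cqp_like_match(pattern, value):
    """Return True if the pattern matches the WHOLE value.

    Assumption: re.fullmatch approximates CQP, which matches a regular
    expression against the complete attribute value, not a substring.
    """
    return re.fullmatch(pattern, value) is not None


indexed_lemma = "eau|eaux"  # the string stored in the index

# Unescaped pipe: regex alternation, i.e. "eau" OR "eaux".
print(cqp_like_match(r"eau|eaux", indexed_lemma))   # False
# Escaped pipe: a literal "|" character inside the lemma value.
print(cqp_like_match(r"eau\|eaux", indexed_lemma))  # True
```

<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">The unescaped pattern would only match a lemma value that is exactly "eau" or exactly "eaux", which is why it finds nothing when the indexed value contains the pipe itself.</span></p>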
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">In CEQL</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p style="text-indent:36.0pt" class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">{eau\|eaux}</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">The pipe is escaped here for the same reason. Or, more concisely for this specific example:</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p style="text-indent:36.0pt" class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">[lemma="eaux?"]</span></p>
<p style="text-indent:36.0pt" class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p style="text-indent:36.0pt" class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">{eau[x,]}</span></p>
<p style="text-indent:36.0pt" class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">(Or else just put a * wildcard at the start and end of every lemma query, though that will probably cost you precision in the query
 results.)</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">However, there is a way to get the lemma field to behave the way I think you expect it to, which is to create the p-attribute as a feature set. You would need to recode the data to add leading
 and trailing pipes to each lemma value. See the encoding manual,
<b>Sec 6</b>, and the CQP manual, <b>Sec 6.6</b>. Note that the special CQP functions for feature sets aren’t accessible via CEQL.
</span></p>
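<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">The recoding step could look something like the following. This is only a sketch under stated assumptions: it takes a tab-separated VRT line with the lemma in the third column (as in the example line from the original question), and both function names are invented for illustration. The target notation is a value with a leading and a trailing pipe, e.g. |eau|eaux|.</span></p>

```python
def to_feature_set(lemma):
    """Turn 'eau|eaux' into '|eau|eaux|' (feature-set notation).

    Alternatives are de-duplicated and sorted for a stable result;
    an empty value becomes the empty set, written '|'.
    """
    parts = sorted({p for p in lemma.split("|") if p})
    return "|" + "".join(p + "|" for p in parts)


def recode_vrt_line(line, lemma_col=2):
    """Recode the lemma column of one VRT line (without its newline).

    Assumption: columns are tab-separated and the lemma is column three
    (index 2). XML tag lines and short lines are returned unchanged.
    """
    cols = line.split("\t")
    if line.startswith("<") or len(cols) <= lemma_col:
        return line
    cols[lemma_col] = to_feature_set(cols[lemma_col])
    return "\t".join(cols)


# The example token line from the original question.
print(recode_vrt_line("eau\tNc\teau|eaux"))
```

<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Once the attribute has been declared as a feature set at encoding time (the encoding manual gives the exact declaration), a CQP query along the lines of [lemma contains "eau"] should then find this token. Single-lemma values are recoded too (eau becomes |eau|), so that every value in the column uses the same notation.</span></p>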
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Hope that helps</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Best</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">Andrew.</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#156082;mso-fareast-language:EN-US">&nbsp;</span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm;font-size:pt">
<p class="MsoNormal"><b><span style="font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span style="font-family:&quot;Calibri&quot;,sans-serif"> cwb-bounces@sslmit.unibo.it &lt;cwb-bounces@sslmit.unibo.it&gt;
<b>On Behalf Of </b>Graham Ranger -- UAPV<br>
<b>Sent:</b> 31 May 2025 10:43<br>
<b>To:</b> Open source development of the Corpus WorkBench &lt;cwb@sslmit.unibo.it&gt;<br>
<b>Subject:</b> [CWB] Restrictions on lemma annotation</span></p>
</div>
</div>
<p class="MsoNormal">&nbsp;</p>
<div>
<p class="MsoNormal">Hello,<br>
In a corpus I'm setting up, using TreeTagger with a parameter file for classical French, there are a number of alternative lemmata, i.e. things like:<br>
eau&nbsp;&nbsp;&nbsp; Nc&nbsp;&nbsp;&nbsp; eau|eaux [Nc: common noun]<br>
I'm not entirely sure why, since there is no ambiguity here, but as a result it is impossible to search for the lemma "eau".<br>
Are there any solutions other than simply removing the pipe and what comes after it from column three of the VRT file, so that only the first choice of lemma can be queried?<br>
Many thanks in advance.<br>
Graham.</p>
</div>
</div>


</body></html>