<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
        {mso-style-name:msonormal;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
span.EmailStyle20
        {mso-style-type:personal-reply;
        font-family:"Verdana",sans-serif;
        color:#1F497D;
        font-weight:normal;
        font-style:normal;
        text-decoration:none none;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Hi Scott,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">The adjustable automatic defaults you are asking for are largely what CEQL is designed to provide. One of the new plugin types under
development is the <i>CEQL Extender</i>, which will let you override CEQL grammar rules to suit your needs. The example plugin of this type in fact adds a “<i>within s</i>” clause (or one for some other XML element) to every query. See:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><a href="https://sourceforge.net/p/cwb/code/HEAD/tree/gui/cqpweb/trunk/lib/plugins/builtin/CeqlExtender/AddWithinRangeOfXml.php">https://sourceforge.net/p/cwb/code/HEAD/tree/gui/cqpweb/trunk/lib/plugins/builtin/CeqlExtender/AddWithinRangeOfXml.php</a><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Also, the new implementation of case sensitivity in 3.3 will allow the sensitivity defaults to be set per attribute.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">From a design perspective, it would be a bad idea to make the kinds of changes you suggest at the CQP level. There, changing
the default case sensitivity, adding special treatment for “word”, and so on would be a radical change and very bad for backward compatibility.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">best<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Andrew.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US">From:</span></b><span lang="EN-US"> cwb-bounces@sslmit.unibo.it &lt;cwb-bounces@sslmit.unibo.it&gt;
<b>On Behalf Of </b>Scott Sadowsky<br>
<b>Sent:</b> 30 November 2019 10:36<br>
<b>To:</b> Stefan Evert &lt;stefanML@collocations.de&gt;<br>
<b>Cc:</b> CWBdev Mailing List &lt;cwb@sslmit.unibo.it&gt;<br>
<b>Subject:</b> [External Sender] Re: [CWB] A few miscellaneous questions<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<div>
<p class="MsoNormal">On Sat, Nov 30, 2019 at 6:00 AM Stefan Evert &lt;<a href="mailto:stefanML@collocations.de">stefanML@collocations.de</a>&gt; wrote:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Hi Stefan,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<p class="MsoNormal">1. I've tagged each utterance with a unique serial number that's stored in the s_utterance s-attribute. It's encoded as free text. I'd like to be able to query specific utterances by number, e.g. s_utterance="1287117", and get a single
result just once -- the entirety of the utterance.<o:p></o:p></p>
</blockquote>
<p class="MsoNormal"><br>
The intuitive solution is &lt;s_utterance = "12887"&gt; []* &lt;/s_utterance&gt;<br>
A faster and slightly safer alternative (because it also works for very long sentences) is &lt;s_utterance = "12887"&gt; [] expand to s_utterance<o:p></o:p></p>
</blockquote>
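<p class="MsoNormal">A minimal sketch of how a script might build the two query forms quoted above. The helper name utterance_query and its parameters are illustrative assumptions, not part of CQP or CQPweb; only the query strings themselves come from the reply above.<o:p></o:p></p>

```python
def utterance_query(number, attribute="s_utterance", expand=True):
    """Build a CQP query that matches one whole region by its annotation value.

    `attribute` defaults to the s-attribute named in this thread; the
    function itself is a hypothetical helper, not a CWB API.
    """
    value = str(number)
    if not value.isdigit():
        raise ValueError("expected a serial number, got %r" % (number,))
    start_tag = '<%s = "%s">' % (attribute, value)
    if expand:
        # Match only the first token of the region, then widen the
        # match to the whole region with "expand to".
        return "%s [] expand to %s" % (start_tag, attribute)
    # Intuitive form: start tag, any number of tokens, end tag.
    return "%s []* </%s>" % (start_tag, attribute)


if __name__ == "__main__":
    print(utterance_query(12887))
    print(utterance_query(12887, expand=False))
```

<p class="MsoNormal">Calling utterance_query(12887) gives the "expand to" form, and utterance_query(12887, expand=False) gives the start-tag/end-tag form.<o:p></o:p></p>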
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">That's perfect! Many thanks.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<p class="MsoNormal">2. Performing case-sensitive queries of words is of limited use to me (and likely others). However, it's the default with CQP syntax queries. This is different from both the simple query syntax and search engine syntax, which makes it very
easy to forget to add %c to every single query element. Is there any way to set searching to be case-insensitive by default?<o:p></o:p></p>
</blockquote>
<p class="MsoNormal"><o:p> </o:p></p>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<p class="MsoNormal">3. In a similar vein, searching across sentence/utterance boundaries is of limited usefulness, but it is also the default. This can, of course, be dealt with by adding within s to all queries, but that's a lot of typing over time, it's
not intuitive to many users, and it's also easy to forget. Can queries somehow be set to not cross sentence/utterance boundaries by default? <o:p></o:p></p>
</blockquote>
<p class="MsoNormal"><br>
... 3. could in principle be handled by a Web interface such as CQPweb, which could be configured to auto-append a suitable within clause to every query (but keep in mind that only a single within clause is allowed, so this would clash with an explicit within
specified by the user). CQPweb doesn't expect every corpus to have sentence units, though, so this could not be a global setting!<o:p></o:p></p>
</blockquote>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Thanks for the detailed answers and rationales. It's good stuff to know!<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Being able to append a user-specified (corpus admin-specified, really) "within" clause automatically would certainly be useful! The tagger I use doesn't output NPs, VPs or similar, so I can't think of any use for "within" except sentence/utterance
boundaries. And in corpora in which such structures are indeed tagged, and a user specifies one with a "within" clause, this could simply replace the automatic sentence/utterance clause. That should even produce the same results, since NPs and such don't cross
sentence boundaries.<o:p></o:p></p>
</div>
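<p class="MsoNormal">The auto-append behaviour discussed above could be sketched roughly as follows, in Python rather than in CQPweb's own code. Assumptions: the function name add_default_within is hypothetical; a corpus without sentence units is configured with default_unit=None; and an explicit within clause, if present, is the last thing in the query (a trailing "cut" or "expand" clause is not handled). An explicit clause simply wins over the default, since only one within clause is allowed.<o:p></o:p></p>

```python
import re

# An explicit "within ..." clause at the very end of the query,
# e.g. "within np" or "within 5 words", optionally followed by ";".
_WITHIN_RE = re.compile(r'\bwithin\s+(?:\d+\s+)?\w+\s*;?\s*$')


def add_default_within(query, default_unit):
    """Append 'within <default_unit>' unless the query already has a within clause.

    default_unit is a per-corpus setting (hypothetical); None means the
    corpus has no suitable unit, so the query is returned unchanged.
    """
    if default_unit is None:
        return query
    if _WITHIN_RE.search(query):
        # The user's explicit clause replaces the automatic one.
        return query
    body = query.rstrip()
    terminator = ""
    if body.endswith(";"):
        # Keep a final ";" at the end of the rewritten query.
        body = body[:-1].rstrip()
        terminator = ";"
    return "%s within %s%s" % (body, default_unit, terminator)


if __name__ == "__main__":
    print(add_default_within('[lemma="go"] [pos="IN"]', "s"))
    print(add_default_within('[pos="DT"] [pos="NN"] within np', "s"))
```

<p class="MsoNormal">Because the default unit is passed in per call, nothing here forces a global setting: each corpus can supply its own unit, or none.<o:p></o:p></p>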
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<p class="MsoNormal">2. is a much harder problem, for several reasons:<br>
<br>
- You don't necessarily want case-insensitive matching for all attributes, e.g. POS tags might be case-sensitive, and you might want to distinguish between the lemmas "Polish" and "polish". So you'd have to tell CQP exactly which attributes default to case-insensitive.<o:p></o:p></p>
</blockquote>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Right. If nothing else, I'd think the "word" attribute should default to being case-insensitive. There are certainly corner cases, as you point out, but mostly capitalization is linguistically uninteresting -- even with proper nouns, since
any tagger worth its salt has NER and tags them as proper. Certainly I find myself adding (or mistakenly forgetting to add) "%c" to 99% of all my word-based queries. And an option such as CEQL's ":C" would allow the remaining 1% of cases to be dealt with appropriately,
of course. (Others' percentages will, of course, vary!)<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<p class="MsoNormal">- You'd have to give users a way of turning case-sensitivity back on for individual query elements, like the :C modifier in CEQL.<o:p></o:p></p>
</blockquote>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I have no idea what implementing something like this in CQP would involve, of course, but it almost seems like automatically adding "%c" in CQPweb, except when the user specifies some new flag for case sensitivity such as "%C", would be
the easiest way to go about getting the same result.<o:p></o:p></p>
</div>
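<p class="MsoNormal">A rough sketch of that rewriting idea: add %c to tests on selected attributes unless the user opts out. Assumptions: %C is the hypothetical opt-out flag proposed above, not an existing CQP flag, so it is stripped before the query would be passed on; only explicit attribute="value" tests with double-quoted strings are handled, so bare strings such as "polish" are left untouched; and the function name apply_case_defaults is illustrative.<o:p></o:p></p>

```python
import re

# One attribute test such as  word="polish"  or  lemma != "go" %d :
# group 1 = the test itself, group 2 = attribute name, group 3 = flag letters.
_TEST_RE = re.compile(r'(\b(\w+)\s*!?=\s*"(?:[^"\\]|\\.)*")(?:\s*%([A-Za-z]+))?')


def apply_case_defaults(query, insensitive=("word",)):
    """Add %c to tests on the listed attributes unless flagged with %C.

    `insensitive` names the attributes that default to case-insensitive
    matching; other attributes keep their normal behaviour.
    """
    def rewrite(match):
        head, attr, flags = match.group(1), match.group(2), match.group(3) or ""
        opted_out = "C" in flags
        # The hypothetical %C flag is removed in all cases.
        flags = flags.replace("C", "")
        if attr in insensitive and not opted_out and "c" not in flags:
            flags = "c" + flags
        return head + (" %" + flags if flags else "")

    return _TEST_RE.sub(rewrite, query)


if __name__ == "__main__":
    print(apply_case_defaults('[word="polish"]'))
    print(apply_case_defaults('[word="Polish" %C]'))
    print(apply_case_defaults('[lemma="Polish" & pos="NNP"]'))
```

<p class="MsoNormal">Keeping the list of attributes as a parameter reflects the point made earlier in the thread that not every attribute should default to case-insensitive matching.<o:p></o:p></p>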
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"> Thanks again for your response.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Cheers,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Scott<o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</body>
</html>