<div dir="ltr"><div dir="ltr">On Sat, Nov 30, 2019 at 6:00 AM Stefan Evert <<a href="mailto:stefanML@collocations.de">stefanML@collocations.de</a>> wrote:<br></div><div dir="ltr"><br></div><div>Hi Stefan,</div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">1. I've tagged each utterance with a unique serial number that's stored in the s_utterance s-attribute. It's encoded as free text. I'd like to be able to query specific utterances by number, e.g. s_utterance="1287117", and get a single result just once -- the entirety of the utterance.</blockquote><br>
The intuitive solution is &lt;s_utterance = "12887"&gt; []* &lt;/s_utterance&gt;<br>A faster and slightly safer alternative (because it also works for very long sentences) is &lt;s_utterance = "12887"&gt; [] expand to s_utterance<br></blockquote><div><br></div><div>That's perfect! Many thanks.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">2. Performing case-sensitive queries of words is of limited use to me (and likely others). However, it's the default with CQP syntax queries. This is different from both the simple query syntax and search engine syntax, which makes it very easy to forget to add %c to every single query element. Is there any way to set searching to be case-insensitive by default?</blockquote><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">3. In a similar vein, searching across sentence/utterance boundaries is of limited usefulness, but it is also the default. This can, of course, be dealt with by adding within s to all queries, but that's a lot of typing over time, it's not intuitive to many users, and it's also easy to forget. Can queries somehow be set to not cross sentence/utterance boundaries by default? </blockquote>
<br>... 3. could in principle be handled by a Web interface such as CQPweb, which could be configured to auto-append a suitable within clause to every query (but keep in mind that only a single within clause is allowed, so this would clash with an explicit within specified by the user). CQPweb doesn't expect every corpus to have sentence units, though, so this could not be a global setting!<br></blockquote><div><br></div><div>Thanks for the detailed answers and rationales. It's good stuff to know!</div><div><br></div><div>Being able to append a user-specified (corpus admin-specified, really) "within" clause automatically would certainly be useful! The tagger I use doesn't output NPs, VPs or similar, so I can't think of any use for "within" except sentence/utterance boundaries. And in corpora in which such structures are indeed tagged, and a user specifies one with a "within" clause, this could simply replace the automatic sentence/utterance clause. That should even produce the same results, since NPs and such don't cross sentence boundaries.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">2. is a much harder problem, for several reasons:<br>
<br>
- You don't necessarily want case-insensitive matching for all attributes, e.g. POS tags might be case-sensitive, and you might want to distinguish between the lemmas "Polish" and "polish". So you'd have to tell CQP exactly which attributes default to case-insensitive matching.<br></blockquote><div><br></div><div>Right. If nothing else, I'd think the "word" attribute should default to being case-insensitive. There are certainly corner cases, as you point out, but mostly capitalization is linguistically uninteresting -- even with proper nouns, since any tagger worth its salt has NER and tags them as proper nouns. Certainly I find myself adding (or mistakenly forgetting to add) "%c" to 99% of all my word-based queries. And an option such as CEQL's ":C" would allow the remaining 1% of cases to be dealt with appropriately. (Others' percentages will, of course, vary!)</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> - You'd have to give users a way of switching case-sensitivity back on for individual query elements, like the :C modifier in CEQL.<br></blockquote><div><br></div><div>I have no idea what implementing something like this in CQP would involve, of course, but it almost seems like automatically adding "%c" in CQPweb, except when the user specifies some new flag for case sensitivity such as "%C", would be the easiest way to get the same result.</div><div><br></div><div> Thanks again for your response.</div><div><br></div><div>Cheers,</div><div>Scott</div><div></div></div></div>
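A minimal sketch of the front-end preprocessing discussed above, i.e. what an interface such as CQPweb might do to a query string before passing it on to CQP. Everything here is an assumption for illustration: the function names `default_case_insensitive` and `default_within` are hypothetical, `%C` is Scott's proposed opt-out flag (not an existing CQP flag), and the regular-expression approach only recognises explicit `word="..."` tests (not the bare `"..."` shorthand), so it is no substitute for a real CQP parser. Because only a single `within` clause is allowed, the sketch leaves any query that already contains `within`, `cut` or `expand` untouched rather than guessing where an extra clause would belong.

```python
import re

# Hypothetical front-end preprocessing (not an existing CQP or CQPweb option):
# the query string is rewritten before it is passed on to CQP.

# A CQP string literal in double or single quotes, allowing backslash escapes.
STRING = r'"(?:[^"\\]|\\.)*"' + "|" + r"'(?:[^'\\]|\\.)*'"

# An explicit word-attribute test such as word="walk.*" or word = 'Polish',
# together with any flags that follow it (%c, %d, %cd, or the hypothetical %C).
WORD_TEST = re.compile(r"\bword\s*=\s*(" + STRING + r")(\s*%[A-Za-z]+)?")

# Keywords whose presence stops the sketch from appending its own within clause.
MODIFIERS = re.compile(r"\b(?:within|cut|expand)\b")


def _case_flag(m: re.Match) -> str:
    value = m.group(1)
    letters = (m.group(2) or "").strip()[1:]  # flag letters without the '%'
    if set(letters) - set("cdC"):
        return m.group(0)  # unfamiliar flags (e.g. %l): leave the test alone
    if "C" in letters:
        # Hypothetical %C opt-out: keep this test case-sensitive.
        letters = letters.replace("C", "")
    elif "c" not in letters:
        letters = "c" + letters  # default to case-insensitive matching
    return f"word={value}" + (f" %{letters}" if letters else "")


def default_case_insensitive(query: str) -> str:
    """Add %c to every explicit word="..." test unless %C is given."""
    return WORD_TEST.sub(_case_flag, query)


def default_within(query: str, region: str = "s") -> str:
    """Append 'within <region>' unless the query already has a modifier."""
    body = query.rstrip()
    semicolon = body.endswith(";")
    body = body.rstrip(";").rstrip()
    # Blank out string literals so that [word="within"] is not mistaken
    # for a within clause.
    if MODIFIERS.search(re.sub(STRING, '""', body)):
        return query  # explicit within, cut or expand: keep the user's query
    return f"{body} within {region}" + (";" if semicolon else "")


if __name__ == "__main__":
    for q in ['[word="polish"] [pos="N.*"];', '[word="Polish" %C] within np']:
        print(default_within(default_case_insensitive(q)))
    # → [word="polish" %c] [pos="N.*"] within s;
    # → [word="Polish"] within np
```

Only the word attribute is rewritten, which reflects the point above that POS tags or lemmas such as "Polish" vs. "polish" may need to stay case-sensitive; in the second example the user's own `within np` replaces the automatic sentence clause, as Scott suggests.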