[CWB] query efficiency issue

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri Sep 27 19:08:19 CEST 2024


Hi Chelsey,

Apologies, I overlooked this one because it wasn't under the same subject line (the perils of replying to a list digest, I'm afraid).

If you are getting "ora[i,]n" %cd in Query history, that looks very much like a simple query parser misconfiguration. You will need to contact the server administrator at Glasgow (I don't know who that is) to resolve the issue.

For the record, if you use the following in simple query / ignore case

            ora[i,]n

then that is equivalent to the following in CQP syntax

          "orai?n"%cd

Therefore, if simple query is not working, you can switch to CQP syntax and use the above query string to get accented and unaccented forms with or without the I.

FYI: as you suspected, the %cd means case-insensitive, diacritic-insensitive, but only in CQP syntax, not Simple Query. That is, it is the consequence of using a case-insensitive mode query (diacritic sensitivity defaults to "off" without you doing anything in Simple Query). The presence of %cd alongside the Simple Query "ora[i,]n" in your Query history is what suggests the simple query is not being parsed correctly by the system.
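
As a rough illustration of what "orai?n"%cd is expected to match, here is a small Python sketch. It only emulates the idea of case and diacritic folding with the standard library; it is not how CWB implements %c and %d, and the helper names (fold, matches) and the token list are made up for the example.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Strip combining accents and fold case (a stand-in for %cd, not CWB's code)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# The CQP-syntax regular expression from the message above: optional "i".
PATTERN = re.compile(r"orai?n")


def matches(token: str) -> bool:
    """True if the whole folded token matches the pattern."""
    return PATTERN.fullmatch(fold(token)) is not None


# Hypothetical tokens, chosen only to exercise the pattern.
tokens = ["oran", "orain", "òran", "Òrain", "óran", "orann"]
hits = [t for t in tokens if matches(t)]
print(hits)
```

Under these assumptions, the accented and unaccented forms with or without the "i" are accepted, and a form such as "orann" is rejected.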

Best

Andrew.



From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Chelsey MacPherson
Sent: 19 July 2024 14:22
To: cwb at sslmit.unibo.it
Subject: Re: [CWB] CWB Digest, Vol 207, Issue 7

Thank you everyone for your replies!

Query issue: Corpus na Gàidhlig, CQPweb, simple query (ignore case)

I see now that the issue persists when I select "accent insensitive". Then the query "ora[i,]n" displays as this in the query history: ""ora[i,]n" %cd". I don't understand the "%cd" symbol. Is it ignore case? Either way, the accent insensitive selection affects the query and I'll only get results with variants of "orain". As a workaround I did this: "[òra,òrai,ora,orai,óra,órai]n" <https://dasg.arts.gla.ac.uk/CQPweb/dasg/index.php?ui=search&insertString=%5B%C3%B2ra%2C%C3%B2rai%2Cora%2Corai%2C%C3%B3ra%2C%C3%B3rai%2Camhra%2Camhrai%5Dn&insertType=sq_nocase> and was able to get accented versions of "oran" and "orain".

I don't know enough to verify the preprocessing or indexing question. I see in the corpus metadata that there is no word-level annotation and the STTR is not cached for the tokens, though I don't understand those things or know whether that answers the question.
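
For reference, the workaround query simply spells out each accented stem by hand. A minimal Python sketch (the variable names are made up, and the accent stripping is a standard-library approximation rather than anything CQPweb does) shows that the six listed alternatives reduce to the two base forms once accents are removed:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining accents after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Alternatives listed inside the square brackets of the workaround query.
stems = ["òra", "òrai", "ora", "orai", "óra", "órai"]

# Each alternative is followed by the literal "n".
forms = [stem + "n" for stem in stems]

# With accents stripped, the six spelled-out forms collapse to two.
folded = sorted({strip_accents(form) for form in forms})
print(forms)
print(folded)
```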

Thanks again,
Chelsey
________________________________
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> on behalf of cwb-request at sslmit.unibo.it <cwb-request at sslmit.unibo.it>
Sent: Thursday, July 18, 2024 12:39 PM
To: cwb at sslmit.unibo.it <cwb at sslmit.unibo.it>
Subject: CWB Digest, Vol 207, Issue 7

Today's Topics:

   1. Re: query efficiency issue (graham.ranger)
   2. Re: query efficiency issue (Hardie, Andrew)


----------------------------------------------------------------------

Message: 1
Date: Thu, 18 Jul 2024 17:18:19 +0200
From: "graham.ranger" <graham.ranger at univ-avignon.fr>
To: Open source development of the Corpus WorkBench
        <cwb at sslmit.unibo.it>
Subject: Re: [CWB] query efficiency issue
Message-ID: <20240718151748.5460F200EE at zmtaauth05.partage.renater.fr>
Content-Type: text/plain; charset="utf-8"

Hello all,
Oddly, and for what it's worth, I created an account, ran the same query, and got the intended answers, i.e. oran and orain.
Best,
Graham.

Sent from my Galaxy device

-------- Original message --------
From: Stephanie Evert <stefanML at collocations.de>
Date: 17/07/2024 13:44 (GMT+01:00)
To: CWBdev Mailing List <cwb at sslmit.unibo.it>
Subject: Re: [CWB] query efficiency issue

> I'm having difficulties with a query in Corpus na Gàidhlig. When I search "ora[i,]n" it only retrieves "oran" instead of also retrieving "orain". Does anyone have any advice on this? Is this a bug?

I suspect we'll only be able to help you if you tell us which Web interface you used to run the query. I suppose it is some CQPweb installation?

Your query

ora[i,]n

should work as a simple query (CEQL syntax) and find both words. If it doesn't, there might be something wrong with corpus preprocessing or indexing, or the form simply doesn't exist in the corpus. Do you know it's actually there?

You could also try different variants of the query or search for both forms separately.

[oran,orain]
oran
orain

Best,
Stephanie
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb

------------------------------

Message: 2
Date: Thu, 18 Jul 2024 15:37:23 +0000
From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
To: Open source development of the Corpus WorkBench
        <cwb at sslmit.unibo.it>
Subject: Re: [CWB] query efficiency issue
Message-ID:
        <LO4P265MB34858A504F7884D46F9DB1D0CBAC2 at LO4P265MB3485.GBRP265.PROD.OUTLOOK.COM>

Content-Type: text/plain; charset="utf-8"

For the record, it's this server: https://dasg.arts.gla.ac.uk/CQPweb/

Andrew

From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of graham.ranger
Sent: Thursday, July 18, 2024 4:18 PM
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: [External] Re: [CWB] query efficiency issue

Hello all,
Oddly, and for what it's worth, I created an account, ran the same query, and got the intended answers, i.e. oran and orain.
Best,
Graham.


Sent from my Galaxy device


-------- Original message --------
From: Stephanie Evert <stefanML at collocations.de>
Date: 17/07/2024 13:44 (GMT+01:00)
To: CWBdev Mailing List <cwb at sslmit.unibo.it>
Subject: Re: [CWB] query efficiency issue

> I'm having difficulties with a query in Corpus na Gàidhlig. When I search "ora[i,]n" it only retrieves "oran" instead of also retrieving "orain". Does anyone have any advice on this? Is this a bug?

I suspect we'll only be able to help you if you tell us which Web interface you used to run the query.  I suppose it is some CQPweb installation?

Your query

ora[i,]n

should work as a simple query (CEQL syntax) and find both words. If it doesn't, there might be something wrong with corpus preprocessing or indexing, or the form simply doesn't exist in the corpus. Do you know it's actually there?

You could also try different variants of the query or search for both forms separately.

[oran,orain]
oran
orain

Best,
Stephanie
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb

------------------------------



End of CWB Digest, Vol 207, Issue 7
***********************************
