[CWB] launching searches in specific segments

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu Jun 16 17:24:54 CEST 2016


>>>>>>
Nonetheless, next to the expected results, we also see text in the other language:
The same happens when we check the context:
However, I understood that by restricting the search to only one language, we should not get segments in the other one.
<<<<<

You aren’t “getting” segments in the other part of the corpus. Your query result, i.e. what you “get”, is the hit: the bit in the middle. Naturally, a view of context around what you found will show parts of other segments if they happen to be nearby in the corpus. But the system has only searched within the segments you asked it to search within.

Since, in your data, the French segment is 5 words long and a concordance shows 10 words each way by default, it’s inevitable that the concordance will include words beyond that segment.

Note that even if you used one of the other methods mentioned using CQP syntax to limit a query to segments in a particular language, you would still see other segments in the concordance, in exactly the same way.

If you want to show only a single <seg> within the concordance, that can be done: go to the corpus settings and change the concordance context width to 1 of “X element : seg …”
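
To make the difference between the hit and its context concrete, here is a minimal Python sketch. It is only a toy model, not how CWB or CQPweb works internally: the tokens come from the sample <s> quoted further down, and the names find_hits, context_by_words and context_by_seg are made up for illustration. It also assumes that neighbouring segments never share a language, so a run of same-language tokens stands in for one <seg>.

```python
# Toy model of the sample <s>: each token carries the lang of its <seg>.
FR = ["La", "séance", "est", "ouverte", "à", "10h05", "."]
ES = ["Se", "declara", "abierta", "la", "sesión", "a", "las", "10.05", "horas", "."]
CORPUS = [(w, "fr") for w in FR] + [(w, "es") for w in ES]


def find_hits(word, lang):
    """Positions where the token matches AND sits in a segment of that language."""
    return [i for i, (w, l) in enumerate(CORPUS) if w == word and l == lang]


def context_by_words(pos, width=10):
    """Context counted in words each way; segment boundaries are ignored."""
    start = max(0, pos - width)
    return [w for w, _ in CORPUS[start:pos + width + 1]]


def context_by_seg(pos):
    """Context limited to the run of same-language tokens around the hit
    (a stand-in for "1 seg" under the assumption stated above)."""
    lang = CORPUS[pos][1]
    start = pos
    while start > 0 and CORPUS[start - 1][1] == lang:
        start -= 1
    end = pos
    while end + 1 < len(CORPUS) and CORPUS[end + 1][1] == lang:
        end += 1
    return [w for w, _ in CORPUS[start:end + 1]]


hits = find_hits("séance", "fr")      # the search itself only looks at French
wide = context_by_words(hits[0])      # word-based context spills into Spanish
narrow = context_by_seg(hits[0])      # segment-based context stays in French
```

The restriction applies to find_hits only. The word-based context is just a window of neighbouring tokens, so it picks up Spanish words, whereas the segment-based context stops at the edge of the segment.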

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Giorgina Cerutti Benitez
Sent: 16 June 2016 14:05
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] launching searches in specific segments

Dear Andrew,

Thank you very much for your answer. I have tried using the restricted queries and this worked. Unfortunately, when we launch our queries we also get other results that, ideally, should not be retrieved.

For you to get a better idea of what I mean, this would be the structure of the corpus:

<text id="FR_DI_2000_1" organisation="CERD" country="Francia" type="Documento informativo" year="2000" signature="CERD/C/SR.1373">
<s id="1">
<seg lang="fr">
La
séance
est
ouverte
à
10h05
.
</seg>
<seg lang="es">
Se
declara
abierta
la
sesión
a
las
10.05
horas
.
</seg>
</s>

This would be the restrictions table:

And this is what we get when we select French as the search language (the concordance lines only show text in French, so that is great):


And this is what we get when we select Spanish as a restriction (we only see results in Spanish in the concordance lines):



Nonetheless, next to the expected results, we also see text in the other language:



The same happens when we check the context:


However, I understood that by restricting the search to only one language, we should not get segments in the other one.

Do you have any thoughts on this? We would greatly appreciate any comments and/or suggestions.

Best,

Giorgina



From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hardie, Andrew
Sent: Tuesday, 14 June 2016 13:44
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] launching searches in specific segments

Hi Giorgina,

That query should have worked. One possibility is that you did not declare the XML / S-attributes correctly when indexing, and the XML tags have been inserted into your index as literal tokens instead of S-attribute ranges.

You can test this by querying

[word="<.*"]

and seeing if you get any results. If you do, XML has been inserted into the main p-attribute. Delete the corpus, and start over!
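
As a rough illustration of what that test query looks for, here is a small Python sketch. It assumes that CQP matches the regular expression against the whole token, which re.fullmatch imitates; the two token lists are hypothetical examples, not output from a real index.

```python
import re

# A token "looks like a tag" if the whole token matches <.*
TAG_LIKE = re.compile(r"<.*")


def tag_tokens(tokens):
    """Return the tokens that look like XML tags indexed as words."""
    return [t for t in tokens if TAG_LIKE.fullmatch(t)]


# Hypothetical token streams, for illustration only.
good_index = ["La", "séance", "est", "ouverte", "à", "10h05", "."]
bad_index = ['<seg lang="fr">', "La", "séance", "</seg>"]

leaked_good = tag_tokens(good_index)  # nothing tag-like among the words
leaked_bad = tag_tokens(bad_index)    # the tags were stored as tokens
```

An empty result corresponds to a query with no hits, which is what a correctly indexed corpus should give.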

Also: when you have it working, you might consider using a global constraint in your query instead of incorporating the S-attribute borders into the main body of the query. http://cwb.sourceforge.net/files/CQP_Tutorial/node25.html

i.e.

          a:[] :: a.seg_lang = "fr"
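
To adapt that pattern to a particular word, one option is to build the query string with a small helper. The Python sketch below is only an illustration: restricted_query is a made-up name, and it assumes the word contains no double quotes or regex metacharacters, since CQP treats the value as a regular expression.

```python
def restricted_query(word, lang, label="a"):
    """Build a CQP query for `word`, limited to tokens whose seg_lang is `lang`.

    Assumes `word` needs no escaping (no quotes, no regex metacharacters).
    """
    return f'{label}:[word="{word}"] :: {label}.seg_lang = "{lang}"'


query = restricted_query("séance", "fr")
print(query)
```

The resulting string can then be pasted into the CQP syntax query box.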

Also also: in CQPweb, you can change the datatype of seg_lang to “Classification”, and then use the Restricted Query interface to pick the language, whilst just doing queries as normal.

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Giorgina Cerutti Benitez
Sent: 14 June 2016 11:54
To: Open source development of the Corpus WorkBench
Subject: [CWB] launching searches in specific segments

Hello,

We are trying to index a bilingual corpus in CQPweb, and to look for certain words only in a specific language (e.g. only in French). Taking into account the following segment structure, we have tried with this search in CQP syntax:

<seg_lang="fr">[]*</seg_lang>


Structure of the segment:


<s id="1">
<seg lang="fr">
La
séance
est
ouverte
à
10h05
.
</seg>

Nonetheless, I wonder if it is actually possible to determine in which segment or segments we would like to search using CQP syntax and CQPweb, as I have not found information about this in the Administrator’s Manual.

Thank you in advance for all your comments and help.

Best,

Giorgina