[CWB] launching searches in specific segments

Giorgina Cerutti Benitez Giorgina.Cerutti at unige.ch
Thu Jun 16 15:04:58 CEST 2016


Dear Andrew,

Thank you very much for your answer. I have tried the restricted queries and they worked; unfortunately, when we run our queries we also get other results that, ideally, should not be retrieved.

To give you a better idea of what I mean, this is the structure of the corpus:

<text id="FR_DI_2000_1" organisation="CERD" country="Francia" type="Documento informativo" year="2000" signature="CERD/C/SR.1373">
<s id="1">
<seg lang="fr">
La
séance
est
ouverte
à
10h05
.
</seg>
<seg lang="es">
Se
declara
abierta
la
sesión
a
las
10.05
horas
.
</seg>
</s>
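To make concrete what this structure is meant to encode, here is a minimal Python sketch. It is a simplified model, not cwb-encode itself, and it assumes one token per line with each `<seg lang="...">` tag on a line of its own; the function name `read_segments` is hypothetical. It turns the file into a token stream plus `seg` regions (first position, last position, lang value), which is roughly the information a correctly declared `seg_lang` S-attribute holds. Note that the tag lines occupy no corpus positions.

```python
import re
from typing import List, Optional, Tuple

# Opening tag such as <seg lang="fr">; captures the lang value.
SEG_OPEN = re.compile(r'^<seg\s+lang="([^"]*)"\s*>$')

def read_segments(lines: List[str]) -> Tuple[List[str], List[Tuple[int, int, str]]]:
    """Hypothetical simplified reader (not cwb-encode).

    Returns (tokens, regions), where each region is
    (first_token_position, last_token_position, lang).
    Tag lines do not consume corpus positions.
    """
    tokens: List[str] = []
    regions: List[Tuple[int, int, str]] = []
    open_lang: Optional[str] = None
    open_start = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = SEG_OPEN.match(line)
        if m:
            open_lang, open_start = m.group(1), len(tokens)
        elif line == "</seg>":
            # Record the region only if it was opened and is non-empty.
            if open_lang is not None and len(tokens) > open_start:
                regions.append((open_start, len(tokens) - 1, open_lang))
            open_lang = None
        elif line.startswith("<"):
            # Other structural tags (<text>, <s>, ...) are ignored in this model.
            continue
        else:
            tokens.append(line)
    return tokens, regions

SAMPLE = """<s id="1">
<seg lang="fr">
La
séance
est
ouverte
à
10h05
.
</seg>
<seg lang="es">
Se
declara
abierta
la
sesión
a
las
10.05
horas
.
</seg>
</s>"""

tokens, regions = read_segments(SAMPLE.splitlines())
print(regions)  # [(0, 6, 'fr'), (7, 16, 'es')]
```

In this model the French segment covers corpus positions 0 to 6 and the Spanish one 7 to 16, so the two languages sit next to each other in a single token stream and are told apart only by the region annotation.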

This is the restrictions table: [screenshot: image001.jpg]

And this is what we get when we select French as the search language (the concordance lines only show text in French, which is what we want):
[screenshot: image003.jpg]

And this is what we get when we select Spanish as a restriction (we only see results in Spanish in the concordance lines):
[screenshot: image005.jpg]


However, alongside the expected results, we also see text in the other language:
[screenshot: image012.png]


The same happens when we check the context:
[screenshot: image007.jpg]

My understanding was that, by restricting the search to one language, we should not get segments in the other.

Do you have any thoughts on this? We would greatly appreciate any comments and/or suggestions.

Best,

Giorgina



From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hardie, Andrew
Sent: Tuesday, 14 June 2016 13:44
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] launching searches in specific segments

Hi Giorgina,

That query should have worked. One possibility is that you did not declare the XML/S-attributes correctly when indexing, and the XML tags have been inserted into your index as literal tokens instead of S-attribute ranges.

You can test this by querying

[word="<.*"]

and seeing if you get any results. If you do, XML has been inserted into the main p-attribute. Delete the corpus, and start over!
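The `[word="<.*"]` query detects the problem after indexing. A complementary check can be run on the input file before (re)indexing. The minimal Python sketch below uses a hypothetical helper `undeclared_tag_lines`, and the declared set `{"text", "s", "seg"}` is an assumption matching the corpus structure earlier in this thread. It lists tag lines whose element name is not among the S-attributes you declare, since, following the explanation above, those are the lines that would end up as literal tokens.

```python
import re
from typing import Iterable, List, Set, Tuple

# A tag alone on its line: <seg lang="fr">, </seg>, <s id="1">, ...
TAG_LINE = re.compile(r'^</?([A-Za-z_][\w.-]*)[^>]*>$')

def undeclared_tag_lines(lines: Iterable[str], declared: Set[str]) -> List[Tuple[int, str]]:
    """Hypothetical pre-indexing check.

    Returns (line_number, line) for every tag line whose element name is
    not in `declared`; such tags would presumably be indexed as literal
    tokens rather than as S-attribute ranges.
    """
    hits: List[Tuple[int, str]] = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        m = TAG_LINE.match(line)
        if m and m.group(1) not in declared:
            hits.append((number, line))
    return hits

vrt = ['<text id="FR_DI_2000_1">', '<s id="1">', '<seg lang="fr">',
       'La', 'séance', '</seg>', '</s>', '</text>']

print(undeclared_tag_lines(vrt, {"text", "s", "seg"}))  # []
print(undeclared_tag_lines(vrt, {"text", "s"}))
# [(3, '<seg lang="fr">'), (6, '</seg>')]
```

An empty result for your real declaration set is a good sign. Any hits point to elements that still need a matching S-attribute declaration before the corpus is rebuilt.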

Also: when you have it working, you might consider using a global constraint in your query instead of incorporating the S-attribute borders into the main body of the query (see http://cwb.sourceforge.net/files/CQP_Tutorial/node25.html).

i.e.

          a:[] :: a.seg_lang = "fr"
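The constraint `a:[] :: a.seg_lang = "fr"` keeps a match only if the token labelled `a` lies in a `seg` region whose `lang` value is `fr`. The following minimal Python sketch models that filtering step under simplifying assumptions: regions are sorted, non-overlapping (start, end, value) triples over corpus positions, and the helpers `enclosing_value` and `apply_global_constraint` are hypothetical. It is not how CQP implements constraints internally.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Region = Tuple[int, int, str]  # (start, end, value); sorted, non-overlapping

def enclosing_value(regions: List[Region], pos: int) -> Optional[str]:
    """Annotation of the region containing corpus position pos, or None."""
    starts = [start for start, _, _ in regions]
    i = bisect_right(starts, pos) - 1  # last region starting at or before pos
    if i >= 0 and regions[i][0] <= pos <= regions[i][1]:
        return regions[i][2]
    return None

def apply_global_constraint(matches: List[int], regions: List[Region], lang: str) -> List[int]:
    """Rough analogue of `a:[] :: a.seg_lang = "fr"`: keep only the match
    positions whose enclosing seg region carries the requested lang value."""
    return [pos for pos in matches if enclosing_value(regions, pos) == lang]

# Regions for the sample sentence: French tokens 0-6, Spanish tokens 7-16.
regions = [(0, 6, "fr"), (7, 16, "es")]
matches = list(range(17))  # a:[] on its own matches every token
print(apply_global_constraint(matches, regions, "fr"))  # [0, 1, 2, 3, 4, 5, 6]
```

Because the filter looks only at the label's position, the query body itself can stay an ordinary token pattern, with the language choice expressed separately.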

Also also: in CQPweb, you can change the datatype of seg_lang to “Classification”, and then use the Restricted Query interface to pick the language, whilst running your queries as normal.

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Giorgina Cerutti Benitez
Sent: 14 June 2016 11:54
To: Open source development of the Corpus WorkBench
Subject: [CWB] launching searches in specific segments

Hello,

We are trying to index a bilingual corpus in CQPweb and to look for certain words in one specific language only (e.g. only in French). Given the segment structure below, we have tried the following search in CQP syntax:

<seg_lang="fr">[]*</seg_lang>


Structure of the segment:


<s id="1">
<seg lang="fr">
La
séance
est
ouverte
à
10h05
.
</seg>

However, I wonder whether it is actually possible, using CQP syntax in CQPweb, to specify which segment or segments we want to search, as I have not found any information about this in the Administrator’s Manual.

Thank you in advance for all your comments and help.

Best,

Giorgina