[CWB] Concordance returns multiple texts

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri Feb 1 22:13:33 CET 2013

Unlikely to unless you have lots of really, really short texts!

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Kurt Sultana
Sent: 01 February 2013 21:12
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Concordance returns multiple texts

Thanks for your quick reply, Andrew.

I presume this "noise" from other texts won't have much effect on collocation results, right?

Thanks once again,

On Fri, Feb 1, 2013 at 10:01 PM, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:
Text elements are just an attribute; they aren't treated specially in any way in the underlying index. So, if two texts were next to one another in the input data, the end of the first and the beginning of the next are seen as adjacent by CQP, and if you hit a word at the end of one text, its concordance will include the beginning of the next.
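
To make the point above concrete, here is a minimal Python sketch. It is not CWB or CQP code; the names `tokens`, `text_regions`, `concordance` and `concordance_within` are made up for illustration. It assumes the corpus is one flat token stream and that each <text> is merely a (start, end) range over that stream.

```python
from typing import List, Tuple

# Hypothetical toy corpus: a single flat token stream.
tokens: List[str] = ["Hello", "Bye"]
# Hypothetical <text> regions as inclusive (start, end) token positions.
text_regions: List[Tuple[int, int]] = [(0, 0), (1, 1)]


def concordance(tokens: List[str], hit: int, width: int) -> List[str]:
    """Context window that ignores region boundaries entirely."""
    start = max(0, hit - width)
    end = min(len(tokens) - 1, hit + width)
    return tokens[start:end + 1]


def concordance_within(tokens: List[str], regions: List[Tuple[int, int]],
                       hit: int, width: int) -> List[str]:
    """Same window, but clipped to the region that contains the hit."""
    for start, end in regions:
        if start <= hit <= end:
            lo = max(start, hit - width)
            hi = min(end, hit + width)
            return tokens[lo:hi + 1]
    # The hit lies outside every region: return the hit token alone.
    return [tokens[hit]]


hit = tokens.index("Hello")
print(concordance(tokens, hit, 1))                       # ['Hello', 'Bye']
print(concordance_within(tokens, text_regions, hit, 1))  # ['Hello']
```

The first call mirrors the behaviour described above (the window runs into the next text); the second shows what clipping the window to the enclosing region would look like.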

And re the final question: the answer is yes. Collocation does not take the XML into account at all at present. (Later it might.)
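
As a rough illustration of the collocation point, the Python sketch below counts collocates in a fixed window, once ignoring text boundaries and once respecting them. It is an assumption-laden toy, not the collocation code in CWB or any front end; `collocates`, the sample tokens and the region ranges are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def collocates(tokens: List[str], regions: List[Tuple[int, int]],
               node: str, width: int, respect_regions: bool) -> Counter:
    """Count tokens within `width` positions of each occurrence of `node`."""
    # Map every token position to the inclusive region that contains it.
    region_of = {}
    for start, end in regions:
        for pos in range(start, end + 1):
            region_of[pos] = (start, end)

    counts: Counter = Counter()
    for pos, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = pos - width, pos + width
        if respect_regions:
            # Clip the window to the enclosing region, if there is one.
            start, end = region_of.get(pos, (0, len(tokens) - 1))
            lo, hi = max(lo, start), min(hi, end)
        lo, hi = max(lo, 0), min(hi, len(tokens) - 1)
        for i in range(lo, hi + 1):
            if i != pos:
                counts[tokens[i]] += 1
    return counts


# Two hypothetical three-token texts stored back to back.
tokens = ["the", "match", "ended", "prices", "rose", "again"]
regions = [(0, 2), (3, 5)]

print(collocates(tokens, regions, "ended", 2, respect_regions=False))
print(collocates(tokens, regions, "ended", 2, respect_regions=True))
```

With boundaries ignored, "prices" and "rose" from the second text are counted as collocates of "ended"; with boundaries respected, only "the" and "match" are. Only node words within `width` tokens of a text edge are affected, so the stray counts make up a noticeable share only when texts are very short.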



From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Kurt Sultana
Sent: 01 February 2013 20:53
To: Open source development of the Corpus WorkBench
Subject: [CWB] Concordance returns multiple texts

Hi all,

I'm having this scenario:

I have two texts (this is an example, just for illustration):
<text id="1"><s>Hello</s></text>
<text id="2"><s>Bye</s></text>

When I search for concordances of "Hello", I also get "Bye" in the adjacent context. Does this make sense? Aren't text elements supposed to be separate, unrelated entities? In my case, I'm representing separate news articles within <text> elements.

Does this also affect collocation?

Thanks in advance,

