[CWB] Concordance returns multiple texts

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri Feb 1 22:01:47 CET 2013

Text elements are just an attribute; they aren't treated specially in any way in the underlying index. So, if two texts were next to one another in the input data, the end of the first and the beginning of the next are seen as adjacent by CQP, and if you hit a word at the end of one text, its concordance will include the beginning of the next.
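
To illustrate the point, here is a toy Python model of a flat token stream with text regions stored as position ranges. It is only a sketch and not CWB's actual code or API: the names `tokens`, `text_regions` and `context` are made up for the example, and the clipping step is just one way an application on top of the index could keep context within a single text.

```python
# Toy model (hypothetical, not CWB internals): the corpus is one flat
# token stream, and <text> elements are only ranges over that stream.
tokens = ["Hello", "Bye"]
text_regions = [(0, 0), (1, 1)]  # inclusive positions of text 1 and text 2


def context(pos, width, regions=None):
    """Return the tokens within `width` positions of `pos`.

    If `regions` is given, the window is clipped to the region
    that contains `pos`; otherwise region boundaries are ignored.
    """
    lo = max(0, pos - width)
    hi = min(len(tokens) - 1, pos + width)
    if regions is not None:
        for start, end in regions:
            if start <= pos <= end:
                lo, hi = max(lo, start), min(hi, end)
                break
    return tokens[lo:hi + 1]


unclipped = context(0, 5)              # boundaries ignored
clipped = context(0, 5, text_regions)  # window limited to the hit's text
```

With the two-text example from the question, `unclipped` contains both "Hello" and "Bye", while `clipped` contains only "Hello".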

And re the final question: the answer is yes. Collocation does not take the XML into account at all at present. (Later it might.)
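
The same effect on collocation can be sketched in a few lines of Python. Again, this is a hypothetical illustration rather than the real implementation: `collocates` and the `respect_texts` switch are invented here to contrast a window that ignores text boundaries (the current behaviour described above) with one that honours them.

```python
from collections import Counter

# Toy data (hypothetical): each token is tagged with the id of its text.
tokens = ["Hello", "Bye"]
text_ids = ["1", "2"]


def collocates(node, window, respect_texts=False):
    """Count the tokens found within `window` positions of each `node` hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i:
                continue  # skip the node itself
            if respect_texts and text_ids[j] != text_ids[i]:
                continue  # skip tokens that belong to a different text
            counts[tokens[j]] += 1
    return counts


across = collocates("Hello", 1)                      # boundaries ignored
within = collocates("Hello", 1, respect_texts=True)  # boundaries honoured
```

Here `across` counts "Bye" as a collocate of "Hello" even though it belongs to a different text, and `within` is empty.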



From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Kurt Sultana
Sent: 01 February 2013 20:53
To: Open source development of the Corpus WorkBench
Subject: [CWB] Concordance returns multiple texts

Hi all,

I'm having this scenario:

I have two texts (this is an example, just for illustration):
<text id="1"><s>Hello</s></text>
<text id="2"><s>Bye</s></text>

When I search for concordances of "Hello", I'm also getting "Bye" adjacent. Does this make sense? Aren't text elements supposed to be separate unrelated entities? In my case, I'm representing separate news articles within <text> elements.

Does this also affect collocation?

Thanks in advance,
