[CWB] Bug in Confidence Interval filter for keywords using Log Ratio

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon Jul 4 01:57:49 CEST 2016


Yes, good catch. I don’t think it’s necessarily even a copy-paste error – it’s more likely my brain just hiccupped in the process of typing out all the 1s and 2s….

Anyway, fixed in 3.2.21

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Mildenberger Thoralf (mild)
Sent: 03 July 2016 14:23
To: cwb at sslmit.unibo.it
Subject: [CWB] Bug in Confidence Interval filter for keywords using Log Ratio


Dear developers,



I tried to reproduce the confidence intervals for the Log Ratio for the keywords function in CQPweb 3.1.16 myself (in R) and failed. Subsequently, when trying to work out which formula is used from the source code (3.2.11), I found the following line in the file "lib/keywords.inc.php":



$fragment_CIhalf = "($Z_unit * SQRT( ($O12 / ($R1 * IF($O11 > 0, $O11, 0.5))) + ($O22 / ($R2 * IF($O11 > 0, $O11, 0.5))) ))";



shouldn't that be:



$fragment_CIhalf = "($Z_unit * SQRT( ($O12 / ($R1 * IF($O11 > 0, $O11, 0.5))) + ($O22 / ($R2 * IF($O21 > 0, $O21, 0.5))) ))";



I guess the second IF($O11 > 0, $O11, 0.5) is a copy-paste error and should really be IF($O21 > 0, $O21, 0.5) as the proportion of "word" in the second corpus should be used. The source code is version 3.2.11, but the code is consistent with the results displayed by 3.1.16.
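
For illustration, here is a minimal Python sketch of the two variants of the expression (it is not the CQPweb code itself). It assumes the usual contingency-table reading, i.e. O12 = R1 - O11 and O22 = R2 - O21; the function name, the default z_unit of 1.96 and the example frequencies are hypothetical and only serve to show how the two versions differ.

```python
import math


def ci_half(o11, o21, r1, r2, z_unit=1.96, use_o21=True):
    """Half-width expression mirroring $fragment_CIhalf.

    Assumptions (not confirmed by the source): O12 = R1 - O11 and
    O22 = R2 - O21; z_unit=1.96 is only an illustrative default.
    """
    o12 = r1 - o11
    o22 = r2 - o21
    # Mirrors IF(x > 0, x, 0.5): a zero frequency is replaced by 0.5.
    adj11 = o11 if o11 > 0 else 0.5
    adj21 = o21 if o21 > 0 else 0.5
    # use_o21=False reproduces the line as found in 3.2.11,
    # use_o21=True is the proposed correction.
    second = adj21 if use_o21 else adj11
    return z_unit * math.sqrt(o12 / (r1 * adj11) + o22 / (r2 * second))


# Hypothetical frequencies: 100 hits in corpus 1, 10 hits in corpus 2,
# one million tokens in each corpus.
as_found = ci_half(100, 10, 1_000_000, 1_000_000, use_o21=False)
proposed = ci_half(100, 10, 1_000_000, 1_000_000, use_o21=True)
print(as_found, proposed)
```

The two versions agree only when O11 and O21 are equal; whenever the word is rarer in the second corpus, the line as found gives a narrower interval than the proposed one.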



Best,

Thoralf

