[CWB] Bug in Confidence Interval filter for keywords using Log Ratio
Mildenberger Thoralf (mild)
mild at zhaw.ch
Sun Jul 3 15:23:12 CEST 2016
Dear developers,
I tried to reproduce the confidence intervals for the Log Ratio in the keywords function of
CQPweb 3.1.16 myself (in R) and failed. When I then tried to work out the formula from the source code (3.2.11), I found the following line in the file "lib/keywords.inc.php":
$fragment_CIhalf = "($Z_unit * SQRT( ($O12 / ($R1 * IF($O11 > 0, $O11, 0.5))) + ($O22 / ($R2 * IF($O11 > 0, $O11, 0.5))) ))";
Shouldn't that be:
$fragment_CIhalf = "($Z_unit * SQRT( ($O12 / ($R1 * IF($O11 > 0, $O11, 0.5))) + ($O22 / ($R2 * IF($O21 > 0, $O21, 0.5))) ))";
I guess the second IF($O11 > 0, $O11, 0.5) is a copy-paste error and should really be IF($O21 > 0, $O21, 0.5) as the proportion of "word" in the second corpus should be used. The source code is version 3.2.11, but the code is consistent with the results displayed by 3.1.16.
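The difference between the two lines can be checked numerically with a minimal Python sketch. It only mirrors the arithmetic of the SQL fragment: the function name ci_half and the corrected flag are hypothetical, the parameter names follow the PHP variables, and z_unit is passed in as-is because its definition is not shown in this message. The example counts are made up and assume that O11 and O21 are the frequencies of the word in corpus 1 and corpus 2, with O12 = R1 - O11 and O22 = R2 - O21.

```python
import math


def ci_half(o11, o12, o21, o22, r1, r2, z_unit, corrected=True):
    """Half-width term from the SQL fragment, with zero counts replaced by 0.5.

    corrected=False reproduces the line as found in lib/keywords.inc.php
    (O11 used in both terms); corrected=True uses O21 in the second term,
    as proposed above.
    """
    # First term: IF($O11 > 0, $O11, 0.5)
    f1 = o11 if o11 > 0 else 0.5
    # Second term: O21 in the proposed fix, O11 in the line as found
    second = o21 if corrected else o11
    f2 = second if second > 0 else 0.5
    return z_unit * math.sqrt(o12 / (r1 * f1) + o22 / (r2 * f2))


# Made-up counts: 100 hits in 1,000 tokens vs. 50 hits in 2,000 tokens
as_found = ci_half(100, 900, 50, 1950, 1000, 2000, z_unit=1.0, corrected=False)
proposed = ci_half(100, 900, 50, 1950, 1000, 2000, z_unit=1.0, corrected=True)
print(as_found, proposed)
```

The two versions agree only when O11 and O21 happen to be equal; otherwise the line as found computes the second term from the wrong frequency.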
Best,
Thoralf