[CWB] Bug in Confidence Interval filter for keywords using Log Ratio

Mildenberger Thoralf (mild) mild at zhaw.ch
Sun Jul 3 15:23:12 CEST 2016


Dear developers,


I tried to reproduce the confidence intervals for the Log Ratio in the keywords function of CQPweb 3.1.16 myself (in R) and failed. When I then tried to work out the formula from the source code (3.2.11), I found the following line in the file "lib/keywords.inc.php":


$fragment_CIhalf = "($Z_unit * SQRT( ($O12 / ($R1 * IF($O11 > 0, $O11, 0.5))) + ($O22 / ($R2 * IF($O11 > 0, $O11, 0.5))) ))";


Shouldn't that be:


$fragment_CIhalf = "($Z_unit * SQRT( ($O12 / ($R1 * IF($O11 > 0, $O11, 0.5))) + ($O22 / ($R2 * IF($O21 > 0, $O21, 0.5))) ))";


I guess the second IF($O11 > 0, $O11, 0.5) is a copy-paste error and should really be IF($O21 > 0, $O21, 0.5), since the second term should use the proportion of "word" in the second corpus, not the first. The source code I looked at is version 3.2.11, but the formula as written there is consistent with the results displayed by 3.1.16.
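
To make the difference concrete, here is a minimal Python sketch of both variants of the half-width expression. It is an illustration under stated assumptions, not the CQPweb code: the function name `ci_half_width` and the `fixed` switch are hypothetical, O12 and O22 are assumed to be R1 - O11 and R2 - O21 (the usual contingency-table notation, not stated in this message), and `z = 1.96` is only a stand-in for $Z_unit, whose actual definition (including any scaling it may carry) is not shown in the quoted line.

```python
import math


def ci_half_width(o11, o21, r1, r2, z=1.96, fixed=True):
    """Half-width of the confidence interval, following the quoted expression.

    o11, o21: frequency of the word in corpus 1 and corpus 2
    r1, r2:   sizes of corpus 1 and corpus 2
    z:        stand-in for $Z_unit (assumption, see lead-in)
    fixed:    True  -> second denominator uses o21 (proposed correction)
              False -> second denominator uses o11 (line as quoted)
    """
    # Assumed definitions of the remaining cells of the contingency table.
    o12 = r1 - o11
    o22 = r2 - o21

    # Mirrors IF(x > 0, x, 0.5): replace a zero count by 0.5.
    def guard(x):
        return x if x > 0 else 0.5

    first = o12 / (r1 * guard(o11))
    second = o22 / (r2 * guard(o21 if fixed else o11))
    return z * math.sqrt(first + second)


if __name__ == "__main__":
    # Hypothetical counts: 100 hits in corpus 1, 10 hits in corpus 2,
    # one million tokens in each corpus.
    print("as quoted:", ci_half_width(100, 10, 10**6, 10**6, fixed=False))
    print("corrected:", ci_half_width(100, 10, 10**6, 10**6, fixed=True))
```

When both counts are non-zero, the corrected variant is algebraically the same as z * sqrt(1/O11 - 1/R1 + 1/O21 - 1/R2), the familiar standard error of a log relative risk. The two variants agree only when O11 equals O21. When the word is rarer in the second corpus, the line as quoted gives a narrower interval than the corrected one.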


Best,

Thoralf
