Category Archives: Statistics

Test Your Intuition (15): Which Experiment is More Convincing

Consider the following two scenarios:

(1) An experiment tests the effect of a new medicine on people who have a certain illness. The conclusion of the experiment is that for 5% of the people tested the medication led to improvement, while for 95% it had no effect. (The experiment followed all the rules: it had a control group, it was double blind, etc. …)

A statistical analysis concluded that the results are statistically significant, where the required statistical significance level is 1%. This roughly means that the probability p_1 that such an effect happened by chance (under the “null hypothesis”) is less than or equal to 0.01. (This probability is called the p-value. Suppose that p_1 = 0.008.)

(2) An experiment tests the effect of a new medicine on people who have a certain illness. The conclusion of the experiment is that for 30% of the people tested the medication led to improvement, while for 70% it had no effect. (The experiment followed all the rules: it had a control group, it was double blind, etc. …)

A statistical analysis concluded that the results are statistically significant, where the required statistical significance level is 1%. (Again, this roughly means that the probability p_2 that such an effect happened by chance (under the “null hypothesis”) is less than or equal to 0.01. And again suppose that p_2 = 0.008.)

Test your intuition: In which of these two scenarios is it more likely that the effect of the medication is real?

You can assume that the experiments are identical in all other respects that may affect your answer, e.g., the theoretical explanation for the effect of the medicine. Note that our assumption p_1 = p_2 is likely to imply that the sample size for the first experiment is larger.
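To get a feel for why equal p-values force a much larger sample in the first scenario, here is a minimal sketch computing one-sided binomial p-values from first principles. (The baseline improvement rate of 1% and the trial sizes are purely illustrative assumptions, not taken from the post.)

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value for
    observing k or more improvements when each patient improves
    independently with probability p under the null hypothesis."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Illustrative assumption: a 1% spontaneous improvement rate under the null.
p0 = 0.01

# A 5%-style effect in a small trial (1 of 20 improves): p-value ~0.18,
# not significant at any usual level.
print(binom_sf(1, 20, p0))

# The same 5% rate in a ten-times-larger trial (10 of 200 improve):
# the p-value drops far below 0.01.
print(binom_sf(10, 200, p0))

# A 30%-style effect (6 of 20 improve) is significant even in the small trial.
print(binom_sf(6, 20, p0))
```

With these (assumed) numbers, the small effect only reaches significance once the sample is much larger, while the large effect is significant already in the small trial — consistent with the remark above that p_1 = p_2 suggests a larger sample in the first experiment.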

Answer to Test Your Intuition (9)

Two experimental results with p-values of 10/100 and 15/100 are not equivalent to one experiment with a p-value of 3/200.

(Here is a link to the original post.)

One way to see it is to think about 100 experiments. The p-values under the null hypothesis will be 100 numbers (more or less) uniformly distributed in [0,1], so their product will typically be extremely tiny even when no effect is present.

What we have to compute is the probability that the product of two random numbers uniformly distributed in [0,1] is less than or equal to 0.015. This probability is much larger than 0.015.

Here is a useful formula (I thank Brendan McKay for reminding me): if we have n independent values uniform in [0,1], then the probability that their product is less than X is

X \sum_{i=0}^{n-1} (-1)^i (\log X)^i / i!.

In this case 0.015 * (1 - log(0.015)) ≈ 0.078.
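The formula above is easy to check numerically. Here is a short sketch comparing the closed-form expression with a Monte Carlo estimate (the simulation size and seed are arbitrary choices):

```python
import random
from math import log, factorial

def prob_product_below(x, n):
    """P(U_1 * ... * U_n <= x) for n independent uniforms on [0,1]:
    x * sum_{i=0}^{n-1} (-1)^i (log x)^i / i!."""
    return x * sum((-1)**i * log(x)**i / factorial(i) for i in range(n))

exact = prob_product_below(0.015, 2)  # ~0.078, as computed in the post

# Monte Carlo check: fraction of 10^6 pairs of uniforms whose product <= 0.015.
random.seed(0)
trials = 10**6
hits = sum(random.random() * random.random() <= 0.015 for _ in range(trials))

print(exact, hits / trials)
```

For n = 1 the formula reduces to P(U <= x) = x, as it should, and the simulated frequency agrees with the exact value to within Monte Carlo error.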

So the outcomes of the two experiments do not show significant support for the theory.

The theory of hypothesis testing in statistics is quite fascinating, and of course, it became a principal tool in science and led to major scientific revolutions. One interesting aspect is the similarity between the notion of statistical proof, which is important all over science, and the notion of interactive proof in computer science. Unlike mathematical proofs, statistical proofs are based on following certain protocols and do not stand alone: if you cannot guarantee that the protocol was followed, the proof has little value.

Test Your Intuition (9)

(Picture in the original post: the “Mars effect”.)

A) You want to test the theory that people who were born close to noon on July 7 are unusually tall. You randomly choose 100 Norwegian men over 25 years old and discover that the one person born closest to noon on July 7 is the 15th tallest among them. Then you choose 100 Nigerian women and discover that the woman born closest to noon on July 7 is the 10th tallest. You figure out that without the putative effect being real (in other words, under the null hypothesis) the chance of such results occurring at random is 1/10 times 3/20, which is 1.5%, and conclude that this lends significant support to your theory. Are you correct?

B) In a certain scientific area, the level of significance required for a statistical test is 5%. Would it serve the quality of scientific papers in this area to reduce the required significance level to, say, 0.5%, in order to exclude papers that report experiments which succeeded by sheer chance?