MCB 111 week 4 Section

p-values and p-hacking

https://xkcd.com/882/


Examples of p-Hacking:

  1. Stop collecting data once $p \lt 0.05$
  2. Analyze many measures, but report only those with $p \lt 0.05$
  3. Collect and analyze many conditions, but only report those with $p \lt 0.05$
  4. Use covariates to get $p \lt 0.05$
  5. Exclude participants to get $p \lt 0.05$
  6. Transform the data to get $p \lt 0.05$
  7. Increase the n in one group to get $p \lt 0.05$
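
The first item in the list (stopping as soon as $p \lt 0.05$) can be checked by simulation. The sketch below is only an illustration under stated assumptions: both groups are drawn from the same normal distribution, the standard deviation is assumed known so that a simple z-test can stand in for a t-test, and the names `experiment` and `false_positive_rate`, along with the sample sizes (3 to 30 per group), are hypothetical choices made for this example.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
SIGMA = 1.0  # assumed known standard deviation (arbitrary units)

def z_pvalue(sum_a, sum_b, n, sigma=SIGMA):
    """Two-sided p-value for two equal-size groups when sigma is known."""
    z = (sum_a / n - sum_b / n) / (sigma * (2 / n) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def experiment(rng, n_start=3, n_max=30, peek=True):
    """Return True if an experiment with NO real effect ends 'significant'.

    peek=True:  test after every added pair of animals, stop at p < ALPHA.
    peek=False: test once, with n_max animals per group.
    """
    sum_a = sum_b = 0.0
    for n in range(1, n_max + 1):
        sum_a += rng.gauss(0.0, SIGMA)  # control group
        sum_b += rng.gauss(0.0, SIGMA)  # "treated" group, same distribution
        if peek and n >= n_start and z_pvalue(sum_a, sum_b, n) < ALPHA:
            return True
    return z_pvalue(sum_a, sum_b, n_max) < ALPHA

def false_positive_rate(n_experiments, rng, peek):
    """Fraction of null experiments that end with p < ALPHA."""
    hits = sum(experiment(rng, peek=peek) for _ in range(n_experiments))
    return hits / n_experiments

rng = random.Random(111)
print("fixed n:          ", false_positive_rate(2000, rng, peek=False))
print("optional stopping:", false_positive_rate(2000, rng, peek=True))
```

With a fixed sample size the false-positive rate should stay near 0.05, while peeking after every new pair of animals should push it noticeably higher, even though no drug effect exists in the simulated data.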

Hypothetical scenario

Your lab wants to develop a drug that reduces the size of tumors in a particular kind of cancer. As a screen, your lab is exhaustively testing compounds to find out whether any of them work. Using an animal model, you measure tumor size in three specimens after a fixed time, both without any drug and with each candidate drug.

Setting the significance threshold to 0.05 means that approximately 5% of the statistical tests we run on data drawn from the same distribution (that is, when there is no real effect) will produce false positives.

That means if we ran 100 tests, we would expect about 5 false positives (5 percent), and if we ran 10,000 tests, we would expect about 500. In other words, the more tests we run, the more false positives we have to deal with.
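
The expected counts above can be reproduced with a small simulation of the screening scenario. This is a sketch, not the section's own analysis. It assumes that every candidate drug is useless (control and treated tumor sizes come from the same normal distribution), that the standard deviation is known so a z-test can replace the t-test, and that the mean tumor size of 10.0 is in arbitrary units. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05
N_PER_GROUP = 3  # three specimens per condition, as in the scenario
SIGMA = 1.0      # assumed known standard deviation of tumor size

def two_sample_z_pvalue(a, b, sigma=SIGMA):
    """Two-sided p-value for a difference in means when sigma is known."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def count_false_positives(n_tests, rng):
    """Screen n_tests useless compounds and count how many reach p < ALPHA."""
    hits = 0
    for _ in range(n_tests):
        # Control and treated animals share one distribution: no true effect.
        control = [rng.gauss(10.0, SIGMA) for _ in range(N_PER_GROUP)]
        treated = [rng.gauss(10.0, SIGMA) for _ in range(N_PER_GROUP)]
        if two_sample_z_pvalue(control, treated) < ALPHA:
            hits += 1
    return hits

rng = random.Random(111)
for n_tests in (100, 10_000):
    hits = count_false_positives(n_tests, rng)
    print(f"{n_tests} tests: {hits} false positives ({hits / n_tests:.1%})")
```

The counts fluctuate from run to run, but the fraction of false positives should stay close to 5% regardless of how many compounds are screened, so the absolute number grows in proportion to the number of tests.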