MCB111: Mathematics in Biology (Fall 2021)
week 02:
Probability and inference of parameters
How much do you trust your experiment?
As a new graduate student, you are picking up a project from a postdoc in the lab who is about to leave the lab. This postdoc tells you that her experiments fail at a rate of 1 in 5. Your warmup job is to figure out why, and to see how you can improve that performance.
You start running the experiment with your own hands. After two consecutive successes (ss), your experiment fails twice in a row (ff):
ssff

Do you need to start worrying about not performing the experiment correctly? (Calculate the posterior probability distribution of the probability of a failure f (\(p_f\)) given the data you have, and draw conclusions.)
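As a starting point, here is a minimal sketch of how such a posterior could be evaluated numerically on a grid of \(p_f\) values. It assumes a uniform prior on \(p_f\) and independent trials, neither of which is stated in the problem, and the function name `grid_posterior` and its parameters are made up for illustration.

```python
def grid_posterior(outcomes, n_grid=1001):
    """Posterior density of p_f on an evenly spaced grid over [0, 1].

    Assumptions (not part of the problem statement): uniform prior on p_f,
    independent trials, 's' = success and 'f' = failure.
    """
    seq = [c for c in outcomes if c in "sf"]  # ignore spaces and newlines
    n_f = seq.count("f")
    n_s = seq.count("s")

    grid = [i / (n_grid - 1) for i in range(n_grid)]
    # Likelihood times a flat prior, up to a normalization constant.
    unnorm = [p ** n_f * (1.0 - p) ** n_s for p in grid]

    # Normalize with the trapezoid rule so the density integrates to 1.
    dp = grid[1] - grid[0]
    area = dp * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    density = [u / area for u in unnorm]
    return grid, density


if __name__ == "__main__":
    grid, density = grid_posterior("ssff")
    i_max = max(range(len(grid)), key=density.__getitem__)
    print("most probable p_f on the grid:", grid[i_max])
```

Plotting `density` against `grid` shows how broad the distribution still is after only four trials, which is the relevant information for comparing against the postdoc's quoted rate of 1 in 5.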

Just in case, you go back to the postdoc, who fixes some details in your protocol, and now you are confident you are running the experiment correctly. You continue, and in the next set, you obtain
ssfsfsfssssss
What is your current estimate of \(p_f\)? What are the most probable value and the mean of the posterior distribution?
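If you assume a uniform prior on \(p_f\) (an assumption, not something the problem specifies), the posterior after \(n_f\) failures and \(n_s\) successes is a Beta distribution with parameters \(n_f + 1\) and \(n_s + 1\), whose mode and mean have closed forms. The sketch below computes both; the helper name `beta_mode_and_mean` is hypothetical, and whether to include the first four outcomes in the count is left to you.

```python
from fractions import Fraction


def beta_mode_and_mean(outcomes):
    """Mode and mean of the posterior of p_f, as exact fractions.

    Assumes a uniform prior, so the posterior is Beta(n_f + 1, n_s + 1):
      mode = n_f / (n_f + n_s)
      mean = (n_f + 1) / (n_f + n_s + 2)
    """
    n_f = outcomes.count("f")
    n_s = outcomes.count("s")
    n = n_f + n_s
    if n == 0:
        raise ValueError("need at least one outcome")
    mode = Fraction(n_f, n)
    mean = Fraction(n_f + 1, n + 2)
    return mode, mean


if __name__ == "__main__":
    mode, mean = beta_mode_and_mean("ssff")
    print("mode:", float(mode), "mean:", float(mean))
```

The two values differ whenever the posterior is asymmetric, and the gap shrinks as the number of trials grows.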

You want to be very certain about your conclusions before you go back to talk to the postdoc about the real failure rate of this experiment. So you decide to continue doing experiments until you have a confidence interval of \(0.1\). Here is a long series of consecutive outcomes for you to use (file homework.dat):
ssffsssssssfsssfssfssffsssfssssssssfffssssffsssssssfsssssssffsfsssssssssssssfsffssssssssfsfsssfffsffsf ffssfsssfsfssfssssffssfsssssffsfsfsfsfssssffffssssssssssfffsfsssssffsfffsfsssfffsssssssffssssfffss
What is the minimum number of tests you need to bring the error of your estimate down to \(\pm 0.10\), and what failure rate are you going to report back to the postdoc?
[To answer this question, you will want to calculate the best estimate and confidence intervals for the posterior distribution of \(p_f\), similarly to what we did in class for the exponential problem. Don't forget to take the log of the distribution before taking derivatives! (This is a lot easier than differentiating the distribution itself.)]
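One way the log-derivative calculation could go is sketched here, assuming a uniform prior on \(p_f\) and a Gaussian approximation around the peak. The symbols \(n_f\), \(n_s\), \(n = n_f + n_s\), \(D\) (the observed data) and \(\sigma\) are introduced for this sketch only, and the result holds for \(0 < n_f < n\).

```latex
\[
\log P(p_f \mid D) = n_f \log p_f + n_s \log(1 - p_f) + \text{const}
\]
\[
\frac{d}{dp_f} \log P(p_f \mid D) = \frac{n_f}{p_f} - \frac{n_s}{1 - p_f} = 0
\quad\Longrightarrow\quad
p_f^{*} = \frac{n_f}{n_f + n_s}
\]
\[
\left.\frac{d^2}{dp_f^2} \log P(p_f \mid D)\right|_{p_f^{*}}
= -\frac{n_f}{(p_f^{*})^2} - \frac{n_s}{(1 - p_f^{*})^2}
= -\frac{n}{p_f^{*}\,(1 - p_f^{*})}
\quad\Longrightarrow\quad
\sigma \approx \sqrt{\frac{p_f^{*}\,(1 - p_f^{*})}{n}}
\]
```

Under this approximation the error bar on \(p_f^{*}\) shrinks like \(1/\sqrt{n}\), which is what lets you ask how many tests are needed to reach a given width.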

How do those results change if you use all the data?
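To compare a partial series with the full one, you could track the estimate and its error bar as more outcomes are included. The sketch below is one possible approach, not the intended solution: it assumes a uniform prior, uses the Gaussian approximation \(\sigma \approx \sqrt{p_f^{*}(1 - p_f^{*})/n}\) around the peak, and reads "\(\pm 0.10\) error" as \(\sigma \le 0.10\), which is only one interpretation. The function names are hypothetical.

```python
import math


def laplace_estimate(outcomes):
    """Return (p_hat, sigma) for p_f, or None when the approximation breaks down.

    Assumptions: uniform prior, independent trials, Gaussian approximation
    around the posterior peak. With zero failures or zero successes the peak
    sits on the boundary and this error bar is not meaningful.
    """
    seq = [c for c in outcomes if c in "sf"]  # ignore spaces and newlines
    n = len(seq)
    n_f = seq.count("f")
    if n == 0 or n_f == 0 or n_f == n:
        return None
    p_hat = n_f / n
    sigma = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, sigma


def first_n_within(outcomes, target=0.10):
    """Smallest prefix length n whose sigma is <= target, with its estimate."""
    seq = "".join(c for c in outcomes if c in "sf")
    for n in range(1, len(seq) + 1):
        est = laplace_estimate(seq[:n])
        if est is not None and est[1] <= target:
            return n, est[0], est[1]
    return None


if __name__ == "__main__":
    # Replace this short string with the contents of homework.dat.
    data = "ssffssfsfssfsssfssfsssssfsfs"
    print("first n within target:", first_n_within(data))
    print("estimate from all data:", laplace_estimate(data))
```

Note that \(\sigma\) is not guaranteed to decrease monotonically with \(n\), because the estimate itself moves as outcomes arrive. The first prefix that meets the target may be followed by prefixes that do not, so it is worth checking that the error bar stays below the target from that point on.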
Note: I generated all the data with a particular probability of failure \(p_f\) that will be revealed after the homework has been turned in.