# Sampling from a probability distribution

## Preliminaries

Present all your reasoning, derivations, plots, and code as part of the homework. Imagine that you are writing a short paper that anyone should be able to understand. If you are working in Python, a Jupyter notebook is not required but is highly recommended.

## Q1

This is mostly a computational question. It is meant to get you familiar with some coding and with how to sample from any probability distribution.

• a Take a sample (i.e., a collection of points drawn according to a distribution) from each of the following distributions: uniform, Gaussian, and binomial. Draw a plot for each sample/distribution. Repeat for at least two different values of the parameters of each distribution.

• b Take several samples of different sizes, and make visual observations of how well each sample resembles the actual distribution as a function of the number of points in the sample.
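As a starting point for parts a and b, here is a minimal sketch that uses only Python's standard `random` module. The function name `draw_samples`, its default parameter values, and the seed are illustrative assumptions, not part of the assignment. The binomial draw is built by counting successes in repeated Bernoulli trials. Plotting is left out; the returned lists can be passed to whichever plotting library you prefer.

```python
import random


def draw_samples(n, seed=0, low=0.0, high=1.0, mu=0.0, sigma=1.0,
                 trials=10, prob=0.5):
    """Draw n points from a uniform, a Gaussian, and a binomial distribution.

    All parameter defaults are arbitrary illustrative choices.
    """
    rng = random.Random(seed)  # seeded generator, so results are reproducible
    uniform = [rng.uniform(low, high) for _ in range(n)]
    gaussian = [rng.gauss(mu, sigma) for _ in range(n)]
    # Binomial(trials, prob): number of successes in `trials` Bernoulli draws.
    binomial = [sum(rng.random() < prob for _ in range(trials))
                for _ in range(n)]
    return {"uniform": uniform, "gaussian": gaussian, "binomial": binomial}


def sample_mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Part b: watch a summary statistic settle down as the sample size grows.
for n in (10, 100, 10_000):
    samples = draw_samples(n)
    print(n, {name: round(sample_mean(v), 3) for name, v in samples.items()})
```

With the defaults above, the sample means should approach 0.5, 0, and 5 (that is, `trials * prob`) as `n` grows. The histograms of the larger samples should likewise look closer to the underlying distributions.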

## Q2

Bacteria often form aggregates of cells. Aggregates seem to have more antibiotic tolerance than non-aggregated bacterial cells. It is therefore important to study conditions that could reduce aggregation as a potential tool to fight bacterial infections.

For a collection of bacterial cells (P. aeruginosa, to be more specific), aggregation is a stochastic process that happens with probability $$p$$. Someone in your lab has measured $$p$$ under one set of default host conditions. The data from those measurements are here: data_default.

You are new to the lab, and your job is to test a different set of conditions. These experiments are involved. Each time, you need to set up a large number of colonies, wait 1 day, and determine the fraction of those that have aggregated. Then, you need to repeat the experiment.

• Build the histogram for data_default.

• You are impatient, and after $$N=10$$ rounds, you are curious as to how well the bacteria aggregate under your conditions. Here is your data1.

You want to compare your data to the distribution under the default conditions, but in order to have a fair comparison you want to compare to a sample of size $$N$$ from the default distribution.

Can you draw any conclusions by comparing your experimentally determined sample of $$N=10$$ and the sample of the same size from the default distribution?

• After this result, you decide to go on with the experiment, and you continue to $$N=100$$. Here is your data2. What can you tell now if you compare to a sample of the same size from the default distribution?
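One way to obtain a sample of size $$N$$ from the default distribution is the inverse method applied to the histogram of data_default: build the cumulative distribution from the bin counts, draw a uniform number, and find the bin where the cumulative distribution first exceeds it. The sketch below assumes the measurements are fractions in $$[0,1]$$. The names `histogram` and `inverse_sample`, the bin count, and the list `data_default` (synthetic stand-in values, not the real file) are all hypothetical.

```python
import bisect
import random
from itertools import accumulate


def histogram(data, n_bins=20, lo=0.0, hi=1.0):
    """Count how many values fall in each of n_bins equal bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in data:
        # Values equal to `hi` are placed in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts, width


def inverse_sample(counts, width, n, rng, lo=0.0):
    """Draw n values from the histogram using the inverse-CDF method."""
    total = sum(counts)
    cdf = [c / total for c in accumulate(counts)]  # CDF at right bin edges
    sample = []
    for _ in range(n):
        u = rng.random()
        # First bin whose cumulative probability exceeds u.
        idx = min(bisect.bisect_right(cdf, u), len(counts) - 1)
        # Place the point uniformly inside the selected bin.
        sample.append(lo + (idx + rng.random()) * width)
    return sample


rng = random.Random(0)
# Placeholder stand-in for the real measurements (NOT the course data).
data_default = [rng.betavariate(5, 3) for _ in range(500)]
counts, width = histogram(data_default)
for n in (10, 100):
    sample = inverse_sample(counts, width, n, rng)
    print(n, round(sum(sample) / n, 3))
```

Repeating the draw several times for $$N=10$$ gives a feel for how much a sample of that size fluctuates before you compare it with data1.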

## Q3 (extra credit)

Use rejection sampling instead of the inverse method to approach Q2.

In rejection sampling, the rejecting distribution $$g(x)$$ is arbitrary. Here, since the random variable is a probability $$p$$, which ranges over $$[0,1]$$, it makes sense to use the uniform distribution, $$g(x) = 1$$.

To apply rejection sampling, you then only need to set the constant $$M$$ such that $$p(p) \leq M$$ for all $$p\in [0,1]$$, where $$p(p)$$ denotes the probability density of $$p$$.

Note that in this case $$p(p)$$ sometimes takes values larger than 1, since it is a probability density rather than a probability.
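The procedure can be sketched as follows, under the assumption that the density of $$p$$ is estimated by a normalized histogram of the default data. A candidate is proposed from the uniform $$g(x)=1$$ and accepted with probability equal to the estimated density at that point divided by $$M$$, with $$M$$ set to the largest histogram density. The names `histogram_density` and `rejection_sample`, the bin count, and the placeholder data are hypothetical, not part of the assignment.

```python
import random


def histogram_density(data, n_bins=20):
    """Estimate a density on [0, 1] as normalized histogram heights."""
    width = 1.0 / n_bins
    counts = [0] * n_bins
    for x in data:
        counts[min(int(x / width), n_bins - 1)] += 1
    total = len(data)
    # Heights are scaled so that sum(height * width) equals 1.
    return [c / (total * width) for c in counts]


def rejection_sample(density, n, rng):
    """Draw n values using a uniform proposal g(x) = 1 on [0, 1]."""
    n_bins = len(density)
    M = max(density)  # smallest constant with density(x) <= M * g(x)
    sample = []
    while len(sample) < n:
        x = rng.random()  # candidate drawn from the proposal
        u = rng.random()  # uniform number for the accept/reject step
        fx = density[min(int(x * n_bins), n_bins - 1)]
        if u * M < fx:  # accept with probability fx / M
            sample.append(x)
    return sample


rng = random.Random(0)
# Placeholder stand-in for the real measurements (NOT the course data).
data_default = [rng.betavariate(5, 3) for _ in range(500)]
density = histogram_density(data_default)
print("M =", round(max(density), 3))
print(len(rejection_sample(density, 10, rng)))
```

On average a fraction $$1/M$$ of the candidates is accepted, so a sharply peaked density (large $$M$$) makes the method slower.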

## Note

Bacterial cells do aggregate and form colonies with complicated spatial organization. The possible effect of bacterial aggregation on antibiotic resistance is an active subject of research. See an example here. The data in this homework are completely made up.