MCB111: Mathematics in Biology (Fall 2024)


week 01:

Introduction to Information Theory

Preliminaries

Present all your reasoning, derivations, plots, and code as part of the homework. Imagine that you are writing a short paper that anyone in class should be able to understand. If you get stuck at some point, please describe the issue and how far you got. A Jupyter notebook is not required, but it is recommended if you are working in Python.

Q1

In Sections, we derived the conditions that determine when we should expect to find a single copy of a motif of length \(l\) in a genome of length \(L\) when all four residues A, C, G, T are equally likely. In general, genomes have certain biases. The human genome (\(3.1\times 10^{9}\) bases) has an average G-C content of 42%. Some organisms, such as thermophilic Archaea, can be quite extreme in their base composition; the malaria parasite Plasmodium falciparum, for example, has a GC content of 20% and a genome size of roughly \(10^7\) bases. How do your expectations for the motif length \(l\) change for these two genomes once you take their base composition into account?
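
To get a feel for the numbers, here is a minimal sketch of how one might compute the expected number of occurrences of a given motif once base composition is taken into account (the function name and the example motif are illustrative, and bases are assumed to be independent):

```python
import math

def expected_occurrences(motif, genome_length, gc_content):
    """Expected number of times `motif` occurs in a random genome of the
    given length, assuming independent bases with the given GC content."""
    p = {"G": gc_content / 2, "C": gc_content / 2,
         "A": (1 - gc_content) / 2, "T": (1 - gc_content) / 2}
    p_motif = math.prod(p[b] for b in motif)  # probability of the motif at one position
    return genome_length * p_motif            # ~one term per possible start position

# Example: an AT-rich motif in the P. falciparum genome (GC = 20%, L ~ 1e7)
print(expected_occurrences("TATAATTA", 1e7, 0.20))
```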

Choose between Q2 and Q3

Q2 is a theoretical question; no coding is involved. You have the option of answering Q2 or going straight to Q3. Extra credit if you answer both.

Q2

Consider a system of 3 neurons that are all interconnected. We are going to assume that each neuron \(n_i\) has two states: +1 for firing and -1 for not firing.

Your experimental design allows you to measure only two neurons simultaneously (unfortunately not three at a time), and what you obtain are the averages of those pairwise measurements

\[\begin{aligned} J_{12} &= \langle n_1 n_2 \rangle_{\text{obs}}, \\ J_{13} &= \langle n_1 n_3 \rangle_{\text{obs}}, \\ J_{23} &= \langle n_2 n_3 \rangle_{\text{obs}}. \end{aligned}\]

These averages are obviously symmetric, \(\langle n_1 n_2\rangle_{\text{obs}} = \langle n_2 n_1\rangle_{\text{obs}}\) and so on, so we only need to consider the three cases above.

Using the maximum entropy principle, your task is to infer the joint probability distribution

\[ P(n_1, n_2, n_3) \]

that has the maximum entropy when you impose the average correlations \(J_{12}, J_{13}, J_{23}\) that you have observed.

Notice that the correlations are defined by

\[\begin{aligned} \langle n_1 n_2 \rangle &= \sum_{n_1=\pm 1}\sum_{n_2=\pm 1}\sum_{n_3=\pm 1} n_1 n_2\, P(n_1, n_2, n_3), \\ \langle n_1 n_3 \rangle &= \sum_{n_1=\pm 1}\sum_{n_2=\pm 1}\sum_{n_3=\pm 1} n_1 n_3\, P(n_1, n_2, n_3), \\ \langle n_2 n_3 \rangle &= \sum_{n_1=\pm 1}\sum_{n_2=\pm 1}\sum_{n_3=\pm 1} n_2 n_3\, P(n_1, n_2, n_3). \end{aligned}\]

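As a check on the general structure of the answer (the multipliers \(\lambda_{ij}\) and the normalization \(Z\) below are the standard maximum entropy bookkeeping, not quantities given in the problem): maximizing the entropy \(-\sum P \ln P\) subject to normalization and the three pairwise constraints, with one Lagrange multiplier per constraint, leads to a Boltzmann-like form

\[ P(n_1, n_2, n_3) = \frac{1}{Z}\, \exp\!\left(\lambda_{12}\, n_1 n_2 + \lambda_{13}\, n_1 n_3 + \lambda_{23}\, n_2 n_3\right), \]

where \(Z\) sums the exponential over the eight states so that the distribution is normalized, and the three multipliers are fixed by requiring that the model reproduce the observed \(J_{12}, J_{13}, J_{23}\).
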
If you need some inspiration, you may want to look at this paper: Schneidman E, Berry MJ, Segev R, Bialek W (2006) Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440:1007–1012. In this manuscript, Schneidman et al. show that in the vertebrate retina, the maximum entropy model that captures just the pairwise correlations (like the one we have introduced here) is sufficient to account for most of the non-pairwise collective behavior.

Q3

For this question, we go back to our experiment on the P. falciparum aggregation probability. In Q2 of the previous homework, you were asked to compare the probability distribution for the aggregation probability obtained from your current low-sample (\(N=10\)) experiment under some new conditions with the one derived from a well-determined set of standard conditions.

You were asked to compare several samples of similar size and to judge, qualitatively, whether you thought they could come from the same distribution or not.

Now, with the material we learned in this week’s lecture, you can address that question again by doing a quantitative comparison of the actual probability distributions. Hopefully, your conclusions from last week will not change (if they do, explain why), but now you should be able to give numbers for the corresponding comparisons.
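
One natural quantitative measure from this week’s lecture is the relative entropy (Kullback–Leibler divergence) between the two distributions. Here is a minimal sketch, assuming you have evaluated both posterior distributions for the aggregation probability on the same grid (the function and variable names are illustrative):

```python
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(p||q) in bits between two discrete distributions
    defined on the same grid. Assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize, in case the inputs are unnormalized posteriors
    q = q / q.sum()
    mask = p > 0     # terms with p = 0 contribute nothing to the sum
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Example usage, with posteriors evaluated on a common grid such as
# theta = np.linspace(0, 1, 101):
# print(kl_divergence(posterior_new, posterior_standard))
```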