MCB111: Mathematics in Biology (Fall 2024)
week 08:
Neural Networks - Learning as Inference
Motivation for the logistic function
The logistic function appears in problems where there is a binary decision to make. Here you will work out a problem (based on MacKay's exercise 39.5) that, like a binary neuron, also uses a logistic function.
The noisy LED display
Figure 1. The noisy LED. Figure extracted from MacKay's Chapter 39.
In an LED display, each number corresponds to a pattern of on (1) or off (0) states for the 7 elements that compose the display. For instance, the patterns for the numbers 2 and 3 are:
\[\mathbf{c}(2) = (1,0,1,1,1,0,1)\] \[\mathbf{c}(3) = (1,0,1,1,0,1,1)\]Imagine you have an LED display that is not working properly. This defective LED is such that, for a given number it intends to display:
- Elements that have to be off are wrongly on with probability \(f\).
- Elements that have to be on are actually on with probability \(1-f\).
The LED is allowed to display ONLY a number “2” or a number “3”, and it does so by emitting a pattern \(\mathbf{p}=(p_1,p_2,p_3,p_4,p_5,p_6,p_7)\), where each \(p_i\) is either 1 or 0.
Calculate the posterior probability that the intended number was a “2”, given the pattern \(\mathbf{p}\) you observe in the LED, that is,
\[P(n=2\mid \mathbf{p}).\]Show that you can express that posterior probability as a logistic function,
\[P(n=2\mid \mathbf{p}) = \frac{1}{1+e^{-\mathbf{w}\mathbf{p} + \theta}}\]for some weights \(\mathbf{w}\), and some constant \(\theta\).
You can assume that the prior probabilities for either number, \(P_2\) and \(P_3\), are given.
Hint: \(x^y = e^{y\log x}\) for any two real numbers \(x, y\).
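If you want to get a feel for the problem before solving it, a minimal simulation of the defective display may help. The sketch below (Python with NumPy, not part of the original exercise; it assumes the element ordering of Figure 1 and the patterns \(\mathbf{c}(2)\) and \(\mathbf{c}(3)\) given above) samples noisy patterns for an intended digit with flip probability \(f\):

```python
import numpy as np

# Ideal segment patterns c(2) and c(3), using the element ordering of Figure 1
c = {2: np.array([1, 0, 1, 1, 1, 0, 1]),
     3: np.array([1, 0, 1, 1, 0, 1, 1])}

def noisy_pattern(n, f, rng):
    """Sample one noisy pattern p for the intended digit n.

    Elements that should be on light up with probability 1-f;
    elements that should be off light up (wrongly) with probability f.
    """
    p_on = np.where(c[n] == 1, 1.0 - f, f)      # P(p_i = 1) for each element
    return (rng.random(7) < p_on).astype(int)

rng = np.random.default_rng(0)
for _ in range(3):                              # a few noisy renderings of a "2"
    print(noisy_pattern(2, f=0.1, rng=rng))
```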
Solution
The probability we can calculate directly is \(P(\mathbf{p}\mid 2)\), that is, the probability of observing a particular pattern \(\mathbf{p}\) given that the LED tried to emit a “2”,
\[P(\mathbf{p}\mid 2) = (1-f)^{p_1+p_3+p_4+p_5+p_7}\, f^{p_2+p_6}.\]Using vector notation,
\[P(\mathbf{p}\mid 2) = (1-f)^{\mathbf{c}(2)\mathbf{p}}\, f^{\mathbf{\hat c} (2)\mathbf{p}},\]where
\[{\hat c}(2)_i = 1- c(2)_i.\]Then using the hint above, we can rewrite,
\[\begin{aligned} P(\mathbf{p}\mid 2) &= e^{\log(1-f)\mathbf{c}(2)\,\mathbf{p}}\, e^{\log(f)\,\mathbf{\hat c} (2)\mathbf{p}}\\ &= e^{\log(1-f)\,\mathbf{c}(2)\mathbf{p} + \log(f)\,\mathbf{\hat c} (2)\mathbf{p}}\\ &= e^{\left[\log(1-f)\,\mathbf{c}(2) + \log(f)\,\mathbf{\hat c} (2)\right]\mathbf{p}}.\\ \end{aligned}\]Introducing the vector
\[\mathbf{a}(2) = \log(1-f)\,\mathbf{c}(2) + \log(f)\,\mathbf{\hat c} (2)\]such that
\[a(2)_i = \log(1-f)\,c(2)_i + \log(f)\,\left(1-c(2)_i\right),\]we can write in full generality
\[P(\mathbf{p}\mid 2) = e^{\mathbf{a}(2)\,\mathbf{p}}.\]The quantity we have been asked to calculate is not \(P(\mathbf{p}\mid 2)\), but rather the probability, given that we have seen a pattern \(\mathbf{p}\), that the pattern was generated with a “2” in mind. That is the posterior probability \(P(2\mid \mathbf{p})\), which by Bayes' theorem is given as a function of \(P(\mathbf{p}\mid 2)\) as
\[P(2\mid \mathbf{p}) = \frac{P(\mathbf{p}\mid 2) P(2)}{P(\mathbf{p})},\]where \(P(2)\) is a prior probability.
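As a quick numerical check of the exponential form derived above (a Python/NumPy sketch; the value of \(f\) and the observed pattern \(\mathbf{p}\) are arbitrary illustrative choices), \(e^{\mathbf{a}(2)\,\mathbf{p}}\) agrees with the product form \((1-f)^{\mathbf{c}(2)\mathbf{p}}\, f^{\mathbf{\hat c}(2)\mathbf{p}}\):

```python
import numpy as np

f  = 0.1
c2 = np.array([1, 0, 1, 1, 1, 0, 1])        # c(2)
p  = np.array([1, 0, 1, 1, 1, 0, 1])        # an observed pattern (here, a clean "2")

# a(2)_i = log(1-f) c(2)_i + log(f) (1 - c(2)_i)
a2 = np.log(1 - f) * c2 + np.log(f) * (1 - c2)

# Product form of the likelihood: (1-f)^{c(2).p} * f^{chat(2).p}
direct = (1 - f) ** np.dot(c2, p) * f ** np.dot(1 - c2, p)

# Exponential form: e^{a(2).p}
vectorized = np.exp(np.dot(a2, p))

print(direct, vectorized)                   # the two values agree
```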
In the general case in which the LED can produce any of the 10 digits (from 0 to 9), we have by marginalization
\[\begin{aligned} P(\mathbf{p}) &= P(\mathbf{p}\mid 0) P(0) + P(\mathbf{p}\mid 1) P(1) + P(\mathbf{p}\mid 2) P(2) + \ldots + P(\mathbf{p}\mid 9) P(9)\\ &= \sum_{n=0}^{9} e^{\mathbf{a}(n)\,\mathbf{p}}\, P(n). \end{aligned}\]Resulting in the general solution,
\[P(2\mid \mathbf{p}) = \frac{e^{\mathbf{a}(2)\,\mathbf{p}}\, P(2)}{\sum_{n=0}^{9} e^{\mathbf{a}(n)\,\mathbf{p}}\, P(n)}.\]Notice that the normalization condition is \(\sum_{n=0}^9 P(n\mid \mathbf{p}) = 1\).
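Here is a minimal sketch of this calculation (Python/NumPy; since only the codes \(\mathbf{c}(2)\) and \(\mathbf{c}(3)\) are given above, the dictionary below lists just those two digits, but any set of digit codes and priors could be supplied):

```python
import numpy as np

def posterior(p, codes, priors, f):
    """Posterior P(n | p) over the digits listed in `codes`.

    codes  : dict mapping digit n -> ideal 0/1 pattern c(n)
    priors : dict mapping digit n -> prior probability P(n)
    """
    joint = {}
    for n, cn in codes.items():
        a_n = np.log(1 - f) * cn + np.log(f) * (1 - cn)   # a(n)
        joint[n] = np.exp(np.dot(a_n, p)) * priors[n]     # P(p|n) P(n)
    z = sum(joint.values())                               # P(p), by marginalization
    return {n: v / z for n, v in joint.items()}

codes  = {2: np.array([1, 0, 1, 1, 1, 0, 1]),
          3: np.array([1, 0, 1, 1, 0, 1, 1])}
priors = {2: 0.5, 3: 0.5}
p = np.array([1, 0, 1, 1, 1, 0, 1])
print(posterior(p, codes, priors, f=0.1))
```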
For our particular problem, where we want to distinguish only whether the pattern was generated by a “2” or by a “3”, this results in
\[P(2\mid \mathbf{p}) = \frac{e^{\mathbf{a}(2)\,\mathbf{p}}\, P(2)}{e^{\mathbf{a}(2)\,\mathbf{p}}\, P(2) + e^{\mathbf{a}(3)\,\mathbf{p}}\, P(3)},\]where here the normalization condition is
\[P(2\mid \mathbf{p}) + P(3\mid \mathbf{p}) = 1.\]The posterior probability \(P(2\mid \mathbf{p})\) can be re-written as
\[P(2\mid \mathbf{p}) = \frac{1}{1 + e^{\left[\mathbf{a}(3)-\mathbf{a}(2)\right]\,\mathbf{p}}\, \frac{P(3)}{P(2)}}.\]We can define the weights
\[\mathbf{w} = \mathbf{a}(3)-\mathbf{a}(2).\]We can also parameterize the priors as
\[\frac{P(3)}{P(2)}=e^\theta.\]For instance, \(\theta = 0\) for equal priors \(P(2) = P(3) = 1/2\).
Then we have the expression we wanted to obtain: \(P(2\mid \mathbf{p})\) as a logistic function of a linear combination of the pattern elements (the form given in the problem statement, up to a sign that can be absorbed into the definition of \(\mathbf{w}\)),
\[P(2\mid \mathbf{p}) = \frac{1}{1 + e^{\mathbf{w}\,\mathbf{p} + \theta}},\]with weights,
\[\begin{aligned} w_i &= a(3)_i - a(2)_i\\ &= \log(1-f)\,\left[c(3)_i - c(2)_i\right] + \log(f)\,\left[c(2)_i - c(3)_i\right]\\ &=\log\frac{1-f}{f}\,\left[c(3)_i-c(2)_i\right],\\ \end{aligned}\]or
\[\mathbf{w} = \log\frac{1-f}{f}\,\left(0, 0, 0, 0, -1, 1, 0\right).\]
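As a final sanity check (a Python/NumPy sketch; the observed pattern, the value of \(f\), and the equal priors are illustrative choices), the logistic expression with these weights and \(\theta\) reproduces the posterior computed directly from Bayes' theorem:

```python
import numpy as np

f  = 0.1
c2 = np.array([1, 0, 1, 1, 1, 0, 1])        # c(2)
c3 = np.array([1, 0, 1, 1, 0, 1, 1])        # c(3)
p  = np.array([1, 0, 1, 1, 1, 0, 1])        # an observed pattern (here, a clean "2")
P2, P3 = 0.5, 0.5                           # priors

# Logistic form: weights w and bias theta
w     = np.log((1 - f) / f) * (c3 - c2)     # w_i = log[(1-f)/f] (c(3)_i - c(2)_i)
theta = np.log(P3 / P2)                     # theta = 0 for equal priors
logistic = 1.0 / (1.0 + np.exp(np.dot(w, p) + theta))

# Direct Bayes computation with the likelihoods e^{a(n).p}
a2 = np.log(1 - f) * c2 + np.log(f) * (1 - c2)
a3 = np.log(1 - f) * c3 + np.log(f) * (1 - c3)
bayes = np.exp(a2 @ p) * P2 / (np.exp(a2 @ p) * P2 + np.exp(a3 @ p) * P3)

print(logistic, bayes)                      # the two values agree
```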