MCB111: Mathematics in Biology (Fall 2023)


week 08:

Neural Networks - Learning as Inference

Motivation for the logistic function

The logistic function appears in problems where there is a binary decision to make. Here you will work out a problem (based on MacKay's exercise 39.5) that, like a binary neuron, also uses a logistic function.

The noisy LED display


Figure 1. The noisy LED. Figure extracted from MacKay's Chapter 39.

In an LED display, each number corresponds to a pattern of on (1) or off (0) states for the 7 different elements that compose the display. For instance, the patterns for the numbers 2 and 3 are:

\[\mathbf{c}(2) = (1,0,1,1,1,0,1)\] \[\mathbf{c}(3) = (1,0,1,1,0,1,1)\]
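For concreteness, here is a minimal sketch of these two codewords in Python (NumPy assumed; the names `c2` and `c3` are my own):

```python
import numpy as np

# Codewords for the digits "2" and "3":
# one entry per LED element, 1 = on, 0 = off.
c2 = np.array([1, 0, 1, 1, 1, 0, 1])
c3 = np.array([1, 0, 1, 1, 0, 1, 1])
```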

Imagine you have an LED display that is not working properly. This defective LED is such that, for a given number it wants to display, an element that should be on is lit only with probability \(1-f\), and an element that should be off is lit erroneously with probability \(f\).

The LED is allowed to display ONLY a number “2” or a number “3”. It does so by emitting a pattern \(\mathbf{p}=(p_1,p_2,p_3,p_4,p_5,p_6,p_7)\), where \(p_i \in \{0,1\}\).

Calculate the posterior probability that the intended number was a “2”, given the pattern \(\mathbf{p}\) you observe in the LED, that is,

\[P(n=2\mid \mathbf{p}).\]

Show that you can express that posterior probability as a logistic function,

\[P(n=2\mid \mathbf{p}) = \frac{1}{1+e^{\mathbf{w}\mathbf{p} + \theta}}\]

for some weights \(\mathbf{w}\), and some constant \(\theta\).

You can assume that the prior probabilities for either number, \(P_2\) and \(P_3\), are given.

Hint: \(x^y = e^{y\log x}\) for any two real numbers \(x, y\).

Solution

The probability that we can calculate directly is \(P(\mathbf{p}\mid 2)\), that is, the probability of observing a particular pattern \(\mathbf{p}\) given that the LED tried to emit a “2”. For a “2”, elements 1, 3, 4, 5, 7 should be on (each lit with probability \(1-f\)) and elements 2, 6 should be off (each lit with probability \(f\)), so collecting one factor per lit element,

\[P(\mathbf{p}\mid 2) = (1-f)^{p_1+p_3+p_4+p_5+p_7}\, f^{p_2+p_6}.\]

Using vector notation,

\[P(\mathbf{p}\mid 2) = (1-f)^{\mathbf{c}(2)\mathbf{p}}\, f^{\mathbf{\hat c} (2)\mathbf{p}},\]

where

\[{\hat c}(2)_i = 1- c(2)_i.\]

Then, using the hint above, we can rewrite this as

\[\begin{aligned} P(\mathbf{p}\mid 2) &= e^{\log(1-f)\mathbf{c}(2)\,\mathbf{p}}\, e^{\log(f)\,\mathbf{\hat c} (2)\mathbf{p}}\\ &= e^{\log(1-f)\,\mathbf{c}(2)\mathbf{p} + \log(f)\,\mathbf{\hat c} (2)\mathbf{p}}\\ &= e^{\left[\log(1-f)\,\mathbf{c}(2) + \log(f)\,\mathbf{\hat c} (2)\right]\mathbf{p}}.\\ \end{aligned}\]

Introducing the vector

\[\mathbf{a}(2) = \log(1-f)\,\mathbf{c}(2) + \log(f)\,\mathbf{\hat c} (2)\]

such that

\[a(2)_i = \log(1-f)\,c(2)_i + \log(f)\,\left(1-c(2)_i\right),\]

we can write, in full generality,

\[P(\mathbf{p}\mid 2) = e^{\mathbf{a}(2)\,\mathbf{p}}.\]
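As a concrete instance, \(a(2)_i = \log(1-f)\) for the five elements that should be on for a “2” (elements 1, 3, 4, 5, 7) and \(a(2)_i = \log f\) for the two that should be off (elements 2 and 6). Here is a minimal sketch of this likelihood in Python (NumPy assumed; function names are my own):

```python
import numpy as np

def a_vec(c, f):
    """Vector a(n)_i = log(1-f) * c_i + log(f) * (1 - c_i)."""
    return np.log(1 - f) * c + np.log(f) * (1 - c)

def likelihood(p, c, f):
    """P(p | n) = exp(a(n) . p) for an observed pattern p."""
    return np.exp(a_vec(c, f) @ p)

c2 = np.array([1, 0, 1, 1, 1, 0, 1])
f = 0.1
# Observing a clean "2" when a "2" was intended: (1-f)^5
print(likelihood(c2, c2, f))  # 0.59049
```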

The quantity we have been asked to calculate is not \(P(\mathbf{p}\mid 2)\) but rather, given that we have seen a pattern \(\mathbf{p}\), the probability that the pattern was generated with a “2” in mind. That is the posterior probability \(P(2\mid \mathbf{p})\), which by Bayes' theorem is given as a function of \(P(\mathbf{p}\mid 2)\) as

\[P(2\mid \mathbf{p}) = \frac{P(\mathbf{p}\mid 2) P(2)}{P(\mathbf{p})},\]

where \(P(2)\) is a prior probability.

In the general case in which the LED can produce any of the 10 digits (from 0 to 9), we have by marginalization

\[\begin{aligned} P(\mathbf{p}) &= P(\mathbf{p}\mid 0) P(0) + P(\mathbf{p}\mid 1) P(1) + P(\mathbf{p}\mid 2) P(2) + \ldots + P(\mathbf{p}\mid 9) P(9)\\ &= \sum_{n=0}^{9} e^{\mathbf{a}(n)\,\mathbf{p}}\, P(n). \end{aligned}\]

This results in the general solution

\[P(2\mid \mathbf{p}) = \frac{e^{\mathbf{a}(2)\,\mathbf{p}}\, P(2)}{\sum_{n=0}^{9} e^{\mathbf{a}(n)\,\mathbf{p}}\, P(n)}.\]

Notice that the normalization condition is \(\sum_{n=0}^9 P(n\mid \mathbf{p}) = 1\).
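Under the same assumptions as the sketch above, this marginalized posterior can be computed for any set of candidate digits and codewords (here only the two codewords given in the problem; the remaining digits would need their own codewords):

```python
import numpy as np

def a_vec(c, f):
    return np.log(1 - f) * c + np.log(f) * (1 - c)

def posterior(p, codes, priors, f):
    """P(n | p) for each candidate digit n, via Bayes' theorem.

    codes:  dict digit -> codeword (0/1 numpy array)
    priors: dict digit -> prior probability P(n)
    """
    joint = {n: np.exp(a_vec(c, f) @ p) * priors[n] for n, c in codes.items()}
    evidence = sum(joint.values())  # P(p), by marginalization
    return {n: v / evidence for n, v in joint.items()}

codes  = {2: np.array([1, 0, 1, 1, 1, 0, 1]),
          3: np.array([1, 0, 1, 1, 0, 1, 1])}
priors = {2: 0.5, 3: 0.5}
p = np.array([1, 0, 1, 1, 1, 0, 1])  # a clean "2" pattern
print(posterior(p, codes, priors, f=0.1))  # {2: 0.9, 3: 0.1}
```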

For our particular problem, where we want to distinguish only between the pattern being generated by a “2” or a “3”, this reduces to

\[P(2\mid \mathbf{p}) = \frac{e^{\mathbf{a}(2)\,\mathbf{p}}\, P(2)}{e^{\mathbf{a}(2)\,\mathbf{p}}\, P(2) + e^{\mathbf{a}(3)\,\mathbf{p}}\, P(3)},\]

where here the normalization condition is

\[P(2\mid \mathbf{p}) + P(3\mid \mathbf{p}) = 1.\]

Dividing numerator and denominator by \(e^{\mathbf{a}(2)\,\mathbf{p}}\, P(2)\), the posterior probability \(P(2\mid \mathbf{p})\) can be rewritten as

\[P(2\mid \mathbf{p}) = \frac{1}{1 + e^{\left[\mathbf{a}(3)-\mathbf{a}(2)\right]\,\mathbf{p}}\, \frac{P(3)}{P(2)}}.\]

We can define the weights

\[\mathbf{w} = \mathbf{a}(3)-\mathbf{a}(2).\]

We can also parameterize the priors as

\[\frac{P(3)}{P(2)}=e^\theta.\]

For instance, \(\theta = 0\) for \(P(2) = P(3) = 1/2\), and \(\theta = -\log 2\) if \(P(2) = 2\,P(3)\).

Then we obtain, as desired, the posterior \(P(2\mid \mathbf{p})\) expressed as a logistic function of a linear combination of the inputs,

\[P(2\mid \mathbf{p}) = \frac{1}{1 + e^{\mathbf{w}\,\mathbf{p} + \theta}},\]

with weights,

\[\begin{aligned} w_i &= a(3)_i - a(2)_i\\ &= \log(1-f)\,\left[c(3)_i - c(2)_i\right] + \log(f)\,\left[c(2)_i - c(3)_i\right]\\ &= \log\left(\frac{1-f}{f}\right)\left[c(3)_i-c(2)_i\right]\\ \end{aligned}\]

or

\[\mathbf{w} = \log\left(\frac{1-f}{f}\right)\left(0, 0, 0, 0, -1, 1, 0\right).\]
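As a numerical sanity check (again a sketch under the assumptions above), the logistic form with these weights can be compared against the direct Bayes computation for all \(2^7 = 128\) possible patterns:

```python
import itertools
import numpy as np

f = 0.1
c2 = np.array([1, 0, 1, 1, 1, 0, 1])
c3 = np.array([1, 0, 1, 1, 0, 1, 1])
P2, P3 = 0.5, 0.5

def a_vec(c):
    return np.log(1 - f) * c + np.log(f) * (1 - c)

w = np.log((1 - f) / f) * (c3 - c2)  # = log((1-f)/f) (0,0,0,0,-1,1,0)
theta = np.log(P3 / P2)              # = 0 for equal priors

for bits in itertools.product([0, 1], repeat=7):
    p = np.array(bits)
    num = np.exp(a_vec(c2) @ p) * P2
    den = num + np.exp(a_vec(c3) @ p) * P3
    direct = num / den                           # Bayes, as derived above
    logistic = 1 / (1 + np.exp(w @ p + theta))   # logistic form
    assert np.isclose(direct, logistic)
print("logistic form matches the direct posterior for all 128 patterns")
```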