MCB111: Mathematics in Biology (Fall 2019)
week 08:
Neural Networks - Learning as Inference
Motivation for the logistic function
The logistic function appears in problems where there is a binary decision to make. Here you will work out a problem (based on MacKay's exercise 39.5) that, like a binary neuron, also uses a logistic function.
The noisy LED display
Figure 1. The noisy LED. Figure extracted from MacKay's Chapter 39.
In an LED display, each number corresponds to a pattern of on (1) or off (0) for the 7 different elements that compose the display. For instance, the patterns for numbers 2 and 3 are:
$$\mathbf{c}(2) = (1,0,1,1,1,0,1), \qquad \mathbf{c}(3) = (1,0,1,1,0,1,1).$$
Imagine you have an LED display that is not working properly. This defective LED is such that, for a given number the LED wants to display:

Elements that have to be off are wrongly on with probability $f$.

Elements that have to be on are actually on with probability $1-f$.
The LED is allowed to display ONLY a number “2” or a number “3”, and it does so by emitting a pattern $\mathbf{p} = (p_1,\ldots,p_7)$, where $p_i \in \{0,1\}$.
Calculate the posterior probability that the intended number was a “2”, given the pattern $\mathbf{p}$ you observe in the LED, that is, $P(2\mid\mathbf{p})$.
Show that you can express that posterior probability as a logistic function,
$$P(2\mid\mathbf{p}) = \frac{1}{1+e^{-\mathbf{w}\cdot\mathbf{p}+\theta}},$$
for some weights $\mathbf{w}$ and some constant $\theta$.
You can assume that the prior probabilities for either number, $P_2$ and $P_3$, are given.
Hint: $x^y = e^{y\log x}$ for any real numbers $x > 0$ and $y$.
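The noise model in the problem statement can be made concrete with a short simulation. This is a minimal sketch, not part of the original exercise: the function name `emit_pattern`, the value `f = 0.1`, and the element ordering of the patterns (taken to follow MacKay's numbering) are assumptions made for illustration.

```python
import random

# Intended on/off patterns for "2" and "3" (element ordering is an assumption).
C2 = (1, 0, 1, 1, 1, 0, 1)
C3 = (1, 0, 1, 1, 0, 1, 1)


def emit_pattern(intended, f, rng):
    """Return a noisy pattern: each element is flipped independently with probability f."""
    return tuple((1 - c) if rng.random() < f else c for c in intended)


rng = random.Random(0)  # fixed seed so the run is reproducible
f = 0.1                 # illustrative flip probability
samples = [emit_pattern(C2, f, rng) for _ in range(10000)]

# Element 2 is intended to be off for a "2"; the fraction of samples
# in which it is wrongly on should be close to f.
frac_on = sum(s[1] for s in samples) / len(samples)
print(frac_on)
```

Flipping each element with probability $f$ covers both rules at once: an element that should be off ends up on with probability $f$, and an element that should be on stays on with probability $1-f$.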
Solution
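The probability that we can calculate directly is $P(\mathbf{p}\mid 2)$, that is, the probability of observing a particular pattern $\mathbf{p}$ given that the LED tried to emit a “2”. Each element that should be on contributes $(1-f)^{p_i} f^{1-p_i}$, and each element that should be off contributes $f^{p_i}(1-f)^{1-p_i}$,
$$P(\mathbf{p}\mid 2) = \prod_{i\in\{1,3,4,5,7\}} (1-f)^{p_i} f^{1-p_i} \prod_{i\in\{2,6\}} f^{p_i} (1-f)^{1-p_i}.$$
Using vector notation,
$$P(\mathbf{p}\mid 2) = \prod_{i=1}^{7} (1-f)^{c_i p_i + (1-c_i)(1-p_i)}\, f^{c_i (1-p_i) + (1-c_i) p_i},$$
where
$$\mathbf{c} = \mathbf{c}(2) = (1,0,1,1,1,0,1).$$
Then using the hint above, we can rewrite,
$$P(\mathbf{p}\mid 2) = \exp\left\{ \log\frac{1-f}{f} \sum_{i=1}^{7} (2c_i - 1)\, p_i + \sum_{i=1}^{7} \left[ (1-c_i)\log(1-f) + c_i \log f \right] \right\}.$$
Introducing the vector
$$\hat{\mathbf{c}} = 2\mathbf{c} - \mathbf{1},$$
such that
$$\hat{c}_i = +1 \ \text{if}\ c_i = 1, \qquad \hat{c}_i = -1 \ \text{if}\ c_i = 0,$$
we can write with all generality, for any intended digit $n$,
$$P(\mathbf{p}\mid n) = \exp\left\{ \log\frac{1-f}{f}\ \hat{\mathbf{c}}(n)\cdot\mathbf{p} + \sum_{i=1}^{7} \left[ (1-c_i(n))\log(1-f) + c_i(n) \log f \right] \right\}.$$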
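As a sanity check on the likelihood, the sketch below compares two ways of computing $P(\mathbf{p}\mid 2)$ over all $2^7$ patterns: a direct product with a factor $1-f$ for each element that matches the intended pattern and $f$ for each mismatch, and an exponential form $\exp\{\log\frac{1-f}{f}\,\hat{\mathbf{c}}\cdot\mathbf{p} + \sum_i [(1-c_i)\log(1-f) + c_i\log f]\}$ with $\hat{c}_i = 2c_i - 1$. The function names, the value `f = 0.1`, and the element ordering of `C2` are assumptions made for illustration.

```python
import math
from itertools import product

# Intended pattern for "2" (element ordering is an assumption).
C2 = (1, 0, 1, 1, 1, 0, 1)


def likelihood_direct(p, c, f):
    """Product over elements: factor (1 - f) on a match, f on a mismatch."""
    out = 1.0
    for p_i, c_i in zip(p, c):
        out *= (1.0 - f) if p_i == c_i else f
    return out


def likelihood_exp(p, c, f):
    """Same quantity written as the exponential of a linear function of p."""
    log_odds = math.log((1.0 - f) / f)
    c_hat = [2 * c_i - 1 for c_i in c]  # +1 for on, -1 for off
    linear = log_odds * sum(ch * p_i for ch, p_i in zip(c_hat, p))
    const = sum((1 - c_i) * math.log(1.0 - f) + c_i * math.log(f) for c_i in c)
    return math.exp(linear + const)


f = 0.1  # illustrative flip probability
patterns = list(product((0, 1), repeat=7))

# Largest disagreement between the two forms over all 128 patterns.
max_diff = max(
    abs(likelihood_direct(p, C2, f) - likelihood_exp(p, C2, f)) for p in patterns
)
# The likelihood is a distribution over patterns, so it should sum to 1.
total = sum(likelihood_direct(p, C2, f) for p in patterns)
print(max_diff, total)
```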
The quantity we have been asked to calculate is not $P(\mathbf{p}\mid 2)$ but, given that we have seen a pattern $\mathbf{p}$, the probability that the pattern was generated with a “2” in mind. That is the posterior probability $P(2\mid\mathbf{p})$, which by Bayes' theorem is given as a function of $P(\mathbf{p}\mid 2)$ as
$$P(2\mid\mathbf{p}) = \frac{P(\mathbf{p}\mid 2)\, P_2}{P(\mathbf{p})},$$
where $P_2$ is the prior probability of a “2”.
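In the general case in which the LED can produce any of the 10 digits (from 0 to 9), we have by marginalization
$$P(\mathbf{p}) = \sum_{n=0}^{9} P(\mathbf{p}\mid n)\, P_n,$$
resulting in the general solution,
$$P(n\mid\mathbf{p}) = \frac{P(\mathbf{p}\mid n)\, P_n}{\sum_{m=0}^{9} P(\mathbf{p}\mid m)\, P_m}.$$
Notice that the normalization condition is $\sum_{n=0}^{9} P(n\mid\mathbf{p}) = 1$.
For our particular problem, where we want to distinguish only between the pattern being generated by a “2” or a “3”, that results in
$$P(2\mid\mathbf{p}) = \frac{P(\mathbf{p}\mid 2)\, P_2}{P(\mathbf{p}\mid 2)\, P_2 + P(\mathbf{p}\mid 3)\, P_3},$$
where here the normalization condition is $P(2\mid\mathbf{p}) + P(3\mid\mathbf{p}) = 1$.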
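The two-digit Bayes computation can be sketched numerically as follows. This is an illustration under assumptions: the function names, the flip probability `f = 0.1`, the equal default priors, and the element ordering of the patterns are not given in the text.

```python
import math

# Intended patterns for "2" and "3" (element ordering is an assumption).
C2 = (1, 0, 1, 1, 1, 0, 1)
C3 = (1, 0, 1, 1, 0, 1, 1)


def likelihood(p, c, f):
    """P(p | intended pattern c): factor (1 - f) per match, f per mismatch."""
    out = 1.0
    for p_i, c_i in zip(p, c):
        out *= (1.0 - f) if p_i == c_i else f
    return out


def posterior(p, f, prior_2=0.5):
    """Return (P(2 | p), P(3 | p)) when only "2" and "3" are possible."""
    joint_2 = likelihood(p, C2, f) * prior_2
    joint_3 = likelihood(p, C3, f) * (1.0 - prior_2)
    evidence = joint_2 + joint_3  # P(p), by marginalization over the two digits
    return joint_2 / evidence, joint_3 / evidence


f = 0.1  # illustrative flip probability

# A clean "2" is observed: the posterior strongly favors "2".
post_clean = posterior(C2, f)

# Elements 5 and 6 are both on: the pattern is equally far from "2" and "3",
# so the posterior falls back to the prior.
post_ambiguous = posterior((1, 0, 1, 1, 1, 1, 1), f)
print(post_clean, post_ambiguous)
```

With equal priors and $f = 0.1$, observing a clean “2” gives $P(2\mid\mathbf{p}) = 0.9^2/(0.9^2 + 0.1^2) = 81/82$, because the two patterns differ in exactly two elements.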
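The posterior probability can be rewritten as
$$P(2\mid\mathbf{p}) = \frac{1}{1 + \frac{P(\mathbf{p}\mid 3)\, P_3}{P(\mathbf{p}\mid 2)\, P_2}}.$$
Since each element contributes a factor $1-f$ to $P(\mathbf{p}\mid n)$ when $p_i = c_i(n)$ and a factor $f$ otherwise, only the elements where $\mathbf{c}(2)$ and $\mathbf{c}(3)$ differ survive in the likelihood ratio,
$$\log\frac{P(\mathbf{p}\mid 2)}{P(\mathbf{p}\mid 3)} = \log\frac{1-f}{f} \sum_{i=1}^{7} \left(c_i(2) - c_i(3)\right) (2p_i - 1).$$
We can define the weights
$$\mathbf{w} = 2 \log\frac{1-f}{f}\ \left(\mathbf{c}(2) - \mathbf{c}(3)\right),$$
so that
$$\log\frac{P(\mathbf{p}\mid 2)}{P(\mathbf{p}\mid 3)} = \mathbf{w}\cdot\mathbf{p} - \log\frac{1-f}{f} \sum_{i=1}^{7} \left(c_i(2) - c_i(3)\right) = \mathbf{w}\cdot\mathbf{p},$$
where the last sum vanishes because “2” and “3” both have five elements on.
We can also parameterize the priors as
$$P_2 = \frac{1}{1+e^{\theta}}, \qquad P_3 = \frac{e^{\theta}}{1+e^{\theta}}, \qquad \text{that is,} \quad \theta = \log\frac{P_3}{P_2}.$$
For instance, $\theta = 0$ for $P_2 = P_3 = 1/2$.
Then we have the expression we wanted to obtain of $P(2\mid\mathbf{p})$ as a logistic function of a linear function of $\mathbf{p}$,
$$P(2\mid\mathbf{p}) = \frac{1}{1+e^{-\mathbf{w}\cdot\mathbf{p}+\theta}},$$
with weights,
$$w_i = 2 \log\frac{1-f}{f}\ \left(c_i(2) - c_i(3)\right),$$
or
$$\mathbf{w} = 2 \log\frac{1-f}{f}\ (0,0,0,0,1,-1,0).$$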
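The equivalence between the Bayes posterior and the logistic form can be checked numerically over all $2^7$ patterns. The sketch below assumes weights $\mathbf{w} = 2\log\frac{1-f}{f}\,(\mathbf{c}(2)-\mathbf{c}(3))$ and $\theta = \log(P_3/P_2)$; the function names, the values `f = 0.1` and `prior_2 = 0.3`, and the element ordering of the patterns are assumptions chosen for illustration.

```python
import math
from itertools import product

# Intended patterns for "2" and "3" (element ordering is an assumption).
C2 = (1, 0, 1, 1, 1, 0, 1)
C3 = (1, 0, 1, 1, 0, 1, 1)


def likelihood(p, c, f):
    """P(p | intended pattern c): factor (1 - f) per match, f per mismatch."""
    out = 1.0
    for p_i, c_i in zip(p, c):
        out *= (1.0 - f) if p_i == c_i else f
    return out


def posterior_bayes(p, f, prior_2):
    """P(2 | p) computed directly from Bayes' theorem."""
    joint_2 = likelihood(p, C2, f) * prior_2
    joint_3 = likelihood(p, C3, f) * (1.0 - prior_2)
    return joint_2 / (joint_2 + joint_3)


def logistic_weights(f):
    """Weights w_i = 2 log((1 - f) / f) (c_i(2) - c_i(3))."""
    log_odds = math.log((1.0 - f) / f)
    return [2.0 * log_odds * (a - b) for a, b in zip(C2, C3)]


def posterior_logistic(p, f, prior_2):
    """P(2 | p) computed as 1 / (1 + exp(-w.p + theta))."""
    w = logistic_weights(f)
    theta = math.log((1.0 - prior_2) / prior_2)  # theta = log(P_3 / P_2)
    activation = sum(w_i * p_i for w_i, p_i in zip(w, p)) - theta
    return 1.0 / (1.0 + math.exp(-activation))


f, prior_2 = 0.1, 0.3  # illustrative values
w = logistic_weights(f)

# Largest disagreement between the two computations over all 128 patterns.
max_diff = max(
    abs(posterior_bayes(p, f, prior_2) - posterior_logistic(p, f, prior_2))
    for p in product((0, 1), repeat=7)
)
print(w, max_diff)
```

Only elements 5 and 6 carry nonzero weight, since those are the only elements in which a “2” and a “3” differ; the remaining elements are uninformative for this decision.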