Neural Networks - Learning as Inference
Motivation for the logistic function
The logistic function appears in problems where there is a binary decision to make. Here you will workout a problem (based on MacKays’s exercise 39.5) that like a binary neuron, also uses a logistic function.
The noisy LED display
Figure 1. The noisy LED. Figure extracted from MacKay's Chapter 39.
In a LED display each number corresponds to a pattern of on(1) or off(0) for the 7 different elements that compose the display. For instance, the patterns for numbers 2 and 3 are:
Imagine you have a LED display that is not working properly. This defective LED is such that, for a given number the LED wants to display:
Elements that have to be off, are wrongly on with probability .
Elements that have to be on, are actually on with probability ,
The LED is allowed to display ONLY a number “2” or a number “3”. And it does so by emitting a patter , where
Calculate the posterior probability that the intended number was a “2”, given the pattern you observe in the LED, that is,
Show that you can express that posterior probability as a logistic function,
for some weights , and some constant .
You can assume that the prior probabilities for either number, and , are given.
Hint: for any two real numbers .
The probability that we can calculate is , that is the probability that observing a particular pattern , given that the LED tried to emit a “2”,
Using vector notation,
Then using the hint above, we can rewrite,
Introducing the vector
we can write with all generality
The quantity we have been asked to calculate is not , but instead, given that we have seen a pattern , what is the probability that the pattern was generated with a “2” in mind. That is the posterior probability , which using Bayes theorem is given as a function of as
where is a prior probability.
In the general case in which the LED can produce any of the 10 digits (from 0 to 9), then we have by marginalization
Resulting in the general solution,
Notice that, the normalization condition is .
For our particular problem, where we want to distinguish only between the pattern being generated by a “2” or a “3”, that results in
where here the normalization condition is
The posterior probability can be re-written as
We can define the weights
We can also parameterize the priors as
For instance, for .
Then, we have the expression we wanted to obtain of as a logistic linear function