# Neural Networks - Learning as Inference

## Motivation for the logistic function

The logistic function appears in problems where there is a binary decision to make. Here you will work out a problem (based on MacKay's exercise 39.5) that, like a binary neuron, also uses a logistic function.

### The noisy LED display

Figure 1. The noisy LED. Figure extracted from MacKay's Chapter 39.

In an LED display, each number corresponds to a pattern of on (1) or off (0) states for the 7 elements that compose the display. For instance, the patterns for numbers 2 and 3 are:

Imagine you have an LED display that is not working properly. This defective LED is such that, for a given number the LED wants to display:

• Elements that have to be off are wrongly on with probability $f$.

• Elements that have to be on are actually on with probability $1-f$.

The LED is allowed to display ONLY a number “2” or a number “3”, and it does so by emitting a pattern $\mathbf{p}=(p_1,p_2,p_3,p_4,p_5,p_6,p_7)$, where each $p_i \in \{0,1\}$.
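The noise model described above can be simulated directly. The following is a minimal sketch in Python. The element ordering (the common a to g seven-segment convention), the constant names, the function `noisy_display`, and the assumption that elements fail independently of each other are all illustrative choices, not taken from MacKay's figure.

```python
import random

# Intended on/off patterns, ASSUMING the common seven-segment ordering
# (a=top, b=upper right, c=lower right, d=bottom, e=lower left,
#  f=upper left, g=middle). The figure's own numbering may differ.
PATTERN_2 = (1, 1, 0, 1, 1, 0, 1)
PATTERN_3 = (1, 1, 1, 1, 0, 0, 1)


def noisy_display(intended, f, rng=random):
    """Flip each element of `intended` independently with probability f."""
    return tuple(c ^ 1 if rng.random() < f else c for c in intended)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(noisy_display(PATTERN_2, f=0.1, rng=rng))
```

With $f = 0$ the display is perfect, and with $f = 1$ every element is inverted, which gives two easy sanity checks on the sketch.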

Calculate the posterior probability that the intended number was a “2”, given the pattern $\mathbf{p}$ you observe in the LED, that is, $P(2\mid \mathbf{p})$.

Show that you can express that posterior probability as a logistic function,

$$P(2\mid \mathbf{p}) = \frac{1}{1 + e^{-\mathbf{w}\cdot\mathbf{p} + \theta}},$$

for some weights $\mathbf{w}$, and some constant $\theta$.

You can assume that the prior probabilities for either number, $P_2$ and $P_3$, are given.

Hint: $x^y = e^{y\log x}$ for any real number $y$ and any $x > 0$.

### Solution

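The probability that we can calculate directly is $P(\mathbf{p}\mid 2)$, that is, the probability of observing a particular pattern $\mathbf{p}$ given that the LED tried to emit a “2”. Writing $\mathbf{c}^{(2)} = (c^{(2)}_1,\ldots,c^{(2)}_7)$ for the intended on/off pattern of a “2”, and taking the elements to fail independently, each element agrees with the intended pattern with probability $1-f$ and disagrees with probability $f$,

$$P(\mathbf{p}\mid 2) = \prod_{i=1}^{7} (1-f)^{\,p_i c^{(2)}_i + (1-p_i)(1-c^{(2)}_i)}\; f^{\,p_i (1-c^{(2)}_i) + (1-p_i)\, c^{(2)}_i}.$$

Using vector notation,

$$P(\mathbf{p}\mid 2) = (1-f)^{m_2}\, f^{\,7-m_2},$$

where

$$m_2 = \mathbf{p}\cdot\mathbf{c}^{(2)} + (\mathbf{1}-\mathbf{p})\cdot(\mathbf{1}-\mathbf{c}^{(2)})$$

is the number of elements on which $\mathbf{p}$ and $\mathbf{c}^{(2)}$ agree.

Then using the hint above, we can rewrite,

$$P(\mathbf{p}\mid 2) = \exp\left[\, m_2 \log(1-f) + (7-m_2)\log f \,\right].$$

Introducing the vector

$$\mathbf{w}^{(2)} = \log\frac{1-f}{f}\,\left(2\,\mathbf{c}^{(2)} - \mathbf{1}\right),$$

such that

$$w^{(2)}_i = \begin{cases} +\log\frac{1-f}{f} & \text{if } c^{(2)}_i = 1,\\ -\log\frac{1-f}{f} & \text{if } c^{(2)}_i = 0,\end{cases}$$

we can write with all generality, for any digit $n$ with intended pattern $\mathbf{c}^{(n)}$,

$$P(\mathbf{p}\mid n) = \exp\left[\, \mathbf{w}^{(n)}\cdot\mathbf{p} + b_n \,\right], \qquad b_n = (7-k_n)\log(1-f) + k_n \log f,$$

where $k_n = \sum_i c^{(n)}_i$ is the number of elements that are on in the intended pattern of digit $n$.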
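The likelihood can be checked numerically. The sketch below, in Python, computes $P(\mathbf{p}\mid 2)$ in two ways: as a direct product over elements, and in exponential form via the hint $x^y = e^{y\log x}$. The element ordering, the names `likelihood` and `likelihood_exp_form`, and the independence of element failures are assumptions made for illustration.

```python
import math

# Intended pattern for a "2", ASSUMING the common a..g seven-segment ordering.
PATTERN_2 = (1, 1, 0, 1, 1, 0, 1)


def likelihood(p, c, f):
    """P(p | intended pattern c): a factor (1-f) per match, f per mismatch."""
    prob = 1.0
    for p_i, c_i in zip(p, c):
        prob *= (1.0 - f) if p_i == c_i else f
    return prob


def likelihood_exp_form(p, c, f):
    """Same quantity written as exp(w . p + b), using x**y = exp(y log x)."""
    log_odds = math.log((1.0 - f) / f)
    w = [log_odds * (2 * c_i - 1) for c_i in c]  # +log_odds if on, -log_odds if off
    k = sum(c)  # number of elements that are on in the intended pattern
    b = (len(c) - k) * math.log(1.0 - f) + k * math.log(f)
    return math.exp(sum(w_i * p_i for w_i, p_i in zip(w, p)) + b)


if __name__ == "__main__":
    observed = (1, 1, 0, 1, 0, 0, 1)  # a "2" with one element wrongly off
    print(likelihood(observed, PATTERN_2, f=0.1))
    print(likelihood_exp_form(observed, PATTERN_2, f=0.1))
```

Both functions should agree up to floating-point error, and the likelihood summed over all $2^7$ possible patterns should equal 1.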

The quantity we have been asked to calculate is not $P(\mathbf{p}\mid 2)$, but instead, given that we have seen a pattern $\mathbf{p}$, the probability that the pattern was generated with a “2” in mind. That is the posterior probability $P(2\mid \mathbf{p})$, which using Bayes' theorem is given as a function of $P(\mathbf{p}\mid 2)$ as

$$P(2\mid \mathbf{p}) = \frac{P(\mathbf{p}\mid 2)\,P(2)}{P(\mathbf{p})},$$

where $P(2)$ is a prior probability.

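In the general case in which the LED can produce any of the 10 digits (from 0 to 9), we have by marginalization

$$P(\mathbf{p}) = \sum_{n=0}^{9} P(\mathbf{p}\mid n)\,P(n),$$

resulting in the general solution,

$$P(2\mid \mathbf{p}) = \frac{P(\mathbf{p}\mid 2)\,P(2)}{\sum_{n=0}^{9} P(\mathbf{p}\mid n)\,P(n)}.$$

Notice that the normalization condition is $\sum_{n=0}^9 P(n\mid \mathbf{p}) = 1$.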
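The ten-digit case can be sketched in a few lines of Python. The glyph table below uses the common seven-segment shapes in a to g order, which is an assumption (some displays draw 6, 7 and 9 differently), and the names `DIGITS`, `likelihood` and `posterior` are illustrative. Uniform priors are used by default only for convenience.

```python
# Intended patterns for digits 0-9, ASSUMING the common seven-segment glyphs
# in the order (a=top, b=upper right, c=lower right, d=bottom,
# e=lower left, f=upper left, g=middle).
DIGITS = {
    0: (1, 1, 1, 1, 1, 1, 0),
    1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1),
    3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1),
    5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1),
    7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def likelihood(p, c, f):
    """P(p | intended pattern c), assuming independent element failures."""
    prob = 1.0
    for p_i, c_i in zip(p, c):
        prob *= (1.0 - f) if p_i == c_i else f
    return prob


def posterior(p, f, priors=None):
    """Return {n: P(n | p)} using Bayes' theorem and marginalization over n."""
    if priors is None:
        priors = {n: 1.0 / len(DIGITS) for n in DIGITS}  # uniform by default
    joint = {n: likelihood(p, DIGITS[n], f) * priors[n] for n in DIGITS}
    evidence = sum(joint.values())  # P(p), the normalizing constant
    return {n: joint[n] / evidence for n in DIGITS}


if __name__ == "__main__":
    post = posterior(DIGITS[2], f=0.1)
    for n, prob in post.items():
        print(n, round(prob, 4))
```

Dividing by the evidence is what enforces the normalization condition, so the returned values sum to 1 for any observed pattern.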

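For our particular problem, where we want to distinguish only between the pattern being generated by a “2” or a “3”, that results in

$$P(2\mid \mathbf{p}) = \frac{P(\mathbf{p}\mid 2)\,P(2)}{P(\mathbf{p}\mid 2)\,P(2) + P(\mathbf{p}\mid 3)\,P(3)},$$

where here the normalization condition is

$$P(2\mid \mathbf{p}) + P(3\mid \mathbf{p}) = 1.$$

The posterior probability $P(2\mid \mathbf{p})$ can be rewritten as

$$P(2\mid \mathbf{p}) = \frac{1}{1 + \dfrac{P(\mathbf{p}\mid 3)\,P(3)}{P(\mathbf{p}\mid 2)\,P(2)}}.$$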

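We can define the weights

$$\mathbf{w} = 2\log\frac{1-f}{f}\,\left(\mathbf{c}^{(2)} - \mathbf{c}^{(3)}\right),$$

where $\mathbf{c}^{(n)}$ is the intended on/off pattern of digit $n$. Each element agrees with the intended pattern with probability $1-f$ and disagrees with probability $f$, so the likelihood ratio is $P(\mathbf{p}\mid 2)/P(\mathbf{p}\mid 3) = e^{\mathbf{w}\cdot\mathbf{p}}$; the factors that do not depend on $\mathbf{p}$ cancel because a “2” and a “3” both light five elements.

We can also parameterize the priors as

$$P(2) = e^{\theta_2}, \qquad P(3) = e^{\theta_3}.$$

For instance, $\theta_2 = \theta_3 = -\log(2)$ for $P(2) = P(3) = 1/2$.

Then, we have the expression we wanted to obtain for $P(2\mid \mathbf{p})$ as a logistic function of a linear combination of the $p_i$,

$$P(2\mid \mathbf{p}) = \frac{1}{1 + e^{-\mathbf{w}\cdot\mathbf{p} + \theta}}, \qquad \theta = \theta_3 - \theta_2 = \log\frac{P(3)}{P(2)},$$

with weights given element by element as

$$w_i = \begin{cases} +2\log\frac{1-f}{f} & \text{if element } i \text{ is on for a “2” and off for a “3”},\\ -2\log\frac{1-f}{f} & \text{if element } i \text{ is on for a “3” and off for a “2”},\\ 0 & \text{if element } i \text{ is the same for both digits.}\end{cases}$$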
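As a numerical check of the result, the sketch below compares the posterior obtained directly from Bayes' theorem with the logistic form $1/(1+e^{-\mathbf{w}\cdot\mathbf{p}+\theta})$. It is written in Python and assumes the common a to g seven-segment ordering, independent element failures, and the sign convention $\theta = \log\left(P(3)/P(2)\right)$. All function and constant names are illustrative.

```python
import math

# Intended patterns, ASSUMING the common a..g seven-segment ordering.
PATTERN_2 = (1, 1, 0, 1, 1, 0, 1)
PATTERN_3 = (1, 1, 1, 1, 0, 0, 1)


def likelihood(p, c, f):
    """P(p | intended pattern c), assuming independent element failures."""
    prob = 1.0
    for p_i, c_i in zip(p, c):
        prob *= (1.0 - f) if p_i == c_i else f
    return prob


def posterior_bayes(p, f, prior_2):
    """P(2 | p) from Bayes' theorem, with P(3) = 1 - P(2)."""
    joint_2 = likelihood(p, PATTERN_2, f) * prior_2
    joint_3 = likelihood(p, PATTERN_3, f) * (1.0 - prior_2)
    return joint_2 / (joint_2 + joint_3)


def posterior_logistic(p, f, prior_2):
    """P(2 | p) as a logistic function of w . p - theta."""
    log_odds = math.log((1.0 - f) / f)
    # Nonzero only where the two patterns differ. No extra constant is needed
    # because both patterns light the same number of elements (five).
    w = [2.0 * log_odds * (c2 - c3) for c2, c3 in zip(PATTERN_2, PATTERN_3)]
    theta = math.log((1.0 - prior_2) / prior_2)  # log(P(3) / P(2))
    a = sum(w_i * p_i for w_i, p_i in zip(w, p)) - theta
    return 1.0 / (1.0 + math.exp(-a))


if __name__ == "__main__":
    observed = PATTERN_2
    print(posterior_bayes(observed, f=0.1, prior_2=0.5))
    print(posterior_logistic(observed, f=0.1, prior_2=0.5))
```

Only the two elements that distinguish a “2” from a “3” carry nonzero weight, so the remaining five elements have no effect on the posterior however noisy they are.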