# week 03:

## Probabilities can be very small numbers

Calculating the probability of some data given a model, usually requires multiplying many different probably terms

$P(D\mid M) = \prod_{i=1}^N P(d_i\mid M).$

This quantity easily underflows, which means that you are trying to calculate something smaller than the precision of your computer.

For instance, if you had a data set $$D$$, in which each individual datum had probability of the order of $$0.001$$ (which is not such small quantity), then the probability of all the data is

$P(D) = \left(10^{-3}\right)^{N} = 10^{-3*N}.$

For double-precision number the smaller number that you can distinguish is

$\mbox{smallest number} \sim 10^{-323}$

which means that after $$N\sim 100$$ data points,

$P(N=100) = 10^{-300}$

has become so small that you would not be able to distinguish having a data set with $$N=100$$ measurements from having another with $$N=200$$ points.

### Working in log space

That is why, we always work with the log of the probability

$\log P(D\mid M) =\log\left( \prod_{i=1}^N P(d_i\mid M) \right) = \sum_{i=1}^N \log P(d_i\mid M).$

Then, in our example

$\log P(N=100) = -300 * \log(10) = -690.776$

but

$\log P(N=200) = -600 * \log(10) = -1382.551$

thus, the two cases are easily distinguishable in log space.

$\log P(N=200) - \log P(N=100) = -691.775$

or

$P(N=200) = P(N=100) \, e^{-691.775}$

### The log-sum-exp trick

Sometimes you describe your system with a mixture of probability distributions. For instance like mixture of Gaussian distributions, or like in the case of our homework this week a mixture of exponential distributions.

Now we work in log space so you have, so you end up having to do the following calculation

$\log(e^a + e^b) .$

If $$a$$ and $$b$$ are large and negative (as it is the case when they represent log probabilities), then that calculation cannot be directly.

For instance, a naive calculation of

$log(e^{-1000} + e^{-999})$

will give you -infinity.

And if we tried with positive number, for instance

$log(e^{1000} + e^{999})$

will give you +infinity.

The way to do this calculation robustly is as follows, assume $$a > b$$, then

$\log(e^a + e^b) = \log \left(e^a ( 1 + e^{b-a})\right) = a + \log\left( 1 + e^{b-a}\right)$

since $$b-a <0$$, the exponential $$e^{b-a}<=1$$ never becomes a large number, and the calculation is robust.

Then you will calculate

$log(e^{-1000} + e^{-999}) = -999 + \log(1+e^{-1}) = -999 + 0.31 = - 998.69,$

and

$log(e^{1000} + e^{999}) = 1000 + \log(1+e^{-1}) = 1000 + 0.31 = 1000.31,$

The log-sum-exp trick can be generalized for an arbitrary number of terms $$n$$

$\log(e^{a_1} + \ldots + e^{a_n}) = a_{max} + \log\left(e^{\sum_{i=1}^n (a_i - a_{max})}\right),$

where $$a_{max}$$ is the maximum value of the set $$\{a_1,\ldots, a_n\}$$.

Most languages have an implementation of the log-sum-exp function, such as Matlab, R, and python.