week 03:

Probabilities can be very small numbers

Calculating the probability of some data given a model, usually requires multiplying many different probably terms

\[P(D\mid M) = \prod_{i=1}^N P(d_i\mid M).\]

This quantity easily underflows, which means that you are trying to calculate something smaller than the precision of your computer.

For instance, if you had a data set \(D\), in which each individual datum had probability of the order of \(0.001\) (which is not such small quantity), then the probability of all the data is

\[P(D) = \left(10^{-3}\right)^{N} = 10^{-3*N}.\]

For double-precision number the smaller number that you can distinguish is

\[\mbox{smallest number} \sim 10^{-323}\]

which means that after \(N\sim 100\) data points,

\[P(N=100) = 10^{-300}\]

has become so small that you would not be able to distinguish having a data set with \(N=100\) measurements from having another with \(N=200\) points.

Working in log space

That is why, we always work with the log of the probability

\[\log P(D\mid M) =\log\left( \prod_{i=1}^N P(d_i\mid M) \right) = \sum_{i=1}^N \log P(d_i\mid M).\]

Then, in our example

\[\log P(N=100) = -300 * \log(10) = -690.776\]

but

\[\log P(N=200) = -600 * \log(10) = -1382.551\]

thus, the two cases are easily distinguishable in log space.

\[\log P(N=200) - \log P(N=100) = -691.775\]

\[P(N=200) = P(N=100) \, e^{-691.775}\]

The log-sum-exp trick

Sometimes you describe your system with a mixture of probability distributions. For instance like mixture of Gaussian distributions, or like in the case of our homework this week a mixture of exponential distributions.

Now we work in log space so you have, so you end up having to do the following calculation

\[\log(e^a + e^b) .\]

If \(a\) and \(b\) are large and negative (as it is the case when they represent log probabilities), then that calculation cannot be directly.

For instance, a naive calculation of

\[log(e^{-1000} + e^{-999})\]

will give you -infinity.

And if we tried with positive number, for instance

\[log(e^{1000} + e^{999})\]

will give you +infinity.

The way to do this calculation robustly is as follows, assume \(a > b\), then

\[\log(e^a + e^b) = \log \left(e^a ( 1 + e^{b-a})\right) = a + \log\left( 1 + e^{b-a}\right)\]

since \(b-a <0\), the exponential \(e^{b-a}<=1\) never becomes a large number, and the calculation is robust.

Then you will calculate

\[log(e^{-1000} + e^{-999}) = -999 + \log(1+e^{-1}) = -999 + 0.31 = - 998.69,\]

and

\[log(e^{1000} + e^{999}) = 1000 + \log(1+e^{-1}) = 1000 + 0.31 = 1000.31,\]

The log-sum-exp trick can be generalized for an arbitrary number of terms \(n\)

\[\log(e^{a_1} + \ldots + e^{a_n}) = a_{max} + \log\left(e^{\sum_{i=1}^n (a_i - a_{max})}\right),\]

where \(a_{max}\) is the maximum value of the set \(\{a_1,\ldots, a_n\}\).

Most languages have an implementation of the log-sum-exp function, such as Matlab, R, and python.