In [1]:

```
import numpy as np
import matplotlib.pyplot as plt
%pylab inline
```

Populating the interactive namespace from numpy and matplotlib

Calculating the probability of some data given a model usually requires multiplying many different probability terms

$$ P(D\mid M) = \prod_{i=1}^N P(d_i\mid M). $$

This quantity easily underflows, which means that you are trying to calculate something smaller than the precision of your computer.

If you had a data set $D$ in which each individual datum had a probability of the order of $0.001$ (which is not such a small quantity), then the probability of all the data is

$$ P(D) = \left(10^{-3}\right)^{N} = 10^{-3 \times N} $$

In [2]:

```
## try it yourself!
10**(-323), 10**(-324)
```

Out[2]:

(1e-323, 0.0)

which means that at $N = 100$ data points,

$$ P(D; N=100) = 10^{-300} $$

has become so small that you would not be able to distinguish a data set with $N=100$ measurements from another with $N=200$ points. In other words, when we go and try to calculate said probability we end up with 0 instead of what we know it actually is: a really, really small number.

**How do we get around this?**

Working with the log of the probability is very useful to avoid underflow problems, because it expands the dynamic range, effectively giving you access to those small numbers that you couldn't calculate before. Also, because the logarithm is monotonically increasing, maximizing the probability and maximizing the logarithm of the probability are the same task.

But working in log-space can be helpful in many ways. Let's go back to our example:

$$
\log P(D\mid M) = \log \prod_{i=1}^N P(d_i\mid M) = \sum_{i=1}^N \log P(d_i\mid M).
$$

If you are having trouble seeing why this is the case, remember that it is a property of logarithms that $\log(a_1a_2) = \log(a_1) + \log(a_2)$. Convince yourself that this extends to $N$ elements, $\log(a_1a_2\cdots a_N) = \log(a_1) + \log(a_2) + \cdots + \log(a_N)$; once you are sold, it is trivial to see that $\log(a_1a_2\cdots a_N) = \sum_{i=1}^N \log(a_i)$.
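A quick numerical check of this identity, using the running example of data points that each have probability $10^{-3}$ (with $N=200$ so the direct product underflows):

```python
import numpy as np

probs = np.full(200, 1e-3)     # 200 data points, each with probability 1e-3
print(np.prod(probs))          # 0.0 -- the direct product underflows
print(np.sum(np.log(probs)))   # about -1381.55 -- the log-probability is fine
```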

If we go back to the previous example:

$$ \log P(D; N=100) = -300 \times \log(10) = -690.776 $$

but

$$ \log P(D; N=200) = -600 \times \log(10) = -1381.551 $$

thus, the two cases are easily distinguishable in log space:

$$ \log P(D; N=200) - \log P(D; N=100) = -690.776 $$

or

$$ P(D; N=200) = e^{-690.776}\, P(D; N=100) $$

In [3]:

```
np.log(np.exp(1000)+np.exp(999))
```

<ipython-input-1-591ffdc15029>:1: RuntimeWarning: overflow encountered in exp
  np.log(np.exp(1000)+np.exp(999))

Out[3]:

inf

Sometimes you describe your system with a mixture of probability distributions: for instance a mixture of Gaussian distributions, or, as in the case of our homework this week, a mixture of exponential distributions.

Since we now work in log space, you end up having to do the following calculation: $\log(e^a + e^b)$.

If $a$ and $b$ are large and negative (as is the case when they represent log probabilities), then that calculation cannot be done directly. For instance, a naive calculation of $\log(e^{-1000} + e^{-999})$ will give you $-\infty$. And if we tried with positive numbers, for instance $\log(e^{1000} + e^{999})$, we would get $+\infty$.

The way to do this calculation robustly is as follows. Assume $a > b$; then

$$ \begin{aligned} \log(e^a + e^b) &= \log\left(\frac{e^a}{e^a}(e^a + e^b)\right) \\ \log(e^a + e^b) &= \log \left(e^a ( 1 + e^{b-a})\right) \\ \log(e^a + e^b) &= \log(e^a) + \log\left( 1 + e^{b-a}\right)\\ \log(e^a + e^b) &= a + \log\left( 1 + e^{b-a}\right) \end{aligned} $$

Since $b-a < 0$, the exponential $e^{b-a}\leq 1$ never becomes a large number, and the calculation is robust.

Then you will calculate

$$ \log(e^{-1000} + e^{-999}) = -999 + \log(1+e^{-1}) = -999 + 0.31 = -998.69, $$

and

$$ \log(e^{1000} + e^{999}) = 1000 + \log(1+e^{-1}) = 1000 + 0.31 = 1000.31. $$

The log-sum-exp trick can be generalized to an arbitrary number of terms $n$:

$$ \log(e^{a_1} + \ldots + e^{a_n}) = a_{max} + \log\left(\sum_{i=1}^n e^{(a_i - a_{max})}\right), $$

where $a_{max}$ is the maximum value of the set $\{a_1,\ldots, a_n\}$. (Note that the $a_{max}$ term itself contributes $e^0 = 1$ to the sum.)
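A minimal NumPy sketch of the generalized trick (`my_logsumexp` is a name chosen here for illustration; `scipy.special.logsumexp`, used in the next cell, does the same job):

```python
import numpy as np

def my_logsumexp(a):
    # Shift every term by the maximum before exponentiating,
    # so the largest exponent is exactly 0 and nothing over/underflows.
    a = np.asarray(a, dtype=float)
    a_max = np.max(a)
    return a_max + np.log(np.sum(np.exp(a - a_max)))

a = np.array([-1000.0, -999.0, -998.0])
print(my_logsumexp(a))   # a finite number near -997.6, instead of -inf
```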

In [4]:

```
from scipy.special import logsumexp
#a, b, c
#e^a, e^b, e^c
#e^a + e^b + e^c
#log(e^a + e^b + e^c)
?logsumexp
```

In [5]:

```
import numpy as np
np.logaddexp?
```

We want to find the area under the curves!

In [6]:

```
import numpy as np
import matplotlib.pyplot as plt
```

In [7]:

```
xs = np.linspace(0,1,int(1e7)) # define bins
dx = xs[1]-xs[0] # width of a bin
```

In [8]:

```
xs #see what our bin list looks like
```

Out[8]:

array([0.0000000e+00, 1.0000001e-07, 2.0000002e-07, ..., 9.9999980e-01, 9.9999990e-01, 1.0000000e+00])

In [9]:

```
def g(x): # a function to integrate
    return x
```

In [10]:

```
plt.step(xs,g(xs))
```

Out[10]:

[<matplotlib.lines.Line2D at 0x7fb9884cb820>]

In [11]:

```
%%time
bin_vals = []
for x in xs: # for every value in our list of bins, get the area of that bin by multiplying the height at that point by the width of the bin
    bin_vals.append(x*dx)
print(np.sum(bin_vals)) # sum up all the areas.
```

0.5000000500000026
CPU times: user 2.26 s, sys: 150 ms, total: 2.41 s
Wall time: 2.42 s

In [12]:

```
%%time
np.sum(g(xs)*dx) # numpy broadcasting: apply the operation to every entry of the array, then sum the entries
```

CPU times: user 15 ms, sys: 8.64 ms, total: 23.6 ms Wall time: 23.3 ms

Out[12]:

0.5000000500000026
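As a sanity check (not one of the original cells), this Riemann-style sum should converge to the exact value $\int_0^1 x\,dx = 1/2$ as the number of bins grows:

```python
import numpy as np

for n in [10, 1000, 100000]:
    xs = np.linspace(0, 1, n)
    dx = xs[1] - xs[0]
    approx = np.sum(xs * dx)      # same Riemann-style sum as the cell above
    print(n, abs(approx - 0.5))   # the error shrinks roughly like 1/n
```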

In [13]:

```
def g(x): # another function to integrate
    return x**2
```

In [14]:

```
plt.step(xs,g(xs))
```

Out[14]:

[<matplotlib.lines.Line2D at 0x7fb94cdb91c0>]

In [15]:

```
%%time
bin_vals = []
for x in xs: # for every value in our list of bins, get the area of that bin by multiplying the height at that point by the width of the bin
    bin_vals.append(x**2 * dx)
np.sum(bin_vals) # sum up all the areas.
```

CPU times: user 4.76 s, sys: 69.7 ms, total: 4.83 s Wall time: 4.84 s

Out[15]:

0.3333333833333404

In [16]:

```
%%time
np.sum(xs**2*dx) # numpy broadcasting: apply the operation to every entry of the array, then sum the entries
```

CPU times: user 21.6 ms, sys: 3.69 ms, total: 25.3 ms Wall time: 27.9 ms

Out[16]:

0.3333333833333404

We don't want the area under the curve, we want the volume under the surface!

In [17]:

```
xs = np.linspace(0,1,10) # define bins
ys = np.linspace(0,1,10) # define bins
dx = xs[1]- xs[0] # width of a bin
dy = ys[1]- ys[0] # width of a bin
```

In [18]:

```
%%time
int_value = 0
bin_vals_x = []
for x in xs: # for every value in our list of bins on one axis
    bin_vals_y = []
    for y in ys: # for every value in our list of bins on the other axis
        bin_vals_y.append((x**2-y**2)*dx*dy) # store each bin's contribution
        int_value += (x**2-y**2)*dx*dy # accumulate the integral at every step
    bin_vals_x.append(bin_vals_y)
bin_vals_x = np.array(bin_vals_x) # create surface array
print(int_value) # value of the integral
```

-1.5178830414797062e-17
CPU times: user 367 µs, sys: 63 µs, total: 430 µs
Wall time: 455 µs


In [19]:

```
plt.imshow(bin_vals_x.T, origin='lower', extent=[0,1,0,1])
plt.show()
```

In [20]:

```
xs[:,np.newaxis].shape
```

Out[20]:

(10, 1)

In [21]:

```
%%time
surf = xs[:,np.newaxis]**2 - ys**2 # numpy broadcasting: a (10,1) column against a (10,) row builds the full (10,10) surface
```

CPU times: user 27 µs, sys: 0 ns, total: 27 µs Wall time: 28.8 µs

Try it yourself:

- What does the slicing `[:, np.newaxis]` do to the numpy array being sliced?
- How does the operation propagate?

In [22]:

```
plt.imshow(surf.T, origin='lower', extent=[0,1,0,1])
plt.show()
```
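To actually get the volume from the broadcast surface, sum every bin's contribution, just as the double loop did. A minimal sketch, repeating the setup from the cells above:

```python
import numpy as np

xs = np.linspace(0, 1, 10)
ys = np.linspace(0, 1, 10)
dx = xs[1] - xs[0]
dy = ys[1] - ys[0]

surf = xs[:, np.newaxis]**2 - ys**2   # (10,1) against (10,) broadcasts to (10,10)
volume = np.sum(surf * dx * dy)       # sum every bin's (height * bin area) contribution
print(volume)                          # ~0: x**2 - y**2 cancels over a symmetric grid
```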

**What about `log` computations?**

Try starting here:

In [23]:

```
def Z(lam, tmin, tmax):
    return lam * (np.exp(-tmin/lam) - np.exp(-tmax/lam))

## TODO FOR YOU, FOR HOMEWORK
## YOU'LL REALLY WANT TO USE LOGS TO COMPUTE THINGS
def logZ(lam, tmin, tmax):
    return -1

# check if similar:
np.isclose(Z(3, 0.05, 80), np.exp(logZ(3, 0.05, 80)))
```

Out[23]:

False