# Inference of a joint posterior distribution and sampling

### This is derived from ITILA 28.2

Datapoints $(x,t)$ are believed to come from a straight line with additive Gaussian noise $\varepsilon \sim N(0, \sigma^2)$:

$$t = w_0 + w_1 x + \varepsilon.$$
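To make the model concrete, here is a minimal sketch that simulates data from it. The function name `simulate_line`, the true parameter values, the noise level, the range of $x$, and the seed are all assumptions chosen for illustration; none of them come from the original notebook.

```python
import random

def simulate_line(n, w0, w1, sigma, seed=0):
    """Draw n points (x, t) with t = w0 + w1 * x + Gaussian noise of std sigma."""
    rng = random.Random(seed)
    # The x range [0, 10] is an arbitrary choice for this sketch.
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ts = [w0 + w1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ts

# Hypothetical "true" parameters, used only to generate example data.
xs, ts = simulate_line(n=20, w0=1.0, w1=2.0, sigma=0.5)
for x, t in list(zip(xs, ts))[:3]:
    print(f"x = {x:.3f}, t = {t:.3f}")
```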

Now we want to infer the parameters $w_0, w_1$ from data $D$ and draw samples from their posterior.

We assume a uniform prior for $w_0$ and $w_1$ from -10 to 10.

• Posterior:

$$\begin{aligned}
P(w_0,w_1|D) &= \frac{P(D|w_0,w_1)P(w_0,w_1)}{P(D)} \\
P(w_0,w_1|D) &\propto P(D|w_0,w_1)P(w_0,w_1) \\
P(w_0,w_1|D) &\propto \begin{cases} \prod_i \sqrt{\frac{1}{2 \pi \sigma^2}}\, e^{\frac{-(t_i-w_0-w_1 x_i)^2}{2 \sigma^2}} & \text{if } w_0, w_1 \in [-2, 5]\\ 0 & \text{otherwise} \end{cases} \\
P(D) &= \int_{-2}^{5} \int_{-2}^{5} P(w_0,w_1) \prod_i \sqrt{\frac{1}{2 \pi \sigma^2}}\, e^{\frac{-(t_i-w_0-w_1 x_i)^2}{2 \sigma^2}} \, dw_0 \, dw_1
\end{aligned}$$

If we wanted to pursue Bayesian hypothesis comparison, assuming $P(H) = \text{constant}$ and absorbing the constant prior density into the proportionality:

$$\begin{aligned}
P(H_{linear}|D) &\propto \int_{-2}^{5} \int_{-2}^{5} \prod_i \sqrt{\frac{1}{2 \pi \sigma^2}}\, e^{\frac{-(t_i-w_0-w_1 x_i)^2}{2 \sigma^2}} \, dw_0 \, dw_1 \\
P(H_{linear}|D) &\propto \int_{-2}^{5} \int_{-2}^{5} e^{\sum_i \left[ -0.5 \ln (2\pi \sigma^2) - \frac{(t_i-w_0-w_1 x_i)^2}{2\sigma^2} \right]} \, dw_0 \, dw_1 \\
\log P(H_{linear}|D) &= \log \left( \int_{-2}^{5} \int_{-2}^{5} e^{\sum_i \left[ -0.5 \ln (2\pi \sigma^2) - \frac{(t_i-w_0-w_1 x_i)^2}{2\sigma^2} \right]} \, dw_0 \, dw_1 \right) + \text{const}
\end{aligned}$$

etc.
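The expressions above translate fairly directly into code. The sketch below evaluates the unnormalized log posterior and approximates the log of the double integral with a midpoint rule on a grid, using the log-sum-exp trick to avoid underflow. The function names, the toy data, the noise level, the grid size, and the box bounds passed as `lo` and `hi` are assumptions for illustration only.

```python
import math

def log_posterior(w0, w1, xs, ts, sigma, lo, hi):
    """Unnormalized log posterior: Gaussian log likelihood inside the prior box, -inf outside."""
    if not (lo <= w0 <= hi and lo <= w1 <= hi):
        return -math.inf
    const = -0.5 * math.log(2.0 * math.pi * sigma**2)
    return sum(const - (t - w0 - w1 * x) ** 2 / (2.0 * sigma**2)
               for x, t in zip(xs, ts))

def log_evidence_grid(xs, ts, sigma, lo, hi, n=200):
    """Midpoint-rule estimate of the log of the likelihood integrated over the box."""
    h = (hi - lo) / n
    centers = [lo + (i + 0.5) * h for i in range(n)]
    logs = [log_posterior(a, b, xs, ts, sigma, lo, hi)
            for a in centers for b in centers]
    m = max(logs)  # log-sum-exp: factor out the largest term before exponentiating
    return m + math.log(sum(math.exp(v - m) for v in logs)) + 2.0 * math.log(h)

# Toy data (assumed, not from the original notebook).
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.1, 2.9, 5.2, 6.8]
lp = log_posterior(1.0, 2.0, xs, ts, sigma=0.5, lo=-10.0, hi=10.0)
log_z = log_evidence_grid(xs, ts, sigma=0.5, lo=-10.0, hi=10.0)
print(lp, log_z)
```

The grid estimate is only reasonable when the cell width is small compared with the width of the posterior peak; for sharper posteriors a finer grid would be needed.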

### Find maximum a posteriori estimate of $(w_0, w_1)$

Equivalent to $\arg\max_{(w_0,w_1)} P(w_0,w_1|D)$
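With a uniform prior, the posterior is proportional to the likelihood inside the prior box, so the MAP estimate coincides with the ordinary least-squares fit whenever that fit lies inside the box. The sketch below uses this shortcut; the function name `map_estimate`, the toy data, and the bounds are assumptions. If the least-squares solution falls outside the box, the MAP lies on the boundary and this shortcut does not apply.

```python
def map_estimate(xs, ts, lo, hi):
    """Least-squares line fit; equals the MAP under a uniform prior if inside [lo, hi]^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    w1 = sxt / sxx
    w0 = mean_t - w1 * mean_x
    inside = lo <= w0 <= hi and lo <= w1 <= hi
    return w0, w1, inside

# Toy data (assumed, not from the original notebook).
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.1, 2.9, 5.2, 6.8]
w0_map, w1_map, inside = map_estimate(xs, ts, lo=-10.0, hi=10.0)
print(w0_map, w1_map, inside)
```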

# Two sampling methods:

• Mean field marginalization
• MCMC - Metropolis Hastings

### Marginalization

Let's marginalize the joint posterior to obtain a separate distribution for each parameter.
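Written out, marginalizing means integrating the joint posterior over the other parameter. The integrals run over the support of the prior; the tilde notation $\tilde{P}$ for the marginals is chosen here to match the approximation that follows.

$$\begin{aligned}
\tilde{P}(w_0) &= \int P(w_0, w_1|D) \, dw_1 \\
\tilde{P}(w_1) &= \int P(w_0, w_1|D) \, dw_0
\end{aligned}$$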

## Mean field approximation

$$P(w_0, w_1|D) \approx \tilde{P}(w_0)\times \tilde{P}(w_1)$$

This is called a mean field approximation. It introduces some bias because it ignores any correlation between $w_0$ and $w_1$ in the joint posterior, but it is a common assumption for simplifying complex models.
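One way to sample under this approximation is to compute each marginal on a grid and then draw $w_0$ and $w_1$ independently. The sketch below does that with the product of grid marginals, which is how the term is used here; it is not an optimized variational fit. The function names, toy data, noise level, grid size, and box bounds are assumptions for illustration.

```python
import math
import random

# Toy data, noise level and prior box: illustrative assumptions.
XS = [0.0, 1.0, 2.0, 3.0]
TS = [1.1, 2.9, 5.2, 6.8]
SIGMA = 0.5
LO, HI = -10.0, 10.0

def log_post(w0, w1):
    """Unnormalized log posterior inside the box (additive constants dropped)."""
    return -sum((t - w0 - w1 * x) ** 2 for x, t in zip(XS, TS)) / (2.0 * SIGMA**2)

def grid_marginals(log_p, lo, hi, n=200):
    """Marginalize a 2D unnormalized log density on an n-by-n grid of cell centers."""
    h = (hi - lo) / n
    centers = [lo + (i + 0.5) * h for i in range(n)]
    logp = [[log_p(a, b) for b in centers] for a in centers]
    m = max(max(row) for row in logp)  # subtract the max to avoid underflow everywhere
    p = [[math.exp(v - m) for v in row] for row in logp]
    total = sum(sum(row) for row in p)
    p_w0 = [sum(row) / total for row in p]                              # sum over w1
    p_w1 = [sum(p[i][j] for i in range(n)) / total for j in range(n)]   # sum over w0
    return centers, p_w0, p_w1

def sample_mean_field(centers, p_w0, p_w1, k, seed=0):
    """Draw k pairs with w0 and w1 sampled independently from their grid marginals."""
    rng = random.Random(seed)
    w0s = rng.choices(centers, weights=p_w0, k=k)
    w1s = rng.choices(centers, weights=p_w1, k=k)
    return list(zip(w0s, w1s))

centers, p_w0, p_w1 = grid_marginals(log_post, LO, HI)
draws = sample_mean_field(centers, p_w0, p_w1, k=1000)
mean_w0 = sum(c * p for c, p in zip(centers, p_w0))
mean_w1 = sum(c * p for c, p in zip(centers, p_w1))
print(mean_w0, mean_w1, draws[:3])
```

Because the two coordinates are drawn independently, a scatter plot of `draws` will show no tilt even when the true joint posterior is strongly correlated.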

# MCMC

From Wikipedia: "the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult" https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm

In our case, the full posterior distribution is a 2D pdf, which is hard to sample from directly. The algorithm goes as follows:

• Start with some initial parameters $\Theta$ and calculate the (unnormalized) posterior probability at $\Theta$

• In each iteration:

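• modify the previous parameters by a small random value, drawn from a symmetric proposal distribution
• calculate the new probability with the new parameters
• calculate the ratio $\alpha = \frac {P_{new}} {P_{prev}}$
• accept the new parameters with probability $\min(1, \alpha)$; otherwise keep the previous parameters and record them again as the sample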
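The steps above can be sketched as a random-walk Metropolis sampler. This is a minimal illustration, not the notebook's implementation: the Gaussian proposal, the step size, the starting point, the toy data, the noise level, and the box bounds are all assumptions. The ratio is computed in log space to avoid underflow, and the starting point must lie inside the prior box.

```python
import math
import random

# Toy data, noise level and prior box: illustrative assumptions.
XS = [0.0, 1.0, 2.0, 3.0]
TS = [1.1, 2.9, 5.2, 6.8]
SIGMA = 0.5
LO, HI = -10.0, 10.0

def log_post(w0, w1):
    """Unnormalized log posterior: -inf outside the prior box."""
    if not (LO <= w0 <= HI and LO <= w1 <= HI):
        return -math.inf
    return -sum((t - w0 - w1 * x) ** 2 for x, t in zip(XS, TS)) / (2.0 * SIGMA**2)

def metropolis(log_p, theta0, step, n_iter, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal of std `step`."""
    rng = random.Random(seed)
    theta = list(theta0)
    lp = log_p(*theta)
    samples, accepted = [], 0
    for _ in range(n_iter):
        # Modify the previous parameters by a small random value.
        proposal = [w + rng.gauss(0.0, step) for w in theta]
        lp_new = log_p(*proposal)
        # log(alpha) = log(P_new) - log(P_prev); accept with probability min(1, alpha).
        log_alpha = lp_new - lp
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            theta, lp = proposal, lp_new
            accepted += 1
        # On rejection the previous parameters are recorded again.
        samples.append(tuple(theta))
    return samples, accepted / n_iter

samples, acc_rate = metropolis(log_post, theta0=(0.0, 0.0), step=0.2, n_iter=10000)
second_half = samples[len(samples) // 2:]
mean_w0 = sum(s[0] for s in second_half) / len(second_half)
mean_w1 = sum(s[1] for s in second_half) / len(second_half)
print(acc_rate, mean_w0, mean_w1)
```

The step size trades off acceptance rate against how far each move travels, so it usually needs some tuning for a given posterior.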

# Issue 1: initial samples are pretty bad

