# Inference of a joint posterior distribution and sampling

### This is derived from ITILA 28.2

Datapoints $(x,t)$ are believed to come from a straight line with Gaussian noise $\varepsilon \sim N(0, \sigma^2)$

$$t = w_0 + w_1 x + \varepsilon,$$

We want to infer the parameters $w_0, w_1$ and sample from the resulting posterior.

Assuming a uniform prior for $w_0$ and $w_1$ on $[-2, 5]$

• Posterior:

$$\begin{aligned}
P(w_0,w_1 \mid D) &= \frac{P(D \mid w_0,w_1)\,P(w_0,w_1)}{P(D)}\\
P(w_0,w_1 \mid D) &\propto P(D \mid w_0,w_1)\,P(w_0,w_1)\\
P(w_0,w_1 \mid D) &\propto \begin{cases} \prod_i \sqrt{\frac{1}{2\pi\sigma^2}}\, e^{\frac{-(t_i-w_0-w_1 x_i)^2}{2\sigma^2}} & \text{if } w_0, w_1 \in [-2, 5]\\ 0 & \text{otherwise} \end{cases}\\
P(D) &= \int_{-2}^{5}\!\!\int_{-2}^{5} \prod_i \sqrt{\frac{1}{2\pi\sigma^2}}\, e^{\frac{-(t_i-w_0-w_1 x_i)^2}{2\sigma^2}}\, dw_0\, dw_1
\end{aligned}$$

If we wanted to pursue Bayesian hypothesis comparison, assuming $p(H) = \text{const}$:

$$\begin{aligned}
P(H_{\text{linear}} \mid D) &\propto \int_{-2}^{5}\!\!\int_{-2}^{5} \prod_i \sqrt{\frac{1}{2\pi\sigma^2}}\, e^{\frac{-(t_i-w_0-w_1 x_i)^2}{2\sigma^2}}\, dw_0\, dw_1\\
P(H_{\text{linear}} \mid D) &\propto \int_{-2}^{5}\!\!\int_{-2}^{5} e^{\sum_i \left( -\frac{1}{2}\ln 2\pi\sigma^2 + \frac{-(t_i-w_0-w_1 x_i)^2}{2\sigma^2} \right)}\, dw_0\, dw_1\\
\log P(H_{\text{linear}} \mid D) &\propto \log\left( \int_{-2}^{5}\!\!\int_{-2}^{5} e^{\sum_i \left( -\frac{1}{2}\ln 2\pi\sigma^2 + \frac{-(t_i-w_0-w_1 x_i)^2}{2\sigma^2} \right)}\, dw_0\, dw_1 \right)
\end{aligned}$$

etc.
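The unnormalized posterior can be evaluated numerically on a grid over the prior's support. A minimal NumPy sketch, assuming synthetic data since the actual dataset, $\sigma$, and true weights are not shown here (`true_w0 = 1`, `true_w1 = 2`, `sigma = 1` are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): t = w0 + w1*x + Gaussian noise
true_w0, true_w1, sigma = 1.0, 2.0, 1.0
x = np.linspace(0, 4, 30)
t = true_w0 + true_w1 * x + rng.normal(0, sigma, size=x.shape)

def log_posterior(w0, w1):
    """Unnormalized log posterior: Gaussian log-likelihood on the uniform prior's support."""
    if not (-2 <= w0 <= 5 and -2 <= w1 <= 5):
        return -np.inf  # zero prior probability outside [-2, 5]
    resid = t - w0 - w1 * x
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2))

# Evaluate on a grid over the prior's support
w0s = np.linspace(-2, 5, 200)
w1s = np.linspace(-2, 5, 200)
logp = np.array([[log_posterior(a, b) for b in w1s] for a in w0s])

# Exponentiate after subtracting the max for numerical stability,
# then normalize so the grid approximates a proper density
post = np.exp(logp - logp.max())
post /= post.sum() * (w0s[1] - w0s[0]) * (w1s[1] - w1s[0])
```

The grid approach only works because the parameter space is 2-D; it stands in for the normalizing integral $P(D)$ above.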

### Find maximum a posteriori estimate of $(w_0, w_1)$

Equivalent to $\arg\max_{(w_0,w_1)} P(w_0,w_1 \mid D)$. Since the prior is uniform over its support, this coincides with the maximum-likelihood (least-squares) fit.
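A grid search over the prior's support is enough to locate the MAP estimate here. A sketch assuming synthetic data (the actual dataset is not shown, so the true weights $(1, 2)$ and $\sigma = 1$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (assumption): t = 1 + 2x + noise
x = np.linspace(0, 4, 30)
t = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=x.shape)
sigma = 1.0

# Log posterior on a grid; the uniform prior only contributes a constant
w0s = np.linspace(-2, 5, 200)
w1s = np.linspace(-2, 5, 200)
W0, W1 = np.meshgrid(w0s, w1s, indexing="ij")
resid = t[None, None, :] - W0[..., None] - W1[..., None] * x[None, None, :]
logp = -(resid**2).sum(axis=-1) / (2 * sigma**2)

# argmax over the flattened grid, converted back to 2-D indices
i, j = np.unravel_index(np.argmax(logp), logp.shape)
w0_map, w1_map = w0s[i], w1s[j]
print(w0_map, w1_map)
```

With a uniform prior the MAP equals the ordinary least-squares fit, so `np.polyfit(x, t, 1)` should agree up to grid resolution.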

# Two sampling methods:

• Mean field marginalization
• MCMC - Metropolis Hastings

### Marginalization

Let's marginalize the joint posterior to obtain a distribution for each parameter.

## Mean field approximation

$$P(w_0, w_1) \approx \tilde{P}(w_0)\times \tilde{P}(w_1)$$

This is called a mean field approximation, and while it introduces some bias, it is a common assumption in many fields of statistics to simplify complex models.
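On a gridded posterior, each marginal is just a sum of the joint over the other parameter, and the mean field approximation is the outer product of the two marginals. A sketch, again assuming synthetic data since the real dataset is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (assumption): t = 1 + 2x + noise
x = np.linspace(0, 4, 30)
t = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=x.shape)
sigma = 1.0

# Joint posterior on a grid, normalized to sum to 1
w0s = np.linspace(-2, 5, 200)
w1s = np.linspace(-2, 5, 200)
W0, W1 = np.meshgrid(w0s, w1s, indexing="ij")
resid = t[None, None, :] - W0[..., None] - W1[..., None] * x
logp = -(resid**2).sum(axis=-1) / (2 * sigma**2)
post = np.exp(logp - logp.max())
post /= post.sum()

# Marginals: sum the joint over the other parameter
p_w0 = post.sum(axis=1)   # ~P(w0 | D)
p_w1 = post.sum(axis=0)   # ~P(w1 | D)

# Mean field approximation: product of the marginals
post_mf = np.outer(p_w0, p_w1)
```

Comparing `post` and `post_mf` (e.g. with a contour plot) shows the bias: the true joint posterior of intercept and slope is correlated, while the product of marginals cannot be.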

# MCMC

From Wikipedia: "the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult" https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm

In our case, the full posterior is a 2D pdf that is hard to sample from directly. The algorithm goes as follows:

• we start with some initial parameters $\Theta$ and calculate the (unnormalized) posterior probability at $\Theta$

• In each iteration:

• modify the previous parameters by a small random perturbation
• calculate the new probability with the new parameters
• calculate the ratio $\alpha = \frac {P_{new}} {P_{prev}}$
• accept the new sample with probability $\min(1, \alpha)$; otherwise keep the previous sample
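The steps above can be sketched as a short Metropolis–Hastings loop, working in log space to avoid underflow. The data are synthetic (the real dataset is not shown) and the proposal step size of 0.1 is an arbitrary tuning choice:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (assumption): t = 1 + 2x + noise
x = np.linspace(0, 4, 30)
t = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=x.shape)
sigma = 1.0

def log_post(theta):
    """Unnormalized log posterior; -inf outside the uniform prior's support."""
    w0, w1 = theta
    if not (-2 <= w0 <= 5 and -2 <= w1 <= 5):
        return -np.inf
    r = t - w0 - w1 * x
    return -np.sum(r**2) / (2 * sigma**2)

theta = np.array([0.0, 0.0])  # arbitrary starting point
lp = log_post(theta)
samples = []
for _ in range(5000):
    # Propose a small symmetric random perturbation
    proposal = theta + rng.normal(0, 0.1, size=2)
    lp_new = log_post(proposal)
    # Accept with probability min(1, P_new / P_prev), done in log space
    if np.log(rng.uniform()) < lp_new - lp:
        theta, lp = proposal, lp_new
    samples.append(theta.copy())  # rejected proposals repeat the previous sample
samples = np.array(samples)
```

Because the proposal is symmetric, the Hastings correction term cancels and the acceptance ratio is just the posterior ratio, as in the bullet list above.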

# Issue 1: initial samples are pretty bad

The chain starts from an arbitrary point, so the first samples are drawn before it has reached the high-probability region; this "burn-in" portion is usually discarded. Successive samples are also correlated with each other, which can be quantified with the autocorrelation function:

$$\rho_{X X}\left(t_1, t_2\right)=\frac{\mathrm{K}_{X X}\left(t_1, t_2\right)}{\sigma_{t_1} \sigma_{t_2}}=\frac{\mathrm{E}\left[\left(X_{t_1}-\mu_{t_1}\right) \overline{\left(X_{t_2}-\mu_{t_2}\right)}\right]}{\sigma_{t_1} \sigma_{t_2}}$$
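This autocorrelation can be estimated empirically for a chain. A sketch using an AR(1) series as a stand-in for correlated MCMC output (the 0.9 coefficient and chain length are illustrative choices):

```python
import numpy as np

def autocorr(chain, max_lag=50):
    """Normalized empirical autocorrelation of a 1-D sample chain at lags 0..max_lag."""
    chain = np.asarray(chain, dtype=float)
    c = chain - chain.mean()
    var = np.mean(c**2)
    return np.array([np.mean(c[: len(c) - k] * c[k:]) / var for k in range(max_lag + 1)])

# AR(1) chain mimicking correlated MCMC output (assumption: illustrative data)
rng = np.random.default_rng(0)
z = np.empty(10000)
z[0] = 0.0
for i in range(1, len(z)):
    z[i] = 0.9 * z[i - 1] + rng.normal()

rho = autocorr(z)  # rho[0] is exactly 1; rho decays with the lag
```

A slowly decaying `rho` means few effectively independent samples; thinning the chain or taking a larger proposal step are common responses.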