Multivariate Gaussian distribution

by Matt Johnson - Sat 02 July 2016
Tags: #machine learning

In this article, we work out the mathematics of the Gaussian probability distribution in more than one dimension. Let's start by recalling the univariate Gaussian distribution. Its probability density function is given by:

$N(x \, | \, \mu_x, \sigma_x^2) = \frac{1}{\sqrt{2 \pi \sigma_x^2}}e^{-\frac{1}{2}\frac{(x-\mu_x)^2}{\sigma_x^2}}$

$\int_{-\infty}^{\infty} N(x \, | \, \mu_x, \sigma_x^2) \, dx = 1$
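
As a quick numerical sanity check, here is a minimal sketch in Python (NumPy and SciPy; the function name `gaussian_pdf` and the parameter values $\mu_x = 1$, $\sigma_x = 2$ are just illustrative choices) that evaluates this density and confirms the normalization integral:

```python
import numpy as np
from scipy.integrate import quad

def gaussian_pdf(x, mu_x, sigma_x):
    """Univariate Gaussian density N(x | mu_x, sigma_x^2)."""
    return np.exp(-0.5 * (x - mu_x) ** 2 / sigma_x ** 2) / np.sqrt(2 * np.pi * sigma_x ** 2)

# Integrate the density over the whole real line; the result should be ~1.
total, _ = quad(gaussian_pdf, -np.inf, np.inf, args=(1.0, 2.0))
print(total)  # ~1.0
```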

The above Gaussian is univariate because it involves one variable. What if we wish to involve more than one variable? Let's see how we might construct this:

$N(x_1 \, | \, \mu_{x_1}, \sigma_{x_1}^2) = \frac{1}{\sqrt{2 \pi \sigma_{x_1}^2}}e^{-\frac{1}{2} \frac{(x_1-\mu_{x_1})^2}{\sigma_{x_1}^2}}$

$N(x_2 \, | \, \mu_{x_2}, \sigma_{x_2}^2) = \frac{1}{\sqrt{2 \pi \sigma_{x_2}^2}}e^{-\frac{1}{2} \frac{(x_2-\mu_{x_2})^2}{\sigma_{x_2}^2}}$

Here, $x_1$ and $x_2$ are different continuous random variables. If we assume they are independent, their joint density is simply the product of the individual densities:

$N(x_1, x_2 \, | \, \mu_{x_1}, \mu_{x_2}, \sigma_{x_1}^2, \sigma_{x_2}^2) = N(x_1 \, | \, \mu_{x_1}, \sigma_{x_1}^2)N(x_2 \, | \, \mu_{x_2}, \sigma_{x_2}^2)$

Our new distribution remains normalized since:

$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} N(x_1, x_2 \, | \, \mu_{x_1}, \mu_{x_2}, \sigma_{x_1}^2, \sigma_{x_2}^2) \, dx_1 \, dx_2 = \int_{-\infty}^{\infty} N(x_1 \, | \, \mu_{x_1}, \sigma_{x_1}^2) \, dx_1 \int_{-\infty}^{\infty}N(x_2 \, | \, \mu_{x_2}, \sigma_{x_2}^2) \, dx_2 = 1 \cdot 1 = 1$
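
We can verify this factorization numerically too. Here is a small sketch (again NumPy/SciPy; `joint_pdf` and the chosen means and variances are illustrative) that integrates the product density over the whole plane:

```python
import numpy as np
from scipy.integrate import dblquad

def joint_pdf(x1, x2, mu1, mu2, s1, s2):
    """Product of two independent univariate Gaussian densities."""
    p1 = np.exp(-0.5 * (x1 - mu1) ** 2 / s1 ** 2) / np.sqrt(2 * np.pi * s1 ** 2)
    p2 = np.exp(-0.5 * (x2 - mu2) ** 2 / s2 ** 2) / np.sqrt(2 * np.pi * s2 ** 2)
    return p1 * p2

# dblquad integrates func(y, x), so x2 plays the inner variable here.
total, _ = dblquad(lambda x2, x1: joint_pdf(x1, x2, 0.0, 1.0, 1.0, 2.0),
                   -np.inf, np.inf, lambda x1: -np.inf, lambda x1: np.inf)
print(total)  # ~1.0
```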

Thus:

$N(x_1, x_2 \, | \, \mu_{x_1}, \mu_{x_2}, \sigma_{x_1}^2, \sigma_{x_2}^2) = \frac{1}{\sqrt{2 \pi \sigma_{x_1}^2}} \frac{1}{\sqrt{2 \pi \sigma_{x_2}^2}} e^{-\frac{1}{2} \frac{(x_1-\mu_{x_1})^2}{\sigma_{x_1}^2} -\frac{1}{2} \frac{(x_2-\mu_{x_2})^2}{\sigma_{x_2}^2}}$

We can clean this up a bit:

$N(x_1, x_2 \, | \, \mu_{x_1}, \mu_{x_2}, \sigma_{x_1}^2, \sigma_{x_2}^2) = \frac{1}{(2 \pi)^{\frac{D}{2}}} \frac{1}{(\sigma_{x_1}^2 \sigma_{x_2}^2)^{\frac{1}{2}}} e^{-\frac{1}{2} \sum_{i=1}^{D} \frac{(x_i-\mu_{x_i})^2}{\sigma_{x_i}^2}}$

Here, $D$ is the number of dimensions. In our current case, $D=2$.
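
A quick point-wise check (with arbitrary illustrative values) shows that this compact form agrees with the product of the two univariate densities:

```python
import numpy as np

x = np.array([0.5, -1.0])    # [x_1, x_2]
mu = np.array([0.0, 1.0])    # [mu_{x_1}, mu_{x_2}]
var = np.array([1.0, 4.0])   # [sigma_{x_1}^2, sigma_{x_2}^2]
D = len(x)

# The compact D-dimensional form...
compact = (2 * np.pi) ** (-D / 2) * np.prod(var) ** (-0.5) \
          * np.exp(-0.5 * np.sum((x - mu) ** 2 / var))

# ...versus the plain product of the univariate densities.
product = np.prod(np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var))
print(np.isclose(compact, product))  # True
```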

Getting Matrices Involved

We can clean this up some more. Consider the covariance matrix of this system:

$ \Sigma = \begin{bmatrix} \sigma_{x_1}^2 & \sigma_{x_1 x_2} \\ \sigma_{x_2 x_1} & \sigma_{x_2}^2 \end{bmatrix} $

The diagonal entries are the variances of $x_1$ and $x_2$, while the off-diagonal entries $\sigma_{x_1 x_2} = \sigma_{x_2 x_1}$ are the covariances between them.

Well, by our independence assumption, we have no correlations! Thus:

$ \Sigma = \begin{bmatrix} \sigma_{x_1}^2 & 0 \\ 0 & \sigma_{x_2}^2 \end{bmatrix} $

Nice! Let's calculate the determinant of this covariance matrix:

$ \big| \, \Sigma \, \big| = \sigma_{x_1}^2 \sigma_{x_2}^2 - 0 \cdot 0 = \sigma_{x_1}^2 \sigma_{x_2}^2 $
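
NumPy agrees, of course (the variances $1$ and $4$ are arbitrary illustrative values):

```python
import numpy as np

# diag(sigma_{x_1}^2, sigma_{x_2}^2) with illustrative variances.
Sigma = np.diag([1.0, 4.0])
print(np.linalg.det(Sigma))  # 4.0 == 1.0 * 4.0
```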

Using this, our equation now looks like:

$N(x_1, x_2 \, | \, \mu_{x_1}, \mu_{x_2}, \sigma_{x_1}^2, \sigma_{x_2}^2) = \frac{1}{(2 \pi)^{\frac{D}{2}}} \frac{1}{ | \, \Sigma \, | ^{\frac{1}{2}}} e^{-\frac{1}{2} \sum_{i=1}^{D} \frac{(x_i-\mu_{x_i})^2}{\sigma_{x_i}^2}}$

Note that the use of two $\Sigma$ symbols here is a bit confusing. The one on the right is still used for summation, while the one on the left represents our covariance matrix.

We can still clean up more. Consider this term:

$\sum_{i=1}^{D} \frac{(x_i-\mu_{x_i})^2}{\sigma_{x_i}^2} = \sum_{i=1}^{D} (x_i-\mu_{x_i}) \frac{1}{\sigma_{x_i}^2} (x_i-\mu_{x_i})$

We can introduce some column vectors:

$ \vec{\mu} = \begin{bmatrix} \mu_{x_1} \\ \mu_{x_2} \end{bmatrix} $ $ \vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} $

Also, note that the inverse of our diagonal covariance matrix is obtained by simply inverting each diagonal entry, since:

$ \Sigma \Sigma^{-1} = \begin{bmatrix} \sigma_{x_1}^2 & 0 \\ 0 & \sigma_{x_2}^2 \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_{x_1}^2} & 0 \\ 0 & \frac{1}{\sigma_{x_2}^2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I $
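
In code (same illustrative variances as before), inverting each diagonal entry really does recover the matrix inverse:

```python
import numpy as np

Sigma = np.diag([1.0, 4.0])
# Invert each diagonal entry by hand...
Sigma_inv = np.diag(1.0 / np.diag(Sigma))
print(Sigma @ Sigma_inv)                             # identity matrix
# ...and confirm it matches the general matrix inverse.
print(np.allclose(Sigma_inv, np.linalg.inv(Sigma)))  # True
```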

Now, we just need to put all of this stuff together. Consider:

$ \begin{bmatrix} x_1 - \mu_{x_1} ,\; x_2 - \mu_{x_2} \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_{x_1}^2} & 0 \\ 0 & \frac{1}{\sigma_{x_2}^2} \end{bmatrix} \begin{bmatrix} x_1 - \mu_{x_1} \\ x_2 - \mu_{x_2} \end{bmatrix} $

This is actually the same as:

$\sum_{i=1}^{D} \frac{(x_i-\mu_{x_i})^2}{\sigma_{x_i}^2}$

Thus,

$\sum_{i=1}^{D} \frac{(x_i-\mu_{x_i})^2}{\sigma_{x_i}^2} = (\vec{x} - \vec{\mu})^T \, \Sigma^{-1} \, (\vec{x} - \vec{\mu})$
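
Here is a quick numerical check of that identity (the point, mean, and variances are arbitrary illustrative values):

```python
import numpy as np

x = np.array([0.5, -1.0])
mu = np.array([0.0, 1.0])
var = np.array([1.0, 4.0])   # the diagonal of Sigma

# Weighted sum of squares...
as_sum = np.sum((x - mu) ** 2 / var)
# ...versus the matrix quadratic form (x - mu)^T Sigma^{-1} (x - mu).
d = x - mu
as_quadratic = d @ np.diag(1.0 / var) @ d
print(np.isclose(as_sum, as_quadratic))  # True
```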

The Final Form

Finally, our equation looks like:

$N(\vec{x} \, | \, \vec{\mu}, \Sigma) = N(x_1, x_2 \, | \, \mu_{x_1}, \mu_{x_2}, \sigma_{x_1}^2, \sigma_{x_2}^2) = \frac{1}{(2 \pi)^{\frac{D}{2}}} \frac{1}{ | \, \Sigma \, | ^{\frac{1}{2}}} e^{-\frac{1}{2} (\vec{x} - \vec{\mu})^T \, \Sigma^{-1} \, (\vec{x} - \vec{\mu})}$

Clearly, the following equation is much easier to write down and work with:

$N(\vec{x} \, | \, \vec{\mu}, \Sigma) = \frac{1}{(2 \pi)^{\frac{D}{2}}} \frac{1}{ | \, \Sigma \, | ^{\frac{1}{2}}} e^{-\frac{1}{2} (\vec{x} - \vec{\mu})^T \, \Sigma^{-1} \, (\vec{x} - \vec{\mu})}$
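
As a final sanity check, here is a sketch of this formula in NumPy (the helper name `mvn_pdf` is just an illustrative choice) compared against SciPy's `multivariate_normal`:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """N(x | mu, Sigma) in the final matrix form derived above."""
    D = len(mu)
    d = x - mu
    norm = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)

x = np.array([0.5, -1.0])
mu = np.array([0.0, 1.0])
Sigma = np.diag([1.0, 4.0])

print(mvn_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # same value
```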

Although we worked out the math for the case $D=2$, this generalizes to higher dimensions as well. Well, there you have it: the Gaussian in higher dimensions. And remember, we derived this form assuming the $x_i$ random variables are all independent, which is why $\Sigma$ came out diagonal. The same formula with a full, non-diagonal covariance matrix is the general multivariate Gaussian, which also handles correlated variables.
