How to be optimally wrong: understanding Kalman filters
The Problem
Let’s say you are tracking a drone with a radar. You measure its position with the sensor, but the measurement is noisy and doesn’t capture the true position. You have another approach: model the motion of the drone. But your model is not perfect either, due to environmental factors such as wind. You now have two imperfect approaches, so how do you go about tracking the motion of the drone?
The Kalman filter solves this problem by blending Bayesian updating with the interplay between your physical model and your observations.
A high level overview of a Kalman filter
A Kalman filter is an algorithm that uses measurements observed over time to estimate the state of a physical system while incorporating model uncertainty and noisy measurements. It does this by performing Bayesian updating, weighting the predicted state from a physically motivated model against each new measurement. The weights are determined by the uncertainty associated with each source. If the uncertainty of your measurement is huge compared to that of your predicted state, the Kalman filter will update mainly using the predicted state, not your measurement.
Kalman filtering was invented by Rudolf E. Kalman in 1960 (cite).
The Kalman filtering algorithm can be summarised by 5 core equations:
- Determine the predicted state at time k \(\begin{equation} \mathbf{x}_{k\vert k-1} = F \cdot \mathbf{x}_{k-1\vert k-1} + B \mathbf{u}_k \end{equation}\)
- Determine the predicted covariance \(\begin{equation} \Sigma_{k\vert k-1} = F \cdot \Sigma_{k-1\vert k-1} \cdot F^T + Q \end{equation}\) where \(Q\) is the process noise covariance
- Determine the Kalman gain \(\begin{equation} K_k = \Sigma_{k|k-1}\cdot H^T (H\cdot \Sigma_{k|k-1}\cdot H^T + R)^{-1} \end{equation}\) where \(R\) is the measurement noise covariance
- Update the state estimate \(\begin{equation} \mathbf{x}_{k\vert k} = \mathbf{x}_{k\vert k-1} + K_k (\mathbf{z}_k - H \mathbf{x}_{k|k-1}) \end{equation}\)
- Update the state covariance: \(\begin{equation} \Sigma_{k\vert k} = (I-K_k\cdot H)\cdot \Sigma_{k\vert k-1} \end{equation}\)
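As a sketch of how these five equations fit together in code — the model, noise values, and measurement below are illustrative assumptions, not from a real radar:

```python
import numpy as np

# One predict-update cycle, assuming a 1-D constant-velocity model.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])              # we only measure position
Q = 0.01 * np.eye(2)                    # process noise covariance (assumed)
R = np.array([[0.5]])                   # measurement noise covariance (assumed)

x = np.array([[0.0], [1.0]])            # initial state: position 0, velocity 1
Sigma = np.eye(2)                       # initial state covariance

# 1. & 2. Predict state and covariance (no control input, so B u = 0)
x_pred = F @ x
Sigma_pred = F @ Sigma @ F.T + Q

# 3. Kalman gain
K = Sigma_pred @ H.T @ np.linalg.inv(H @ Sigma_pred @ H.T + R)

# 4. & 5. Update with a (made-up) measured position z = 1.2
z = np.array([[1.2]])
x_new = x_pred + K @ (z - H @ x_pred)
Sigma_new = (np.eye(2) - K @ H) @ Sigma_pred
```

Note how the update pulls the predicted position most of the way toward the measurement: the predicted variance here is much larger than the measurement noise, so the gain weights the measurement heavily.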
The physical model and predicted state
We are tracking an object moving at a constant velocity. We define the state vector \(\begin{equation} \mathbf{x} = \begin{pmatrix}x\\ v \end{pmatrix} \end{equation}\)
where \(x\) is the object’s position and \(v\) is the object’s velocity. From high-school physics, we know that the position after a certain time is \(\begin{equation} x_1 = x_0 + vt \end{equation}\)
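Writing this kinematic relation in matrix form gives the transition matrix \(F\) for the constant-velocity model (with time step \(\Delta t\); the velocity is constant, so \(v_1 = v_0\)):

\[\begin{equation} \begin{pmatrix} x_1 \\ v_1 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & \Delta t \\ 0 & 1 \end{pmatrix}}_{F} \begin{pmatrix} x_0 \\ v_0 \end{pmatrix} \end{equation}\]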
\[\begin{equation} \mathbf{x}_{k\vert k-1} = F \cdot \mathbf{x}_{k-1\vert k-1} + B \mathbf{u}_k \end{equation}\] \[\begin{equation} \Sigma_{k|k-1} = F \cdot \Sigma_{k-1|k-1} \cdot F^T + Q \end{equation}\] \[\begin{equation} \mathbf{z}_k = H\cdot \mathbf{x}_k + \sigma_k \end{equation}\]where \(\mathbf{z}_k\) is the measurement and \(\sigma_k\) is the measurement noise.
Bayesian updating
From Bayes’ theorem to Bayesian updating
The Kalman filter is an optimal method of Bayesian updating, assuming a Gaussian distribution. The heart of Bayesian inference is Bayes’ theorem
\[\begin{equation} \text{posterior} \propto \text{likelihood} \cdot \text{prior} \end{equation}\]which we can rewrite in mathematical notation as
\[\begin{equation} p(x_k|z_{0:k}) \propto p(z_k \vert x_k) \cdot p(x_k). \end{equation}\]The first assumption we will make is that our likelihood and prior are Gaussian distributed. Hence, we can define the two as
\[\begin{equation} p(x_k) = C \exp\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right) \end{equation}\]and
\[\begin{equation} p(z_k|x_k) = C \exp\left(-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right). \end{equation}\]Now, we wish to multiply these together to determine the posterior
\[\begin{equation} p(x_k|z_{0:k}) \propto \exp\left(-\frac{1}{2}\left(\frac{(x-\mu_1)^2}{\sigma_1^2}+\frac{(x-\mu_2)^2}{\sigma_2^2}\right)\right) \end{equation}\]and complete the square
\[\begin{equation} \frac{(x-\mu_1)^2}{\sigma_1^2} + \frac{(x-\mu_2)^2}{\sigma^2_2} = \frac{\sigma^2_2(x-\mu_1)^2+\sigma^2_1(x-\mu_2)^2}{\sigma^2_1 \sigma^2_2} \end{equation}\]after expanding the numerator and collecting terms, we get
\[\begin{equation} =\frac{(\sigma^2_1+\sigma^2_2)\left(x-\frac{\sigma^2_2\mu_1+\sigma^2_1\mu_2}{\sigma_1^2+\sigma^2_2}\right)^2}{\sigma_1^2\sigma^2_2} + \text{const.} \end{equation}\]where the constant term does not depend on \(x\) and is absorbed into the proportionality. Thus, we arrive at our posterior (which is also Gaussian distributed)
\[\begin{equation} p(x_k|z_{0:k}) \propto \exp\left(-\frac{(x-\mu_{\text{new}})^2}{2\sigma_{\text{new}}^2}\right) \end{equation}\]with the new mean
\[\begin{equation} \mu_{\text{new}} = \frac{\sigma^2_2 \mu_1+\sigma^2_1\mu_2}{\sigma^2_1 + \sigma^2_2} \end{equation}\]and new variance
\[\begin{equation} \sigma^2_{\text{new}} = \frac{\sigma^2_2\sigma^2_1}{\sigma^2_1 + \sigma^2_2} \end{equation}\]
Rewriting into Kalman filter form
Let’s rewrite the posterior mean to be
\[\begin{equation} \mu_{\text{new}} = \frac{\sigma^2_2 \mu_1+\sigma^2_1\mu_2}{\sigma^2_1 + \sigma^2_2} = \mu_1 + \frac{\sigma^2_1 (\mu_2 - \mu_1)}{\sigma^2_1 + \sigma^2_2} \end{equation}\]let’s define
\[\begin{equation}K = \frac{\sigma^2_1}{\sigma^2_1 + \sigma^2_2}\end{equation}\]Thus, we arrive at the 4th equation
\[\begin{equation} \mu_{\text{new}} = \mu_1 + K(\mu_2 - \mu_1) \end{equation}\]where it can be rewritten in the more familiar form
\[\begin{equation} \mathbf{x}_{k\vert k} = \mathbf{x}_{k\vert k-1} + K(\mathbf{z}_k - H\cdot \mathbf{x}_{k|k-1}) \end{equation}\]and the 5th equation
\[\begin{equation} \sigma^2_{\text{new}} = (1-K) \sigma^2_1 \end{equation}\]where it can be rewritten in the more familiar form
\[\begin{equation} \Sigma_{k\vert k} = (I-KH) \Sigma_{k\vert k-1} \end{equation}\]As we can see, the 4th and 5th equations of the Kalman filter are Bayes’ theorem with a special property of Gaussians: the product of two Gaussians is a Gaussian. This allows the filter to be recursive, as the posterior always has the same form as the prior; you have a simple recursive loop where you only need to track a mean and a variance.
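This recursive loop can be sketched in a few lines (scalar case; the prior and the measurements are made-up values):

```python
# Fuse a prior N(mu1, var1) with a measurement N(mu2, var2) using the
# gain K = var1 / (var1 + var2), as derived above.
def fuse(mu1, var1, mu2, var2):
    """Posterior mean and variance of the product of two Gaussians."""
    K = var1 / (var1 + var2)
    return mu1 + K * (mu2 - mu1), (1 - K) * var1

# The posterior is again Gaussian, so it becomes the next prior --
# exactly the recursion described in the text.
mu, var = 0.0, 4.0                      # broad initial prior
for z in [1.0, 1.2, 0.9]:               # noisy measurements, variance 1.0
    mu, var = fuse(mu, var, z, 1.0)
```

Each fusion shrinks the variance: starting from a prior variance of 4.0, three measurements bring it down to about 0.31.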
The Kalman gain and minimising the mean squared error
When we use the Kalman filter to predict and update at each time step, we need some metric to find the optimal estimate. Since we are using Gaussians, we will use the mean squared error (MSE). As the variance of a Gaussian is the expected squared deviation from the mean, minimising the MSE is equivalent to minimising the spread of your posterior Gaussian, which is the same as shrinking the uncertainty in your predictions. We define the estimation error as
\[\begin{equation} e_k = x_k - \mathbf{x}_{k\vert k} \end{equation}\]using the 4th equation \(\begin{equation} e_k = x_k - \mathbf{x}_{k\vert k-1} - K(\mathbf{z}_k - H\cdot \mathbf{x}_{k|k-1}) \end{equation}\)
and substituting in \(z_k\)
\[\begin{equation} = x_k - \mathbf{x}_{k\vert k-1} - K(H\cdot \mathbf{x}_k + \sigma_k - H\cdot \mathbf{x}_{k|k-1}) \end{equation}\] \[\begin{equation} = e_{k\vert k-1} - K (H e_{k\vert k-1} + \sigma_k) \end{equation}\] \[\begin{equation} = (I-KH)e_{k\vert k-1} - K\sigma_k \end{equation}\]where \(e_{k\vert k-1} = x_k - \mathbf{x}_{k\vert k-1}\) is the prediction error. The full covariance is defined as
\[\begin{equation} \Sigma = E[e_k e^T_k] = E[((I-KH)e_{k\vert k-1} - K\sigma_k)((I-KH)e_{k\vert k-1} - K\sigma_k)^T] \end{equation}\]when we expand this product, since the prediction error \(e_{k\vert k-1}\) and measurement error \(\sigma_k\) are uncorrelated, the cross terms vanish.
\[\begin{equation} = (I-KH)E[e_{k|k-1}e_{k|k-1}^T](I-KH)^T + KE[\sigma_k\sigma_k^T]K^T = (I-KH)\Sigma_{k|k-1}(I-KH)^T + KRK^T \end{equation}\]where \(R = E[\sigma_k\sigma_k^T]\) is the measurement noise covariance. The MSE is written as
\[\begin{equation} \text{MSE} = E\left[\|x_k - \mathbf{x}_{k\vert k}\|^2\right] = \operatorname{tr}(\Sigma_{k\vert k}) \end{equation}\]Let’s minimise it
\[\begin{equation} \frac{d}{dK}\operatorname{tr}(\Sigma_{k\vert k}) = 0 \end{equation}\]Differentiating the expanded covariance with respect to \(K\) gives \[\begin{equation} -2(I-KH)\Sigma_{k\vert k-1}H^T + 2KR = 0 \end{equation}\]and solving for \(K\) yields \[\begin{equation} K = \Sigma_{k\vert k-1}H^T(H\Sigma_{k\vert k-1}H^T + R)^{-1} \end{equation}\]which is exactly the 3rd equation: the Kalman gain is the weighting that minimises the mean squared error of the estimate.
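As a numeric sanity check, we can verify (with made-up covariances) that this closed-form gain indeed minimises the trace of the posterior covariance:

```python
import numpy as np

# We use the (I-KH) Sigma (I-KH)^T + K R K^T form of the posterior
# covariance, which is valid for ANY gain, and confirm that perturbing
# K away from the closed-form optimum only increases the trace.
Sigma_pred = np.array([[2.0, 0.5], [0.5, 1.0]])  # prior covariance (illustrative)
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
I = np.eye(2)

def posterior_trace(K):
    A = I - K @ H
    return np.trace(A @ Sigma_pred @ A.T + K @ R @ K.T)

K_opt = Sigma_pred @ H.T @ np.linalg.inv(H @ Sigma_pred @ H.T + R)
t_opt = posterior_trace(K_opt)

rng = np.random.default_rng(0)
for _ in range(100):
    K_alt = K_opt + 0.1 * rng.standard_normal(K_opt.shape)
    assert posterior_trace(K_alt) >= t_opt  # no perturbed gain does better
```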