Understanding the Kalman Filter

Think of the Kalman Filter like a smart guesser 🤖.

At each step, it:

  1. Predicts what will happen next.

  2. Checks what actually happened (the measurement).

  3. Adjusts its guess based on how wrong it was.

We do this using a bit of matrix magic — here’s how it works:

Step 1: Predict (What do we think will happen?)

\[ \begin{align}\begin{aligned}\hat{x}_{k|k-1} = F \cdot \hat{x}_{k-1|k-1}\\P_{k|k-1} = F \cdot P_{k-1|k-1} \cdot F^\top + Q\end{aligned}\end{align} \]
  • \(\hat{x}_{k|k-1}\) is our predicted state at time k.

  • \(F\) is the state transition matrix.

  • \(P_{k|k-1}\) is the predicted uncertainty (covariance).

  • \(Q\) is the process noise — how much we think the system can change randomly.

Step 2: Update (What did we actually see?)

\[ \begin{align}\begin{aligned}y_k = z_k - H \cdot \hat{x}_{k|k-1}\\S_k = H \cdot P_{k|k-1} \cdot H^\top + R\\K_k = P_{k|k-1} \cdot H^\top \cdot S_k^{-1}\end{aligned}\end{align} \]
  • \(y_k\) is the “innovation” — the difference between what we saw (\(z_k\)) and what we predicted.

  • \(H\) maps our state to what we can observe.

  • \(S_k\) is the uncertainty in the measurement prediction.

  • \(R\) is the observation noise — how noisy our measurements are.

  • \(K_k\) is the Kalman Gain — it decides how much to trust the measurement vs the prediction.

Step 3: Correct (Update our guess based on new info)

\[ \begin{align}\begin{aligned}\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \cdot y_k\\P_{k|k} = (I - K_k \cdot H) \cdot P_{k|k-1}\end{aligned}\end{align} \]
  • We update our estimate of the state and its uncertainty.

Intuition:

  • If the measurement is very noisy (big \(R\)), we trust our prediction more.

  • If the prediction is uncertain (big \(P\)), we trust the measurement more.

This beautiful balance between what we expect and what we observe is what makes the Kalman Filter such a powerful tool for filtering out noise and estimating the hidden truth. ✨
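The three steps above fit in a few lines of NumPy. This is a minimal sketch of the equations, not a production filter; the function name `kalman_step` and its argument layout are just illustrative:

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict-update cycle of a linear Kalman Filter."""
    # Step 1: Predict
    x_pred = F @ x                        # predicted state
    P_pred = F @ P @ F.T + Q              # predicted covariance

    # Step 2: Update
    y = z - H @ x_pred                    # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain

    # Step 3: Correct
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

Run it in a loop, feeding each new measurement `z` and the previous `(x, P)` back in; the gain `K` automatically balances prediction against measurement as described in the intuition above.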

Understanding Particle Filters

The Particle Filter is like a swarm of guesses (particles) trying to chase the truth. 🐝 Each particle represents a hypothesis of where the system could be. As time moves on, we adjust how much we trust each guess based on what we observe.

Step 1: Initialization 🐣

Start with N particles, each one initialized at the same known state (or sampled if you want variation):

\[x_i^{(0)} = x_0, \quad w_i^{(0)} = \frac{1}{N}\]

Where:

  • \(x_i^{(0)}\) is the initial state of the i-th particle

  • \(w_i^{(0)}\) is its weight (uniform initially)

Step 2: Prediction 🔮

Let each particle evolve through the state transition model and some random noise:

\[x_i^{(k)} = f(x_i^{(k-1)}) + \epsilon_i^{(k)}\]

Where:

  • \(f(x)\) is the state transition function

  • \(\epsilon_i^{(k)} \sim \mathcal{N}(0, Q)\) is process noise

Step 3: Measurement Update 🔍

Compare each particle’s prediction to the actual observation:

\[w_i^{(k)} \propto w_i^{(k-1)} \cdot p(y^{(k)} \mid x_i^{(k)})\]

Typically, this likelihood is Gaussian:

\[p(y^{(k)} \mid x_i^{(k)}) = \mathcal{N}(y^{(k)} \mid h(x_i^{(k)}), R)\]

Where:

  • \(h(x)\) is the observation function

  • \(R\) is the observation noise covariance

Normalize weights so they sum to 1:

\[\sum_{i=1}^{N} w_i^{(k)} = 1\]

Step 4: Resampling ♻️

If most particles have near-zero weights, we resample to keep only good particles:

Draw N new particles with replacement, favoring high-weight ones.

\[x_i^{(k)} \sim \{ x_j^{(k)} \}_{j=1}^{N}, \quad \text{with probability } w_j^{(k)}\]

Step 5: Estimate State 🎯

The best guess of the state is just a weighted average of all particles:

\[\hat{x}^{(k)} = \sum_{i=1}^{N} w_i^{(k)} x_i^{(k)}\]

Bonus: Residuals

We can define the residual (aka innovation) at each step:

\[r^{(k)} = y^{(k)} - \hat{y}^{(k)}, \quad \text{where } \hat{y}^{(k)} = h(\hat{x}^{(k)})\]

Use these for parameter estimation or diagnostics!
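Steps 1 through 5 can be sketched as a short bootstrap particle filter for a scalar state. The function name and arguments are illustrative, and the Gaussian likelihood is the one assumed in Step 3:

```python
import numpy as np

def particle_filter(y_seq, x0, f, h, Q, R, N=500, rng=None):
    """Bootstrap particle filter for a 1-D state (Steps 1-5)."""
    rng = np.random.default_rng(rng)
    # Step 1: initialize all particles at x0 with uniform weights
    particles = np.full(N, x0, dtype=float)
    estimates = []
    for y in y_seq:
        # Step 2: predict - push each particle through f, add process noise
        particles = f(particles) + rng.normal(0.0, np.sqrt(Q), N)
        # Step 3: weight each particle by the Gaussian likelihood of y
        w = np.exp(-0.5 * (y - h(particles)) ** 2 / R)
        w /= w.sum()
        # Step 5: weighted-mean state estimate (before resampling,
        # while the weights still carry information)
        estimates.append(np.sum(w * particles))
        # Step 4: resample with replacement, favoring high-weight particles
        particles = rng.choice(particles, size=N, p=w)
    return np.array(estimates)
```

For example, with `f = h = identity` and noisy observations of a constant true state, the swarm drifts toward the truth within a few dozen steps.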

Intuition:

  • If your model is spot-on, particles stay tight and track the truth.

  • If your model is wrong or noisy, particles spread out, but the filter still works by focusing on better guesses.

That’s it — just a clever crowd of guesses refining themselves with every new clue! 🧠🎲

Parameter Estimation Methods

When you’re not sure how much noise is in your system (Q and R), these methods help your filter figure it out.

Let’s break down each method simply:

Notation:

  • \(Q\): Process noise covariance (uncertainty in the system’s evolution).

  • \(R\): Observation noise covariance (uncertainty in what we observe).

  • \(y_t\): Observation at time t.

  • \(\hat{y}_t\): Predicted observation at time t from the filter.

  • \(r_t = y_t - \hat{y}_t\): The residual or innovation.

  1. Residual Analysis 📊

This method says: “Let’s look at the errors and calculate how wild they are.”

We assume the residuals are due to noise. So we use their variance and covariance to estimate Q and R:

\[ \begin{align}\begin{aligned}R \approx \mathrm{Var}(r_t) = \frac{1}{T} \sum_{t=1}^{T} r_t r_t^\top\\Q \approx \mathrm{Cov}(r_t) = \frac{1}{T} \sum_{t=1}^{T} (r_t - \bar{r})(r_t - \bar{r})^\top\end{aligned}\end{align} \]

Where \(\bar{r}\) is the mean of the residuals.
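The two formulas above translate directly into NumPy. This is a minimal sketch assuming residuals stacked as a `(T, d)` array; the function name is illustrative:

```python
import numpy as np

def estimate_noise_from_residuals(residuals):
    """Estimate R and Q from a sequence of filter residuals."""
    r = np.asarray(residuals, dtype=float)
    if r.ndim == 1:                 # allow a 1-D residual sequence
        r = r[:, None]
    T = len(r)
    R_hat = r.T @ r / T             # (1/T) * sum of r_t r_t^T
    rc = r - r.mean(axis=0)         # center by the mean residual
    Q_hat = rc.T @ rc / T           # (1/T) * sum of centered outer products
    return Q_hat, R_hat
```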

  2. Maximum Likelihood Estimation (MLE) 🔍

MLE says: “Let’s find the Q and R that most likely made our observations happen.”

We do it iteratively:

  • Run the filter

  • Get residuals

  • Update Q and R to maximize the likelihood

Simplified:

\[ \begin{align}\begin{aligned}Q^{(i+1)} = \mathrm{Var}(r_t^{(i)})\\R^{(i+1)} = \mathrm{Var}(r_t^{(i)})\end{aligned}\end{align} \]

Where \(i\) is the iteration index. We stop after a few rounds or when it converges.
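The loop can be sketched as follows. Here `run_filter(Q, R)` is a hypothetical callable assumed to run your filter with those noise settings and return the residual sequence; the simplified update from the equations above sets both Q and R to the residual variance:

```python
import numpy as np

def iterate_noise_estimates(run_filter, Q0, R0, n_iter=5, tol=1e-6):
    """Iterate the simplified MLE update until it converges."""
    Q, R = Q0, R0
    for _ in range(n_iter):
        r = run_filter(Q, R)        # residuals under current (Q, R)
        new = np.var(r)             # simplified update: Var(r_t)
        if abs(new - Q) < tol and abs(new - R) < tol:
            break                   # converged
        Q = R = new
    return Q, R
```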

  3. Cross-Validation (CV) 🔁

Let’s split the data into parts (folds), train the filter on some, and validate on the rest.

For each fold:

\[ \begin{align}\begin{aligned}\text{Train on } X_{\text{train}}, \quad \text{Validate on } X_{\text{val}}\\Q_{\text{fold}} = \mathrm{Cov}(r_t^{\text{train}}), \quad R_{\text{fold}} = \mathrm{Var}(r_t^{\text{train}})\end{aligned}\end{align} \]

Then we compute the validation score:

\[\text{Score}_{\text{fold}} = \frac{1}{N} \sum_{t \in \text{val}} \left\| y_t - \hat{y}_t \right\|^2\]

We pick the Q and R from the fold with the lowest score.
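A sketch of the selection loop, assuming a hypothetical `run_filter(y_train, y_val, Q, R)` that trains on one segment and returns predicted observations for the validation segment:

```python
import numpy as np

def cv_select_noise(y, candidates, run_filter, n_folds=3):
    """Pick the (Q, R) pair with the lowest validation score."""
    y = np.asarray(y, dtype=float)
    folds = np.array_split(np.arange(len(y)), n_folds)
    best, best_score = None, np.inf
    for Q, R in candidates:
        score = 0.0
        for val_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), val_idx)
            y_hat = run_filter(y[train_idx], y[val_idx], Q, R)
            # mean squared validation error for this fold
            score += np.mean((y[val_idx] - y_hat) ** 2)
        score /= n_folds
        if score < best_score:
            best, best_score = (Q, R), score
    return best
```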

  4. Adaptive Filtering (Online Updating) 🔄

This method says: “Let’s keep updating Q and R as we go using a small learning rate.”

Every new innovation \(r_t\) gives us new evidence to tweak Q and R:

\[ \begin{align}\begin{aligned}Q_t = (1 - \alpha) Q_{t-1} + \alpha (r_t r_t^\top)\\R_t = (1 - \alpha) R_{t-1} + \alpha \cdot \mathrm{diag}(r_t r_t^\top)\end{aligned}\end{align} \]

Where:

  • \(\alpha\) is the learning rate (e.g. 0.01)

  • \(r_t\) is the innovation (residual)

The filter gets smarter over time, adjusting itself like a thermostat reacting to room temperature. 🌡️
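The exponential update above is one function call per time step. A minimal sketch, with illustrative names:

```python
import numpy as np

def adapt_noise(Q, R, r, alpha=0.01):
    """One online update of Q and R from a new innovation r."""
    outer = np.outer(r, r)                     # r_t r_t^T
    Q_new = (1 - alpha) * Q + alpha * outer
    # R keeps only the diagonal of the innovation outer product
    R_new = (1 - alpha) * R + alpha * np.diag(np.diag(outer))
    return Q_new, R_new
```

Call it after every filter step, feeding the latest innovation; with a small `alpha`, Q and R drift slowly toward the observed noise level instead of jumping on every outlier.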

These techniques are all about helping the filter “learn” how noisy the world is — so it can be confident when it needs to be, and skeptical when things look fishy. 🐠