Real-World Use Cases 🎒

Welcome to the Notebook Playground - where your Jupyter notebooks come to life and show off their filtering magic! ✨ This section showcases real-world applications of TFiltersPy in noisy, messy, dynamic environments where Bayesian filtering shines.

🚀 Whether you're smoothing topic probabilities, estimating hidden states, or tracking uncertainty across time, these notebooks will get you started.

📚 For a full list of our example notebooks, head over to our GitHub:

🔗 Visit the Examples Directory

Use-Case Templates

Each example notebook typically follows this structure:

  1. Data Loading - Real or simulated data that represents a time-varying system.

  2. Preprocessing - Cleaning, transformation, and feature extraction.

  3. Filter Setup - Define system matrices (F, H), noise covariances (Q, R), and initial conditions.

  4. Fit & Predict - Apply your filter across the dataset using .fit() and .predict() or .run_filter(); see the minimal sketch after this list.

  5. Visualization - Plot raw vs filtered estimates.

  6. Interpretation - Gain insights into dynamics, trends, and uncertainty.

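A minimal sketch of that flow on a synthetic two-dimensional random walk (the data, dimensions, and chunk size are purely illustrative; the DaskKalmanFilter import path and constructor mirror the topic-modeling example below):

import numpy as np
import dask.array as da
from tfilterspy.state_estimation.kalman_filters import DaskKalmanFilter

# 1-2. Load and preprocess: a noisy 2-D random walk stands in for real data here
rng = np.random.default_rng(42)
true_states = np.cumsum(rng.normal(size=(500, 2)), axis=0)
observations = da.from_array(true_states + rng.normal(scale=0.5, size=(500, 2)),
                             chunks=(100, 2))

# 3. Filter setup: system matrices, noise covariances, initial conditions
F = np.eye(2)          # state transition
H = np.eye(2)          # observation model
Q = np.eye(2) * 0.01   # process noise covariance
R = np.eye(2) * 0.25   # observation noise covariance
x0 = np.zeros(2)       # initial state
P0 = np.eye(2)         # initial covariance

# 4. Fit and predict
kf = DaskKalmanFilter(F, H, Q, R, x0, P0, estimation_strategy="residual_analysis")
kf.fit(observations)
estimates = kf.predict().compute()

# 5-6. Visualize and interpret (see the full notebook below for plotting)
print(estimates.shape)  # filtered estimates, one row per observation
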
Topic Modeling + Kalman Filtering

This notebook shows how to use TFiltersPy to smooth topic probabilities over time in a stream of disaster-related tweets, turning chaotic topic trends into a clearer picture of how narratives evolve.

import pandas as pd
import numpy as np
import dask.array as da
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from tfilterspy.state_estimation.kalman_filters import DaskKalmanFilter
import matplotlib.pyplot as plt
  1. Load Disaster Tweets

path_to_disaster_tweets = r'/../../tfilterspy/examples/data/train_nlp.csv'
data_path = path_to_disaster_tweets  # Update after download
df = pd.read_csv(data_path)
tweets = df['text'].values  # ~7613 tweets
print(f"Number of tweets: {len(tweets)}")
  2. Preprocess and Extract Topics

vectorizer = CountVectorizer(max_features=5000, stop_words='english')
X = vectorizer.fit_transform(tweets)
n_topics = 5  # e.g., disaster, weather, casual, news, other
lda = LatentDirichletAllocation(n_components=n_topics, random_state=42)
topic_dist = lda.fit_transform(X)  # Shape: (7613, 5)
X_dask = da.from_array(topic_dist, chunks=(1000, n_topics))
print(f"Topic distribution shape: {X_dask.shape}")
  3. Kalman Filter Setup

n_features = n_topics  # One state dimension per topic (must match topic_dist columns)
F = np.eye(n_features)  # Static transition (identity for simplicity)
H = np.eye(n_features)  # Direct observation
Q = np.eye(n_features) * 0.01  # Process noise
R = np.eye(n_features) * 0.1   # Observation noise
x0 = np.zeros(n_features)      # Initial state
P0 = np.eye(n_features)        # Initial covariance
kf = DaskKalmanFilter(F, H, Q, R, x0, P0, estimation_strategy="residual_analysis")
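
Why identity matrices? With F = H = I, the model treats each topic probability as a slowly drifting random walk observed through noise:

x_t = F x_{t-1} + w_t,   w_t ~ N(0, Q)
z_t = H x_t + v_t,       v_t ~ N(0, R)

Keeping Q small relative to R tells the filter to trust its running estimate more than any single noisy observation, which is what produces the smoothing effect.
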
  4. Fit and Predict

kf.fit(X_dask)
smoothed_topics = kf.predict().compute()
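
To carry the smoothed estimates alongside the raw tweets for downstream analysis, one option is to append them to the original DataFrame (the column names here are illustrative):

smoothed_df = pd.DataFrame(
    smoothed_topics,
    columns=[f"smoothed_topic_{i+1}" for i in range(n_topics)],
)
df = pd.concat([df.reset_index(drop=True), smoothed_df], axis=1)
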
  5. Plot Raw vs Smoothed Topics (first 1000 tweets)

plt.figure(figsize=(12, 8))
for i in range(n_topics):
    plt.subplot(n_topics, 1, i + 1)
    plt.plot(topic_dist[:1000, i], label=f"Raw Topic {i+1}", alpha=0.5)
    plt.plot(smoothed_topics[:1000, i], label=f"Smoothed Topic {i+1}", linestyle="--")
    plt.title(f"Topic {i+1}")
    plt.xlabel("Tweet Index (Time)")
    plt.ylabel("Probability")
    plt.legend()
plt.tight_layout()
plt.show()
  6. Interpret Topics

feature_names = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_words = [feature_names[j] for j in topic.argsort()[-5:]]
    print(f"Topic {i+1}: {', '.join(top_words)}")