ML Simulations

Interactive visualizations of machine learning concepts running directly in your browser.

CartPole Training

Episode: 0
Reward: 0
Best: 0
Avg(20): 0
σ: 0.25
Steps/s: 0
Runtime: 0.0s
Status: Idle
Weights: [0.00, 0.00, 0.00, 0.00]
⚙️ Configuration

Rewards track balanced timesteps per episode.

Real-Time Weights / Q-Values
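
The four weights and the σ readout above suggest a linear policy over CartPole's 4-dimensional state, trained by Gaussian-noise hill climbing. A minimal sketch under those assumptions (hypothetical names, not the demo's actual source):

```javascript
// Hypothetical sketch: a linear policy over CartPole's 4-dimensional
// state [x, xDot, theta, thetaDot], trained by hill climbing: perturb
// the best weights with Gaussian noise (sigma) and keep the candidate
// if its episode reward beats the best so far.

function selectAction(weights, state) {
  // Weighted sum of the state; push the cart right if positive.
  const score = weights.reduce((sum, w, i) => sum + w * state[i], 0);
  return score >= 0 ? 1 : 0; // 1 = right, 0 = left
}

function gaussianNoise(sigma) {
  // Box-Muller transform: two uniform samples -> one normal sample.
  const u = 1 - Math.random(); // avoid log(0)
  const v = Math.random();
  return sigma * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function perturb(weights, sigma) {
  return weights.map((w) => w + gaussianNoise(sigma));
}
```

Each episode, a candidate like `perturb(bestWeights, 0.25)` is rolled out with `selectAction` at every step; if its reward beats the best so far, it becomes the new best — consistent with the Best and σ readouts above.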

Linear Regression via Gradient Descent

Training a model to predict housing prices (Target) based on area (Feature).
Dataset: California Housing (Normalized)

Loss (MSE): 0.00 Epoch: 0

Left: The regression line (red) fitting the data points (blue). As loss decreases, the line aligns better with the data trend.

Linear regression is mathematically identical to a single-neuron neural network (with linear activation). Watch the weight (w) and bias (b) update in real time as gradient descent optimizes them, just like in complex deep networks!

Try a high learning rate (> 0.5) to see divergence. The path will oscillate or fly off the landscape.
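
The update rule the demo animates can be sketched as batch gradient descent on mean squared error for a one-feature model y ≈ w·x + b (a minimal sketch, not the page's actual source):

```javascript
// One step of batch gradient descent on MSE for y ≈ w * x + b.
function gradientStep(xs, ys, w, b, lr) {
  const n = xs.length;
  let dw = 0, db = 0, loss = 0;
  for (let i = 0; i < n; i++) {
    const err = (w * xs[i] + b) - ys[i]; // signed prediction error
    dw += (2 / n) * err * xs[i];         // dMSE/dw
    db += (2 / n) * err;                 // dMSE/db
    loss += (err * err) / n;             // MSE before the step
  }
  return { w: w - lr * dw, b: b - lr * db, loss };
}
```

Repeating this step walks (w, b) down the cost landscape; with too large a learning rate each step overshoots the minimum, which is the oscillation/divergence the tip above describes.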

Linear Regression = Single Neuron

Cost Landscape (w, b)

Green circle = optimal parameters (global minimum). Red = current position. Yellow = descent path.

Residuals (Error)

Sparse Autoencoder (SAE)

The "Juice Un-mixer" Analogy

Imagine you have a smoothie made of apples, kale, and ginger. If you taste it, you just taste "smoothie" (a messy, mixed signal). An SAE is like a magical machine that takes one sip and tells you exactly how many grams of apple, kale, and ginger were used. It "un-mixes" the ingredients into their original, pure forms.

1. The Problem: Superposition

AI models are "greedy." To save space, they often use a single neuron to represent multiple unrelated things (like "Dogs" and "The Eiffel Tower"). This is called Polysemanticity. It makes the model efficient but impossible for humans to read.

2. The Solution: Dictionary Learning

An SAE creates a "Learned Dictionary" of thousands of simple templates. By checking the messy AI signal against this dictionary, it finds the few specific "templates" that match the current thought.

3. The Secret: Sparsity

Usually, neural networks try to use every neuron a little bit. We force the SAE to use as few "dictionary items" as possible (Sparsity). This pressure forces the AI to find pure, high-level concepts instead of blurry mixtures.

Why do features "fade out" during training?

In Demo 1, new dictionary patches start random and gray. As training progresses, the SAE realizes most of them are useless noise. The L1 Regularization (sparsity penalty) forces these useless features to zero (they fade to black). Only the most useful features that explain real patterns (like edges) survive. This is "Feature Selection" in action.
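
The fade-out comes from the L1 term in the training loss. A minimal sketch of that loss and of soft-thresholding, the closed-form effect an L1 penalty has on an activation (illustrative, assuming a plain MSE + L1 objective):

```javascript
// SAE training loss: reconstruction error plus an L1 penalty on the
// feature activations. Features that rarely activate contribute
// nothing to reconstruction, so the penalty gradually zeroes them out.

function saeLoss(input, reconstruction, activations, lambda) {
  let mse = 0;
  for (let i = 0; i < input.length; i++) {
    const d = input[i] - reconstruction[i];
    mse += (d * d) / input.length;
  }
  // L1 sparsity penalty: sum of absolute activations.
  const l1 = activations.reduce((s, a) => s + Math.abs(a), 0);
  return mse + lambda * l1;
}

// Soft-thresholding: small activations are pushed exactly to zero,
// large ones shrink slightly — this is why weak features "fade out".
function softThreshold(a, lambda) {
  return Math.sign(a) * Math.max(Math.abs(a) - lambda, 0);
}
```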

Demo 1: Learning a Visual Dictionary

How to use:
1. Draw a simple pattern (like a cross or a box) on the left grid.
2. Click Start Training.
3. Watch the Dictionary Patches. They will evolve from random gray squares into specific edge/corner detectors that best describe your drawing.

Draw a shape and click "Train." Watch as the "Learned Dictionary" patches evolve from random noise into specific edge detectors that represent your drawing.

These are the templates the AI uses to "read" your drawing.

Reconstruction Error (MSE)
0.000000

Demo 2: Decoding the AI's "Messy" Mind

In Large Language Models, concepts overlap in a confused state called Polysemanticity. One neuron might respond to both "Apple (the fruit)" and "Apple (the company)." By using an SAE, researchers can "separate" these concepts into individual monosemantic features.

Interactive Keyword Search:
Try typing "pizza" or "law".
• The Polysemantic view shows how a real AI's neurons are cluttered and multi-purposed.
• The Monosemantic view shows how the SAE extracts the "pure concept" from that clutter.

Raw LLM Neurons (Polysemantic)

Densely packed: overlapping signals where one neuron carries multiple meanings.

SAE Decoded Features (Monosemantic)

The SAE "unmixes" the noise: revealing the specific concepts the AI is processing.

Concept | LLM Reality (The Problem) | SAE Solution (The Fix)
Superposition | AI packs too many concepts into too few neurons. | Expands concepts into a massive overcomplete layer.
Polysemanticity | One neuron handles "Bananas" and "The Space Shuttle." | Each dictionary item isolates a single monosemantic idea.
Black Box Loss | AI behavior is inscrutable and "alien" to humans. | Transforms weights into a map of human concepts.

Real-World Research Impact

In 2024, Anthropic used SAEs on their Claude 3 Sonnet model to discover millions of features, including a specific "Golden Gate Bridge" feature. When they manually clamped this feature to "ON", the model became obsessed with the bridge, mentioning it in unrelated conversations. This proved that SAEs don't just find correlation; they find the actual controls of the AI's mind.

Demo 3: The Geometry of Superposition

How do 5 different features fit into just 2 neurons? The AI learns to arrange them in a star-like shape. The SAE solves a "matching problem" to reconstruct data from this compressed space.

Dictionary Controls

How to read this:
  • Orange Lines: The learned "Dictionary Features".
  • Blue Dot: Mouse cursor (Input Data).
  • Dashed Line: The matched feature. If the dot is close enough (passes threshold), the SAE "fires" that feature.
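
The matching step can be sketched as a nearest-direction lookup: arrange the 5 feature directions in a star around the 2-D space, project the input onto each, and fire the feature whose alignment passes the threshold (the star layout and threshold below are illustrative, not the demo's exact values):

```javascript
// Demo 3 sketch: 5 unit-vector features arranged in a star around a
// 2-D space (superposition). The matched feature is the one whose dot
// product with the input exceeds the firing threshold.

const N_FEATURES = 5;
const features = Array.from({ length: N_FEATURES }, (_, i) => {
  const angle = (2 * Math.PI * i) / N_FEATURES;
  return [Math.cos(angle), Math.sin(angle)];
});

function matchFeature(point, threshold) {
  let best = -1, bestScore = threshold;
  features.forEach((f, i) => {
    const score = f[0] * point[0] + f[1] * point[1]; // alignment
    if (score > bestScore) { best = i; bestScore = score; }
  });
  return best; // -1 means no feature fires (input below threshold)
}
```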

CPU vs GPU Compute Demonstration

How this works:
  • Simulation: This demo uses JavaScript to simulate hardware performance; the WebGPU-style visualizations mimic real parallel workloads.
  • Performance: We calculate "ops per millisecond" from the real-world IPC (Instructions Per Clock) and core counts of the selected hardware.
  • Accuracy: The browser environment limits raw hardware access, so absolute numbers are approximate, but the relative speed difference still demonstrates the massive architectural advantage GPUs have over sequential CPU processing for parallel matrix operations.
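
The throughput model can be sketched as worker count × clock speed × IPC. The worker and clock figures below mirror the demo's panels; the IPC values are illustrative assumptions, not measured numbers:

```javascript
// Illustrative throughput model (assumed IPC values, not measurements):
// ops per millisecond = workers × clock (GHz -> cycles/ms) × IPC.

function opsPerMs(workers, clockGHz, ipc) {
  // clockGHz × 1e9 cycles/s ÷ 1e3 ms/s = clockGHz × 1e6 cycles/ms
  return workers * clockGHz * 1e6 * ipc;
}

const cpuOps = opsPerMs(24, 6.2, 4);   // CPU panel specs, assumed IPC ≈ 4
const gpuOps = opsPerMs(1024, 2.5, 2); // GPU panel specs, assumed IPC ≈ 2
const speedup = gpuOps / cpuOps;       // ≈ 8.6× in this toy model
```

Even with a per-worker clock less than half the CPU's, the GPU's worker count dominates — the architectural point the demo visualizes.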

Compare how Serial processing (CPU) differs from Parallel processing (GPU) on matrix tasks.

CPU i9-14900KS

Workers: 24 | Clock: 6.2 GHz
Time: 0.00s Progress: 0%

GPU Blackwell RTX 5090

Workers: 1024 | Clock: 2.5 GHz
Time: 0.00s Progress: 0%

Understanding the Architecture

CPU (Central Processing Unit)

Designed with a few very fast, versatile cores. CPUs are optimized for sequential processing (doing one thing after another very quickly) and handling complex logic/branching.

GPU (Graphics Processing Unit)

Designed with thousands of smaller, specialized cores. While each individual core is slower than a CPU core, their massive parallelism makes them far faster for the vector and matrix operations used in ML and gaming.

GPU Core Types

  • Shading Units: Programmable cores that calculate color, lighting, and visual effects for individual pixels.
  • ROPs (Render Output Units): Handle the final steps like writing pixels to memory, blending, and anti-aliasing.
  • TMUs (Texture Mapping Units): Specialized hardware for applying and filtering textures (images) on 3D surfaces.
  • CUDA Cores (NVIDIA): General-purpose parallel processors used for compute tasks like machine learning and physics simulations.
  • Ray Tracing Cores: Specialized hardware for calculating light ray intersections to generate realistic reflections and lighting.

Beyond GPUs: TPUs and NPUs

TPUs (Tensor Processing Units) and NPUs (Neural Processing Units) take specialization even further. They are essentially stripped-down GPUs designed exclusively for the mathematical operations (tensor arithmetic) used in Deep Learning. By assuming the workload is always neural networks, they remove graphics-specific hardware (like texture mapping and ROPs) to pack even more compute density for AI tasks.