Interactive visualizations of machine learning concepts running directly in your browser.
The reward tracks how many timesteps the agent stays balanced in each episode.
Training a model to predict housing prices (Target) based on area (Feature).
Dataset: California Housing (Normalized)
Left: The regression line (red) fitting the data points (blue). As loss decreases, the line aligns better with the data trend.
Linear regression is mathematically identical to a single-neuron neural network with a linear activation. Watch the weight (w) and bias (b) update in real time as gradient descent optimizes them, just as in complex deep networks!
Try a high learning rate (> 0.5) to see divergence. The path will oscillate or fly off the landscape.
Green circle = optimal parameters (global minimum). Red = current position. Yellow = descent path.
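The loop the demo animates can be sketched in a few lines. This is a minimal illustration, not the demo's actual code: the data here is a hypothetical toy set (the demo uses normalized California Housing), and the learning rates are chosen only to show convergence vs. divergence.

```python
# Toy single-neuron linear regression trained by gradient descent.
# Hypothetical data following y = 2x + 1; the demo uses real housing data.
def train(lr, steps):
    xs = [0.0, 1.0, 2.0, 3.0]   # feature: area (normalized)
    ys = [1.0, 3.0, 5.0, 7.0]   # target: price
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b
        dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * dw            # the update the demo visualizes
        b -= lr * db
    return w, b

w, b = train(lr=0.05, steps=2000)
print(round(w, 2), round(b, 2))   # settles near the optimum w=2, b=1

w_bad, _ = train(lr=0.6, steps=50)
print(abs(w_bad) > 100)           # too-high learning rate: the path diverges
```

With `lr=0.05` the parameters converge to the global minimum; with `lr=0.6` each step overshoots and the error compounds, which is exactly the "fly off the landscape" behavior the demo shows.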
The "Juice Un-mixer" Analogy
Imagine you have a smoothie made of apples, kale, and ginger. If you taste it, you just taste "smoothie" (a messy, mixed signal). An SAE is like a magical machine that takes one sip and tells you exactly how many grams of apple, kale, and ginger were used. It "un-mixes" the ingredients into their original, pure forms.
AI models are "greedy." To save space, they often use a single neuron to represent multiple unrelated things (like "Dogs" and "The Eiffel Tower"). This is called Polysemanticity. It makes the model efficient but impossible for humans to read.
An SAE creates a "Learned Dictionary" of thousands of simple templates. By checking the messy AI signal against this dictionary, it finds the few specific "templates" that match the current thought.
Usually, neural networks try to use every neuron a little bit. We force the SAE to use as few "dictionary items" as possible (Sparsity). This pressure forces the AI to find pure, high-level concepts instead of blurry mixtures.
In Demo 1, new dictionary patches start random and gray. As training progresses, the SAE realizes most of them are useless noise. The L1 Regularization (sparsity penalty) forces these useless features to zero (they fade to black). Only the most useful features that explain real patterns (like edges) survive. This is "Feature Selection" in action.
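The trade-off driving that fade-to-black behavior can be shown with toy numbers. This is a hypothetical illustration of the SAE objective (reconstruction error plus an L1 sparsity penalty), not the demo's training code; the activations and coefficient below are made up for clarity.

```python
# SAE objective sketch: loss = reconstruction_error + l1_coeff * sum(|activations|)
def sae_loss(recon_err, activations, l1_coeff=0.5):
    sparsity_penalty = l1_coeff * sum(abs(a) for a in activations)
    return recon_err + sparsity_penalty

# Two encodings that reconstruct the input equally well (same recon_err):
dense  = [0.3, 0.3, 0.3, 0.3, 0.3, 0.3]   # every dictionary item "a little bit"
sparse = [0.9, 0.0, 0.0, 0.0, 0.0, 0.0]   # one strong, pure feature

print(round(sae_loss(0.1, dense), 4))     # higher loss: penalized for density
print(round(sae_loss(0.1, sparse), 4))    # lower loss: sparsity wins
```

Because the L1 term charges for every nonzero activation, gradient descent keeps shrinking features that don't pay for themselves in reconstruction quality, which is why the useless gray patches in Demo 1 fade to black.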
Draw a shape and click "Train." Watch as the "Learned Dictionary" patches evolve from random noise into specific edge detectors that represent your drawing.
These are the templates the AI uses to "read" your drawing.
In Large Language Models, concepts overlap in a confused state called Polysemanticity. One neuron might respond to both "Apple (the fruit)" and "Apple (the company)." By using an SAE, researchers can "separate" these concepts into individual, Monosemantic features.
Densely packed: overlapping signals where one neuron carries multiple meanings.
The SAE "unmixes" the noise: revealing the specific concepts the AI is processing.
| Concept | LLM Reality (The Problem) | SAE Solution (The Fix) |
|---|---|---|
| Superposition | AI packs too many concepts into too few neurons. | Expands concepts into a massive overcomplete layer. |
| Polysemanticity | One neuron handles "Bananas" and "The Space Shuttle." | Each dictionary item isolates a single monosemantic idea. |
| Black Box Loss | AI behavior is inscrutable and "alien" to humans. | Transforms weights into a map of human concepts. |
In 2024, Anthropic used SAEs on their Claude 3 Sonnet model to discover millions of features, including a specific "Golden Gate Bridge" feature. When they manually clamped this feature to "ON", the model became obsessed with the bridge, mentioning it in unrelated conversations. This proved that SAEs don't just find correlation; they find the actual controls of the AI's mind.
How do 5 different features fit into just 2 neurons? The AI learns to arrange them in a star-like shape. The SAE solves a "matching problem" to reconstruct data from this compressed space.
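The star-like arrangement can be computed directly. In this hypothetical sketch, five feature directions are spaced evenly around a circle in a 2-neuron space: each feature reads itself back perfectly, but neighbors overlap slightly, and that interference is the price of superposition.

```python
import math

# Five hypothetical feature directions packed into 2 neurons,
# spaced evenly (72 degrees apart) to minimize interference.
features = [
    (math.cos(2 * math.pi * k / 5), math.sin(2 * math.pi * k / 5))
    for k in range(5)
]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# A feature aligned with itself recovers full strength...
print(round(dot(features[0], features[0]), 3))   # 1.0
# ...but overlaps with its neighbor by cos(72 degrees):
print(round(dot(features[0], features[1]), 3))   # ~0.309
```

No arrangement of 5 directions in 2 dimensions can make all pairs orthogonal; the SAE's job is to undo exactly this kind of overlap when reconstructing which features were active.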
Compare how Serial processing (CPU) differs from Parallel processing (GPU) on matrix tasks.
Designed with a few very fast, versatile cores, CPUs are optimized for sequential processing (doing one thing after another very quickly) and for handling complex logic and branching.
Designed with thousands of smaller, specialized cores. While each individual core may be slower than a CPU core, their massive parallelism makes them dramatically faster at the vector and matrix operations used in ML and gaming.
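Matrix multiplication shows why this split exists. Each output cell is an independent dot product, so a GPU can assign one core per cell and compute them all at once, while a CPU walks through them one after another. The loop below is a serial, CPU-style sketch of that structure:

```python
# Why matrix multiply parallelizes: every output cell is an
# independent dot product, so no cell waits on another cell's result.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

def cell(i, j):
    # Depends only on row i of A and column j of B.
    return sum(A[i][k] * B[k][j] for k in range(len(B)))

# Serial CPU-style traversal; a GPU would launch all four cells in parallel.
C = [[cell(i, j) for j in range(2)] for i in range(2)]
print(C)  # [[19, 22], [43, 50]]
```

With thousands of cores, a GPU evaluates thousands of `cell(i, j)` calls simultaneously, which is why large matrix workloads favor it so heavily.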
TPUs (Tensor Processing Units) and NPUs (Neural Processing Units) take specialization even further. They are essentially stripped-down GPUs designed exclusively for the mathematical operations (tensor arithmetic) used in Deep Learning. By assuming the workload is always neural networks, they remove graphics-specific hardware (like texture mapping and ROPs) to pack even more compute density for AI tasks.