Understanding Brownian Motion: From Random Walks to Diffusion Models¶
A ground-up introduction to the stochastic process that powers modern diffusion models
Brownian motion is the mathematical heartbeat of diffusion models. It's the reason why adding noise "just works," why the \(\sqrt{dt}\) scaling appears everywhere, and why continuous-time SDEs can describe discrete-time algorithms like DDPM.
This tutorial builds intuition from physical observations, connects to rigorous mathematics, and shows exactly how Brownian motion enables generative modeling.
Table of Contents¶
- The Physical Origin Story
- Random Walk: The Discrete Cousin
- Brownian Motion: The Continuous Limit
- The Four Defining Properties (And What They Really Mean)
- The Mysterious \(\sqrt{dt}\) Scaling
- From Random Walk to Brownian Motion: The Scaling Limit
- Why This Matters for Diffusion Models
1. The Physical Origin Story¶
The Discovery (1827)¶
Robert Brown, a botanist, looked through a microscope at pollen grains suspended in water and noticed something uncanny:
They jittered forever.
No trend. No settling. No rest. The motion was real, persistent, and structureless.
At first, he suspected life. Then he tried dust particles. Same thing. The motion was universal, relentless, and seemingly random.
The Explanation (1905)¶
Decades later, Albert Einstein explained it: invisible molecular collisions from the surrounding fluid.
The key insights: - Water molecules are in constant thermal motion - They bombard the visible particle from all directions - Each collision is tiny, but there are billions per second - The cumulative effect is visible, persistent random motion
This wasn't just a curiosity—Einstein used it to prove atoms exist and to measure Avogadro's number.
The Mathematical Skeleton (1920s)¶
Norbert Wiener gave the phenomenon its clean mathematical structure, now called the Wiener process or Brownian motion.
This mathematical object became the foundation for: - Stochastic calculus (Itô, Stratonovich) - Financial mathematics (Black-Scholes) - Statistical physics (Langevin dynamics) - Modern machine learning (diffusion models, score matching)
2. Random Walk: The Discrete Cousin¶
Before understanding Brownian motion, let's start with something simpler: random walks.
Definition¶
A random walk is defined step by step in discrete time.
Time: \(t = 0, 1, 2, \ldots\)
At each step, the position changes by a random increment:
where \(\varepsilon_n\) are independent and identically distributed (i.i.d.) random variables.
Classic Example: Simple Symmetric Random Walk¶
Interpretation:
- Flip a coin
- Heads → step right
- Tails → step left
- Repeat forever
Key Properties¶
- Discrete time and usually discrete space
- Independent increments: Each step doesn't depend on previous steps
- Paths look jagged, stair-like
- Easy to simulate, easy to analyze
Where Random Walks Appear¶
- Gambling (gambler's ruin)
- Population genetics (genetic drift)
- Markov chains
- Graph theory (random walks on graphs)
- Toy models for diffusion
3. Brownian Motion: The Continuous Limit¶
Brownian motion is what you get when a random walk is pushed to the limit of infinite refinement.
The Transition¶
Think of it this way:
Random walk is flipping coins while walking on tiles.
Brownian motion is what the walk looks like after you: - Shrink the tiles to dust - Speed up time - Step infinitely often
The path never smooths out. Reality just stops showing you the individual steps.
Continuous Time¶
Brownian motion is defined in continuous time:
Unlike random walks where you count steps \(n = 0, 1, 2, \ldots\), Brownian motion evolves continuously. You can ask "where is the particle at time \(t = \pi\)?" and get a meaningful answer.
4. The Four Defining Properties¶
A process \(B_t\) is Brownian motion (or a Wiener process) if it satisfies four properties. Let's unpack what each one really means.
Conceptual Clarification: What is \(B_t\)?¶
Before diving into the properties, let's clarify what we mean by "process" and how to think about \(B_t\).
Is \(B_t\) a Random Variable?¶
Yes, but more precisely:
- For each fixed time \(t\): \(B_t\) is a random variable representing the position/displacement at time \(t\)
- For the collection \(\{B_t : t \geq 0\}\): This is a stochastic process—a family of random variables indexed by time
Think of it this way:
- At time \(t = 0.5\), \(B_{0.5}\) is a random variable (could be any real number, with probabilities given by a Gaussian)
- At time \(t = 1.0\), \(B_{1.0}\) is a different random variable
- The process \(B_t\) is the entire collection: \(\{B_{0.5}, B_{1.0}, B_{2.3}, \ldots\}\)
In diffusion models: \(B_t\) represents the noise/displacement at time \(t\). For a d-dimensional process, \(B_t \in \mathbb{R}^d\) is a random vector.
Is "Process" the Same as in Gaussian Process, Dirichlet Process, etc.?¶
Yes! The word "process" here means stochastic process in the same sense as:
- Gaussian Process: A collection of random variables \(\{f(x) : x \in \mathcal{X}\}\) where any finite subset is jointly Gaussian
- Dirichlet Process: A stochastic process whose realizations are probability distributions
- Markov Process: A stochastic process with the Markov property
- Brownian Motion: A stochastic process with specific properties (the four we're about to define)
Common structure: All are collections of random variables indexed by some parameter (time, space, etc.)
Key difference: What properties they satisfy: - Brownian motion: Independent increments, Gaussian, variance = time - Gaussian process: Any finite subset is jointly Gaussian - Dirichlet process: Realizations are probability distributions
In summary:
- \(B_t\) (for fixed \(t\)) = random variable (position at time \(t\))
- \(\{B_t : t \geq 0\}\) = stochastic process (the entire trajectory)
- "Process" = same mathematical concept as GP, DP, etc., but with different properties
Property 1: \(B_0 = 0\)¶
"Start at zero."
Mathematical meaning: The process begins at the origin.
Physical meaning: We measure displacement, not absolute position. The particle's initial location is irrelevant; only changes matter.
Interpretation: Wherever the particle starts, call that position zero. This is a choice of reference frame, not a physical constraint.
Why it matters: Origins are conventions, not truths. This mirrors how physics works—we care about relative motion.
Property 2: Independent Increments¶
Mathematical meaning: Changes over non-overlapping time intervals are independent random variables.
Physical meaning: The future does not remember the past. Molecular collisions are so rapid and chaotic that past kicks don't influence future kicks.
Why this is reasonable:
- Billions of collisions per second
- Correlations wash out almost instantly
- The surrounding fluid has no memory at the scale of observation
Interpretation: Yesterday's shove tells you nothing about tomorrow's shove.
This is the Markov assumption, but earned, not assumed. It's why Brownian motion became the backbone of Markov processes, diffusion, and SDEs.
Property 3: Stationary Gaussian Increments¶
This is the most information-dense property. Let's break it down.
Mean = 0¶
No drift. No preferred direction.
Physical meaning: The fluid is isotropic and at equilibrium. If there were a mean, you'd have wind, flow, or external force.
Variance = \(t-s\)¶
Spread grows linearly with time.
Physical meaning:
- Each collision adds a tiny displacement
- Variances add, not displacements
- This is diffusion, not ballistic motion
Key insight: Distance \(\propto \sqrt{\text{time}}\), not \(\propto \text{time}\).
This single fact distinguishes diffusion from motion with velocity.
Gaussian Shape¶
Why normal distribution?
Because the displacement over time is the sum of enormously many tiny, independent kicks, and the Central Limit Theorem kicks in mercilessly.
Gaussianity is not sacred—it's statistical inevitability under these conditions.
Property 4: Continuous Paths (But Nowhere Differentiable)¶
Mathematical meaning: \(B_t\) is continuous, but has no well-defined velocity anywhere.
Physical meaning:
- The particle never teleports
- But at every scale, motion is jittery
The strange reality:
Zoom in: Still jagged.
Zoom in again: Still jagged.
Zoom in infinitely: Still jagged.
There is no smallest wiggle.
Consequences: This forced mathematics to abandon classical calculus and invent: - Itô calculus - Stratonovich calculus - Stochastic integration
Nature demanded new math.
5. The Mysterious \(\sqrt{dt}\) Scaling¶
One of the most confusing aspects of Brownian motion is why noise scales as \(\sqrt{dt}\), not \(dt\). Let's build intuition step by step.
The Question¶
In SDEs, we write:
Why the square root? Why not just \(dt\) or some other power?
Intuition 1: Diffusive Scaling¶
Imagine a particle getting random kicks over time.
- Over 1 second, it wanders a lot
- Over 0.01 seconds, it wanders a little
- Over 0 seconds, it doesn't move at all
Key empirical fact: If you wait 4× longer, the typical distance only doubles—not quadruples.
This is called diffusive scaling.
Intuition 2: Variance Adds, Not Displacements¶
Consider a simple random walk with \(n\) steps:
Variance:
Why this formula? The variance of a sum of independent random variables equals the sum of their variances:
General formula for variance of sum:
For independent random variables \(X_1, X_2, \ldots, X_n\):
$$
\text{Var}\left(\sum_{k=1}^n X_k\right) = \sum_{k=1}^n \text{Var}(X_k) $$
For dependent random variables (with covariance):
$$
\text{Var}\left(\sum_{k=1}^n X_k\right) = \sum_{k=1}^n \text{Var}(X_k) + 2\sum_{i < j} \text{Cov}(X_i, X_j) $$
In our case, the \(\varepsilon_k\) are independent (each step is independent), so the covariance terms are zero, and we get the simple sum.
Standard deviation:
So: - Variance grows linearly with number of steps - Standard deviation (typical distance) grows like \(\sqrt{n}\)
From Steps to Time¶
Now connect steps to continuous time:
- Total time: \(t\)
- Step size: \(\Delta t\)
- Number of steps: \(n = t / \Delta t\)
If each step had noise of fixed size, variance would blow up as \(\Delta t \to 0\). That's bad.
Solution: Scale down the noise per step:
Variance over time \(t\):
Perfect! We get: - Finite variance - Linear growth in time - Well-defined continuous limit
This is why noise per step must scale as \(\sqrt{\Delta t}\).
The Mathematical Statement¶
Brownian motion is defined to satisfy:
Since any \(\mathcal{N}(0, \Delta t)\) variable can be written as \(\sqrt{\Delta t}\) times a standard normal:
This is not an approximation or modeling choice—it's baked into the definition.
Why Not Other Scalings?¶
Let's check alternatives:
If noise scaled as \(dt\)¶
Then: - Variance per step \(\sim dt^2\) - Total variance over time \(\sim dt \to 0\)
Result: Noise disappears. You just get an ODE.
If noise scaled as constant¶
Then: - Variance per unit time \(\to \infty\)
Result: Process explodes. No limit exists.
Only \(\sqrt{dt}\) works¶
It's the unique scaling that gives: - Nontrivial randomness - Finite variance - A meaningful continuous-time limit
This is why Brownian motion is so universal.
The Key Insight for SDEs¶
In an SDE:
You can now clearly see: - Deterministic motion scales like \(dt\) - Stochastic motion scales like \(\sqrt{dt}\)
They live on different scales. This difference is crucial—it's why noise doesn't vanish or explode.
6. From Random Walk to Brownian Motion¶
Now let's make the connection rigorous. How exactly does a random walk become Brownian motion?
The Scaling Limit¶
Setup: Take a random walk with: - Step size: \(\Delta x\) - Time step: \(\Delta t\) - Steps: \(\varepsilon_1, \varepsilon_2, \ldots\) with \(\mathbb{E}[\varepsilon_i] = 0\), \(\text{Var}(\varepsilon_i) = 1\)
Discrete random walk:
Rescaling: Define a continuous-time process:
Interpretation:
- Speed up time by factor \(n\) (so \(nt\) steps happen by time \(t\))
- Shrink space by \(\sqrt{n}\)
Variance check:
Perfect!
Donsker's Invariance Principle¶
Theorem (Donsker, 1951): As \(n \to \infty\),
as processes (convergence in distribution in the space of continuous functions).
What this means:
- The rescaled random walk converges to Brownian motion
- Not just at individual times, but as entire random curves
- This is a functional Central Limit Theorem
Interpretation:
- Random walk = microscopic description
- Brownian motion = macroscopic, continuum description
Just like: - Individual molecules → thermodynamics - Coin flips → Gaussian noise - Discrete Markov chains → SDEs
7. Why This Matters for Diffusion Models¶
Now we can connect everything to diffusion models.
The Forward Process¶
In diffusion models, the forward noising process gradually corrupts clean data into noise.
Discrete view (DDPM):
This is a discrete random walk in noise space.
Continuous view (SDE):
This treats the same process as Brownian motion with drift.
Key insight: DDPM's discrete noise exactly matches the \(\sqrt{dt}\) scaling of Brownian motion.
Why Brownian Motion is Perfect for Diffusion¶
- Continuous paths: Data never jumps discontinuously
- Gaussian increments: Noise is well-behaved, tractable
- Independent increments: Markov property enables efficient training
- Stationary increments: Noise schedule is time-translation invariant
- Universal limit: Any reasonable discrete noise schedule converges to it
The SDE Formulation¶
The forward SDE:
Components:
- \(f(x,t)\): Drift (deterministic shrinkage toward origin)
- \(g(t)\): Diffusion coefficient (noise strength)
- \(dw(t) = \sqrt{dt} \cdot \varepsilon\): Brownian motion
Why this works:
- Drift scales as \(dt\) (slow, deterministic)
- Diffusion scales as \(\sqrt{dt}\) (fast, stochastic)
- Together they balance to create smooth corruption
The Reverse Process¶
The reverse SDE:
Key point: The reverse process also uses Brownian motion (with time running backwards).
Why this is remarkable:
- The same stochastic process that corrupts data
- Can be reversed to generate data
- All thanks to Anderson's theorem (1982)
Connection to Score Matching¶
The score function \(\nabla_x \log p_t(x)\) tells you: - Which direction increases probability - How to "undo" the Brownian noise
Training: Learn to predict the score at all noise levels.
Sampling: Use the learned score in the reverse SDE to denoise.
The bridge: Brownian motion is what makes this connection clean and tractable.
Summary: The Big Picture¶
Let's synthesize everything:
What Brownian Motion Is¶
The universal macroscopic description of motion driven by countless microscopic, memoryless, unbiased fluctuations.
Why It Appears Everywhere¶
- Pollen grains in water
- Heat diffusion
- Stock prices (with caveats)
- Noise in neural models
- Forward processes in diffusion models
- SDE limits of random walks
It is not "the truth of motion"—it is the inevitable limit when you refuse to track microscopic detail.
The Key Properties (Recap)¶
- \(B_0 = 0\): Start at origin (choice of reference frame)
- Independent increments: No memory (Markov property)
- Gaussian increments with variance \(\propto\) time: Diffusive scaling
- Continuous but nowhere differentiable: Infinitely rough
The \(\sqrt{dt}\) Scaling (One-Line Takeaway)¶
Noise scales as \(\sqrt{dt}\) because variance must grow linearly in time for a continuous-time random process to exist; this is the only scaling that yields finite, nontrivial stochastic behavior.
For Diffusion Models¶
- DDPM: Discrete random walk in noise space
- SDE formulation: Continuous Brownian motion with drift
- Same process: Different descriptions, unified by Donsker's theorem
- Why it works: Brownian motion provides the mathematical foundation for adding and removing noise in a principled way
Further Reading¶
Primary Papers¶
- Einstein (1905): Investigations on the Theory of the Brownian Movement
-
Original explanation of Brownian motion via molecular collisions
-
Donsker (1951): An invariance principle for certain probability limit theorems
-
Proves random walk → Brownian motion convergence
-
Song et al. (2021): Score-Based Generative Modeling through SDEs
- Applies Brownian motion to diffusion models
Textbooks¶
- Øksendal (2003): Stochastic Differential Equations: An Introduction with Applications
-
Comprehensive SDE theory with Brownian motion foundations
-
Karatzas & Shreve (1991): Brownian Motion and Stochastic Calculus
-
Advanced reference with rigorous proofs
-
Mörters & Peres (2010): Brownian Motion
- Modern treatment with applications
Related Topics¶
- Itô calculus: How to do calculus with Brownian motion
- Langevin dynamics: Brownian motion + gradient flow
- Fokker-Planck equation: PDE describing probability evolution
- Wiener measure: Probability measure on path space
Next Steps¶
Now that you understand Brownian motion:
- See it in action: Run the SDE tutorial notebook to visualize Brownian paths
- Study the forward SDE: Learn how \(f(x,t)\) and \(g(t)\) are chosen
- Understand score matching: See how the score function reverses diffusion
- Implement DDPM: Connect discrete and continuous views
The mathematical machinery is now in place. Time to build diffusion models!