Taylor Expansions in Diffusion Models
Taylor expansions are the quiet workhorse behind diffusion models, SDE discretization, and the Fokker–Planck equation. They're rarely foregrounded, but almost every "natural-looking" formula in diffusion comes from a first- or second-order expansion where higher-order terms are deliberately thrown away.
This document explains why Taylor expansions appear everywhere in diffusion models and how they enable the bridge between continuous theory and discrete algorithms.
1. The Continuous-Discrete Boundary
Diffusion models live at an awkward boundary:
- The true theory is continuous in time (SDEs, PDEs)
- The algorithms are discrete (finite steps, finite networks)
Taylor expansions bridge these worlds.
Signs You're Seeing Taylor Expansion
Any time you encounter:
- "small time step"
- "as \(dt \to 0\)"
- "ignore higher-order terms"
- "first-order accurate"
you are seeing Taylor expansion at work.
2. Taylor Expansion for Dynamics
For a smooth function \(f(t)\), the Taylor expansion is:
\[
f(t + \Delta t) = f(t) + f'(t)\,\Delta t + \tfrac{1}{2} f''(t)\,\Delta t^2 + O(\Delta t^3)
\]
The Core Philosophy
Over a small enough time step, the future is approximately linear in the present.
Diffusion models lean heavily on this idea, but applied to:
- Random processes (SDEs)
- Probability densities (Fokker–Planck equation)
3. Warm-Up: Deterministic Dynamics
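Consider an ordinary differential equation (ODE):
\[
\frac{dx}{dt} = f(x, t)
\]
By Taylor expansion:
\[
x(t + \Delta t) = x(t) + f(x(t), t)\,\Delta t + O(\Delta t^2)
\]
Dropping the \(O(\Delta t^2)\) remainder gives Euler's method for numerical integration.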
Nothing fancy yet—just Taylor applied to deterministic dynamics.
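A minimal Python sketch of the Euler step, assuming the test equation \(dx/dt = -x\) with \(x(0) = 1\) (exact solution \(e^{-t}\)). The name `euler_solve` and the step counts are illustrative choices, not part of the original text. Because the dropped remainder is \(O(\Delta t^2)\) per step, halving the step size should roughly halve the final error.

```python
import math

def euler_solve(f, x0, t0, t1, n_steps):
    """Integrate dx/dt = f(x, t) from t0 to t1 with n_steps Euler steps."""
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        x = x + f(x, t) * dt  # first-order Taylor step; O(dt^2) remainder dropped
        t = t + dt
    return x

def decay(x, t):
    """Assumed test dynamics: dx/dt = -x."""
    return -x

exact = math.exp(-1.0)
err_coarse = abs(euler_solve(decay, 1.0, 0.0, 1.0, 100) - exact)
err_fine = abs(euler_solve(decay, 1.0, 0.0, 1.0, 200) - exact)
print(err_coarse, err_fine, err_coarse / err_fine)  # ratio should be close to 2
```

The error ratio near 2 is what "first-order accurate" means in practice: the global error shrinks linearly with the step size.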
4. Enter Randomness: Why SDEs Change the Rules
Now consider a stochastic differential equation (SDE):
\[
dx(t) = f(x, t)\,dt + g(t)\,dw(t)
\]
where \(w(t)\) is a Wiener process (Brownian motion).
The Key Twist
- \(dt\) is "small"
- But \(dw(t)\) is not order \(dt\)
- It is order \(\sqrt{dt}\)
Two Scales
We have two different scales:
| Term Type | Scaling |
|---|---|
| Deterministic | \(\sim dt\) |
| Stochastic | \(\sim \sqrt{dt}\) |
This single fact reshapes Taylor expansion for stochastic processes.
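The \(\sqrt{dt}\) scaling can be checked numerically. The sketch below assumes NumPy and builds Brownian paths from many fine increments, then measures how the spread of the displacement grows with the window length. The resolution `fine_dt`, the path count, and the window sizes are arbitrary illustrative choices. A log-log slope near 0.5 indicates displacement of order \(\sqrt{dt}\) rather than \(dt\).

```python
import numpy as np

rng = np.random.default_rng(0)

fine_dt = 1e-4                 # assumed fine time resolution
n_paths, n_fine = 2000, 1000   # each path covers total time 0.1

# Independent fine increments, summed into Brownian paths.
fine_steps = rng.normal(0.0, np.sqrt(fine_dt), size=(n_paths, n_fine))
paths = np.cumsum(fine_steps, axis=1)

windows = np.array([10, 100, 1000])  # window lengths in fine steps
dts = windows * fine_dt
# Standard deviation of the displacement accumulated over each window.
stds = np.array([paths[:, w - 1].std() for w in windows])

# Slope of log(std) against log(dt): 0.5 means std ~ sqrt(dt).
slope = np.polyfit(np.log(dts), np.log(stds), 1)[0]
print(dts, stds, slope)
```

A slope of 1 would mean the noise behaves like an ordinary \(dt\) term; the measured value near 0.5 is the reason stochastic terms dominate deterministic ones at small step sizes.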
5. Taylor Logic Behind Euler–Maruyama
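When discretizing the SDE, we write:
\[
x_{k+1} = x_k + f(x_k, t_k)\,\Delta t + g(t_k)\,\Delta w_k
\]
with:
\[
\Delta w_k \sim \mathcal{N}(0, \Delta t\, I)
\]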
Hierarchy of Terms
Observe the order of magnitude:
| Term | Order |
|---|---|
| \(f\,\Delta t\) | \(\Delta t\) |
| \(g\,\Delta w\) | \(\sqrt{\Delta t}\) |
| \((\Delta w)^2\) | \(\Delta t\) |
| \((\Delta w)^3\) | \(\Delta t^{3/2}\) |
Truncation Rule
- Keep terms up to order \(\Delta t\)
- Drop terms like \(\Delta t^{3/2}\)
This truncation is a Taylor expansion, adapted to stochastic scaling.
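A minimal sketch of the resulting Euler–Maruyama scheme, assuming NumPy and an Ornstein–Uhlenbeck test process \(dx = -\theta x\,dt + \sigma\,dw\), whose stationary variance is \(\sigma^2 / (2\theta)\). The function name `euler_maruyama`, the parameters `theta` and `sigma`, and the step counts are illustrative assumptions, not taken from the original text.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, t0, t1, n_steps, rng):
    """Simulate dx = drift(x, t) dt + diffusion(t) dw with fixed steps."""
    dt = (t1 - t0) / n_steps
    x = np.array(x0, dtype=float)  # copy so the caller's array is untouched
    t = t0
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)  # dw ~ N(0, dt)
        # Keep the O(dt) drift and the O(sqrt(dt)) noise; drop everything else.
        x = x + drift(x, t) * dt + diffusion(t) * dw
        t += dt
    return x

theta, sigma = 1.0, 1.0  # assumed OU parameters
rng = np.random.default_rng(0)
x_final = euler_maruyama(
    drift=lambda x, t: -theta * x,
    diffusion=lambda t: sigma,
    x0=np.zeros(20_000),  # 20,000 independent paths, all started at 0
    t0=0.0, t1=5.0, n_steps=500, rng=rng,
)
sample_var = x_final.var()
target_var = sigma**2 / (2.0 * theta)
print(sample_var, target_var)  # the two should be close
```

The sample variance lands near the continuous-time value even though every step discards terms of order \(\Delta t^{3/2}\) and above, which is the truncation rule in action.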
6. Example: Why \(\sqrt{1-\beta}\) Becomes \(1 - \frac{1}{2}\beta\)
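In DDPM, the forward step is:
\[
x_k = \sqrt{1-\beta_k}\,x_{k-1} + \sqrt{\beta_k}\,\epsilon_k, \qquad \epsilon_k \sim \mathcal{N}(0, I)
\]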
Why not just \(1 - \frac{1}{2}\beta_k\)?
Taylor Expansion of the Square Root
To first order in \(\beta_k\):
\[
\sqrt{1-\beta_k} = 1 - \tfrac{1}{2}\beta_k + O(\beta_k^2)
\]
Higher-order terms are deliberately ignored.
Connection to VP-SDE
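This is exactly the same approximation that appears when discretizing the variance-preserving SDE with \(\beta_k = \beta(t_k)\,\Delta t\):
\[
dx = -\tfrac{1}{2}\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dw(t)
\]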
Why Keep the Square Root?
DDPM keeps \(\sqrt{1-\beta_k}\) because:
- It matches the first-order Taylor expansion
- It behaves better at finite step sizes
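- It exactly preserves unit variance: with noise variance \(\beta_k\) and \(\operatorname{Var}(x_{k-1}) = I\), we get \(\operatorname{Var}(x_k) = (1-\beta_k)\,I + \beta_k\,I = I\)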
This is a numerical stabilization choice, not an accident.
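The difference between the two scalings can be made concrete. The sketch below assumes NumPy and a linear schedule from \(10^{-4}\) to \(0.02\) over 1000 steps, chosen only for illustration. It propagates the variance recursion \(v_k = a_k^2\,v_{k-1} + \beta_k\) for both choices of the scale \(a_k\); the helper name `propagate_variance` is hypothetical.

```python
import numpy as np

# Assumed illustrative schedule: linear in beta over 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)

def propagate_variance(scale_fn, betas, var0=1.0):
    """Track Var(x_k) under x_k = a_k * x_{k-1} + sqrt(beta_k) * eps_k."""
    var = var0
    for beta in betas:
        var = scale_fn(beta) ** 2 * var + beta
    return var

var_sqrt = propagate_variance(lambda b: np.sqrt(1.0 - b), betas)  # DDPM choice
var_linear = propagate_variance(lambda b: 1.0 - 0.5 * b, betas)   # linearized

# Per-step gap between the two scales at the largest beta; about -beta^2 / 8.
gap = np.sqrt(1.0 - betas[-1]) - (1.0 - 0.5 * betas[-1])
print(var_sqrt, var_linear, gap)
```

With the square root, the variance stays at 1 up to floating-point rounding. With the linearized scale, each step adds an extra \(\beta_k^2 / 4\) factor, so the variance settles slightly above 1. The per-step gap is second order in \(\beta_k\), which is why both choices agree in the continuous limit.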
7. Taylor Expansion Behind the Fokker–Planck Equation
Now we reach the deeper place where Taylor expansions do heavy conceptual lifting.
Setup
Let \(p(x, t)\) be the probability density of \(x(t)\). We want an equation for how \(p\) evolves over time.
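Start from the Chapman–Kolmogorov relation, which collects every increment \(\Delta x\) that lands on \(x\) at time \(t + \Delta t\):
\[
p(x, t + \Delta t) = \int p(x - \Delta x, t)\, T(\Delta x \mid x - \Delta x, t)\, d\Delta x
\]
Here \(T(\Delta x \mid y, t)\) denotes the probability density of an increment \(\Delta x\) over \([t, t + \Delta t]\) for a particle starting at \(y\).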
Taylor Expand the Density
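Expand the density itself in space, around \(x\), for a small displacement \(\Delta x\):
\[
p(x - \Delta x, t) = p(x, t) - \Delta x^\top \nabla p(x, t) + \tfrac{1}{2}\,\Delta x^\top \big(\nabla \nabla^\top p(x, t)\big)\,\Delta x + O(\|\Delta x\|^3)
\]
where \(\nabla \nabla^\top p\) is the Hessian of \(p\).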
Substitute SDE Increment
From the SDE, the increment over one small step is:
\[
\Delta x = f(x, t)\,\Delta t + g(t)\,\Delta w
\]
Take Expectations
Key facts:
- \(\mathbb{E}[\Delta w] = 0\)
- \(\mathbb{E}[\Delta w \Delta w^\top] = \Delta t\,I\)
All higher-order terms vanish or are \(o(\Delta t)\).
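A short worked step, assuming a one-step increment of the Euler–Maruyama form \(\Delta x = f\,\Delta t + g\,\Delta w\) with scalar \(g\): applying the two facts above to the first two moments of \(\Delta x\) gives

\[
\begin{aligned}
\mathbb{E}[\Delta x] &= f\,\Delta t + g\,\mathbb{E}[\Delta w] = f\,\Delta t, \\
\mathbb{E}[\Delta x\,\Delta x^\top] &= f f^\top \Delta t^2 + g^2\,\mathbb{E}[\Delta w\,\Delta w^\top] = g^2\,\Delta t\,I + O(\Delta t^2).
\end{aligned}
\]

The cross terms proportional to \(\Delta t\,\mathbb{E}[\Delta w]\) vanish because \(\mathbb{E}[\Delta w] = 0\), so only the drift (first moment) and the diffusion (second moment) survive at order \(\Delta t\).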
The Result: Fokker–Planck Equation
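For drift \(f(x, t)\) and scalar diffusion coefficient \(g(t)\), collecting the \(O(\Delta t)\) terms and letting \(\Delta t \to 0\) gives:
\[
\frac{\partial p}{\partial t} = -\nabla \cdot \big(f(x, t)\, p\big) + \tfrac{1}{2}\, g(t)^2\, \nabla^2 p
\]
where \(\nabla^2\) is the Laplacian. The drift appears inside the divergence because the increment's distribution depends on where the particle starts.
Key insight: This entire PDE is a second-order Taylor expansion of the density under random motion, with the first-order term giving drift and the second-order term giving diffusion.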
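As a sanity check on the drift-plus-diffusion form \(\partial_t p = -\nabla \cdot (f p) + \tfrac{1}{2} g^2 \nabla^2 p\), the sketch below assumes NumPy and a one-dimensional process with drift \(f(x) = -\tfrac{1}{2}x\) and \(g = 1\). For these assumed coefficients the standard normal density should be stationary, so the right-hand side evaluated by finite differences should be close to zero. The grid size is an arbitrary choice.

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 2001)
p = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)  # candidate stationary density N(0, 1)

f = -0.5 * x  # assumed drift
g = 1.0       # assumed constant diffusion coefficient

# Right-hand side of the 1-D Fokker-Planck equation via finite differences.
drift_term = -np.gradient(f * p, x)
diffusion_term = 0.5 * g**2 * np.gradient(np.gradient(p, x), x)
residual = drift_term + diffusion_term

max_residual = np.max(np.abs(residual))
print(np.max(np.abs(drift_term)), max_residual)
```

Each term is individually of size about 0.2 near the origin, but they cancel to within discretization error, which is what stationarity under the Fokker–Planck equation requires.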
8. Why Stop at Second Order?
You might ask: why stop at second order in the Fokker–Planck derivation?
The Scaling Argument
| Order | Physical Meaning | Scaling |
|---|---|---|
| First | Drift (deterministic flow) | \(\Delta t\) |
| Second | Diffusion (spreading) | \(\Delta t\) |
| Third+ | Higher moments | \(\Delta t^{3/2}\) or higher |
Key fact: \((\Delta w)^n \sim (\Delta t)^{n/2}\)
For \(n \geq 3\), these terms are \(o(\Delta t)\): after dividing by \(\Delta t\), they vanish in the \(\Delta t \to 0\) limit.
This Is Not Hand-Waving
It's a rigorous scaling argument. The Fokker–Planck equation is the exact continuous-time limit.
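The scaling argument can be checked by Monte Carlo. The sketch below assumes NumPy and scalar Gaussian increments \(\Delta w \sim \mathcal{N}(0, \Delta t)\); the sample size and the step sizes are arbitrary. Dividing each moment by \(\Delta t\) shows which terms survive the limit: the second moment stays near 1, while the fourth moment, which equals \(3\,\Delta t^2\) for a Gaussian, shrinks like \(3\,\Delta t\).

```python
import numpy as np

rng = np.random.default_rng(0)

scaled_moments = {}
for dt in (1e-1, 1e-2, 1e-3):
    dw = rng.normal(0.0, np.sqrt(dt), size=500_000)
    second = np.mean(dw**2) / dt  # survives the limit: stays near 1
    fourth = np.mean(dw**4) / dt  # vanishes in the limit: about 3 * dt
    scaled_moments[dt] = (second, fourth)

for dt, (second, fourth) in scaled_moments.items():
    print(f"dt={dt:g}  E[dw^2]/dt={second:.3f}  E[dw^4]/dt={fourth:.5f}")
```

Odd moments of \(\Delta w\) are zero by symmetry, so in expectation the first correction beyond second order is already \(O(\Delta t^2)\).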
9. The Hidden Pattern Across Diffusion Models
You can now recognize a repeating structure:
| Component | What Taylor Expansion Does |
|---|---|
| DDPM forward step | Linearize SDE over small time |
| Noise schedule | Match first-order decay |
| Reverse SDE | Drop higher-order stochastic terms |
| Score matching | Linearize log-density gradients |
| Fokker–Planck | Second-order expansion of density |
| Probability-flow ODE | Remove stochastic second-order term |
Philosophy: Diffusion models are Taylor expansions with taste—you keep just enough terms to stay correct, stable, and learnable.
10. The Big Picture
Diffusion models work because, over infinitesimal time, random dynamics are simple—and Taylor expansions let us exploit that simplicity repeatedly.
Everything "Magical" Comes from Taylor
Everything that feels magical in diffusion models:
- Gaussian noise
- Linear drift
- Quadratic variance
- Score as gradient
- Clean discretizations
comes from discarding higher-order terms in a controlled way.
Summary
Taylor expansions are the mathematical glue that turns continuous stochastic dynamics into tractable learning rules in diffusion models, governing:
- SDE discretization (Euler–Maruyama)
- Fokker–Planck equation (density evolution)
- Variance-preserving structure (DDPM forward chain)
- Score matching (gradient approximations)
Next Steps
To deepen understanding:
- Derive Fokker–Planck line by line: Spell out every expectation, no black boxes
- Verify variance preservation: Show \(\sqrt{1-\beta_k}\) exactly preserves variance
- Study Itô calculus: Understand \((dw)^2 = dt\) rigorously