From DDPM to VP-SDE: The Continuous Limit
This "identity check" is one of the most satisfying derivations in diffusion theory. We'll start from the discrete DDPM forward step, take the small-step limit, and recover the VP-SDE:
\[
dx = -\frac{1}{2}\,\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dw
\]
We'll derive this by matching conditional mean and conditional variance of increments—the cleanest way to pass from discrete Markov chains to continuous SDEs.
Overview
This derivation shows that DDPM is not an arbitrary discrete algorithm—it's a discretization of a continuous stochastic process. Understanding this connection:
- Unifies DDPM with the SDE framework
- Explains the variance-preserving structure
- Justifies the \(\sqrt{1-\beta_k}\) coefficient
- Enables continuous-time analysis and samplers
Notation
Time Discretization
| Symbol | Meaning |
|---|---|
| \(k = 0, 1, \ldots, N\) | Discrete DDPM time index |
| \(t \in [0, T]\) | Continuous time |
| \(t_k = k\,\Delta t\) | Time grid (uniform steps) |
| \(\Delta t = T/N\) | Step size |
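DDPM Forward Step
The standard DDPM forward step is:
\[
x_{k+1} = \sqrt{\alpha_k}\,x_k + \sqrt{1-\alpha_k}\,\varepsilon_k, \qquad \varepsilon_k \sim \mathcal{N}(0, I)
\]
with \(\alpha_k = 1 - \beta_k\). Equivalently:
\[
x_{k+1} = \sqrt{1-\beta_k}\,x_k + \sqrt{\beta_k}\,\varepsilon_k
\]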
where:
- \(\beta_k\) is the discrete "noise amount" at step \(k\)
- In the continuous limit, we'll set \(\beta_k \propto \Delta t\)
Step 1: Continuous-Time Scaling
To obtain an SDE limit, the per-step noise must shrink as the step size shrinks. The standard scaling is:
\[
\beta_k = \beta(t_k)\,\Delta t
\]
where \(\beta(t)\) is a smooth nonnegative function called the noise rate per unit time.
Intuition: As we take finer time steps (\(\Delta t \to 0\)), the noise added per step must also shrink proportionally.
Substituting into the DDPM step:
\[
x_{k+1} = \sqrt{1-\beta(t_k)\,\Delta t}\;x_k + \sqrt{\beta(t_k)\,\Delta t}\;\varepsilon_k
\]
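The scaling rule can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical linear noise rate \(\beta(t)\) on \([0, 1]\); the function names and the values `beta_min = 0.1`, `beta_max = 20.0` are chosen for illustration and do not come from the derivation itself.

```python
def beta_rate(t, beta_min=0.1, beta_max=20.0):
    """Hypothetical linear noise rate beta(t) on [0, 1] (illustrative values)."""
    return beta_min + t * (beta_max - beta_min)


def discrete_betas(n_steps, t_final=1.0):
    """Per-step noise amounts beta_k = beta(t_k) * dt on the uniform grid t_k = k * dt."""
    dt = t_final / n_steps
    return [beta_rate(k * dt) * dt for k in range(n_steps)]


betas_1000 = discrete_betas(1000)
betas_2000 = discrete_betas(2000)

# Doubling the number of steps halves the noise added by the first step,
# because beta_k is proportional to dt.
print(betas_1000[0], betas_1000[-1])
print(betas_2000[0], betas_2000[-1])
```

With these illustrative values and \(N = 1000\), the per-step amounts run from roughly \(10^{-4}\) to \(0.02\), so every \(\beta_k\) stays well inside \((0, 1)\) and the square roots in the forward step are well defined.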
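Step 2: Write the Increment
Define the increment:
\[
\Delta x_k := x_{k+1} - x_k
\]
Substituting:
\[
\Delta x_k = \left(\sqrt{1-\beta(t_k)\,\Delta t} - 1\right) x_k + \sqrt{\beta(t_k)\,\Delta t}\;\varepsilon_k
\]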
Now we'll Taylor-expand the square root term.
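Step 3: Taylor Expand the Square Root
For small \(u\), the Taylor expansion of \(\sqrt{1-u}\) is:
\[
\sqrt{1-u} = 1 - \frac{1}{2}u + O(u^2)
\]
With \(u = \beta(t_k)\,\Delta t\):
\[
\sqrt{1-\beta(t_k)\,\Delta t} = 1 - \frac{1}{2}\beta(t_k)\,\Delta t + O(\Delta t^2)
\]
Substituting back:
\[
\Delta x_k = -\frac{1}{2}\beta(t_k)\,x_k\,\Delta t + \sqrt{\beta(t_k)\,\Delta t}\;\varepsilon_k + O(\Delta t^2)
\]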
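A quick numerical check of the first-order expansion, as a sketch: the helper name `sqrt_remainder` is hypothetical. If the remainder is \(O(u^2)\), dividing it by \(u^2\) should settle near a constant, here the next Taylor coefficient \(-\frac{1}{8}\).

```python
import math


def sqrt_remainder(u):
    """Error of the first-order approximation sqrt(1 - u) ~ 1 - u/2."""
    return math.sqrt(1.0 - u) - (1.0 - 0.5 * u)


for u in (1e-1, 1e-2, 1e-3):
    r = sqrt_remainder(u)
    # The ratio r / u^2 approaches -1/8 as u shrinks.
    print(f"u={u:g}  remainder={r:.3e}  remainder/u^2={r / u**2:.5f}")
```

Since \(u = \beta(t_k)\,\Delta t\), a remainder of order \(u^2\) is of order \(\Delta t^2\), which is why it drops out after dividing by \(\Delta t\) in the moment-matching step.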
Observation: This already looks like an SDE increment!
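Step 4: Match Moments (The Key Move)
For an Itô SDE:
\[
dx = f(x, t)\,dt + g(t)\,dw
\]
the increment over \(\Delta t\) satisfies:
Conditional mean:
\[
\mathbb{E}[\Delta x \mid x] = f(x, t)\,\Delta t + o(\Delta t)
\]
Conditional covariance (for isotropic noise):
\[
\mathrm{Cov}(\Delta x \mid x) = g(t)^2\,\Delta t\,I + o(\Delta t)
\]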
We'll compute these for DDPM and match them to identify \(f\) and \(g\).
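Conditional Mean
Since \(\mathbb{E}[\varepsilon_k] = 0\):
\[
\mathbb{E}[\Delta x_k \mid x_k] = \left(\sqrt{1-\beta(t_k)\,\Delta t} - 1\right) x_k = -\frac{1}{2}\beta(t_k)\,x_k\,\Delta t + O(\Delta t^2)
\]
Divide by \(\Delta t\) and take \(\Delta t \to 0\) (with \(t_k \to t\) and \(x_k = x\)):
\[
\lim_{\Delta t \to 0} \frac{\mathbb{E}[\Delta x_k \mid x_k = x]}{\Delta t} = -\frac{1}{2}\beta(t)\,x
\]
Therefore, the drift is:
\[
f(x, t) = -\frac{1}{2}\beta(t)\,x
\]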
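Conditional Covariance
The only random part is \(\sqrt{\beta(t_k)\,\Delta t}\,\varepsilon_k\). Since \(\varepsilon_k \sim \mathcal{N}(0, I)\):
\[
\mathrm{Cov}(\Delta x_k \mid x_k) = \beta(t_k)\,\Delta t\,I
\]
Comparing with \(g(t)^2\,\Delta t\,I\):
\[
g(t)^2 = \beta(t), \qquad g(t) = \sqrt{\beta(t)}
\]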
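The moment-matching argument can also be checked by simulation. The sketch below draws many copies of one scalar DDPM increment from a fixed state and estimates the drift and squared diffusion from the sample mean and variance. The function name and the values of `x`, `beta_t`, and `dt` are assumptions chosen for illustration.

```python
import math
import random


def ddpm_increment(x, beta_t, dt, rng):
    """One scalar DDPM increment x_{k+1} - x_k with beta_k = beta_t * dt."""
    beta_k = beta_t * dt
    eps = rng.gauss(0.0, 1.0)
    return (math.sqrt(1.0 - beta_k) - 1.0) * x + math.sqrt(beta_k) * eps


rng = random.Random(0)
x, beta_t, dt, n_samples = 2.0, 5.0, 1e-3, 200_000
samples = [ddpm_increment(x, beta_t, dt, rng) for _ in range(n_samples)]

mean = sum(samples) / n_samples
var = sum((s - mean) ** 2 for s in samples) / (n_samples - 1)

drift_estimate = mean / dt         # compare with f(x, t) = -0.5 * beta_t * x
diffusion_sq_estimate = var / dt   # compare with g(t)^2 = beta_t

print("drift estimate:", drift_estimate, "target:", -0.5 * beta_t * x)
print("g^2 estimate:  ", diffusion_sq_estimate, "target:", beta_t)
```

The drift estimate is much noisier than the variance estimate: the mean of the increment is of order \(\Delta t\) while its standard deviation is of order \(\sqrt{\Delta t}\), so many samples are needed to resolve the drift.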
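Step 5: Recognize Brownian Scaling
We can rewrite the noise term:
\[
\sqrt{\beta(t_k)\,\Delta t}\;\varepsilon_k = \sqrt{\beta(t_k)}\,\sqrt{\Delta t}\;\varepsilon_k
\]
But \(\sqrt{\Delta t}\,\varepsilon_k\) is exactly a discretized Brownian increment:
\[
\Delta w_k := \sqrt{\Delta t}\;\varepsilon_k \sim \mathcal{N}(0, \Delta t\,I)
\]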
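The Brownian scaling can be illustrated with a small simulation: summing \(N\) independent increments \(\sqrt{\Delta t}\,\varepsilon\) with \(\Delta t = T/N\) should give an endpoint with variance close to \(T\), whatever \(N\) is. This is a sketch with hypothetical names and an arbitrary choice of \(T = 2\).

```python
import math
import random


def brownian_endpoint(t_final, n_steps, rng):
    """Sum of n_steps increments sqrt(dt) * eps, each distributed as N(0, dt)."""
    dt = t_final / n_steps
    return sum(math.sqrt(dt) * rng.gauss(0.0, 1.0) for _ in range(n_steps))


rng = random.Random(1)
t_final, n_paths = 2.0, 4000
endpoint_var = {}
for n_steps in (10, 100):
    ends = [brownian_endpoint(t_final, n_steps, rng) for _ in range(n_paths)]
    mean = sum(ends) / n_paths
    endpoint_var[n_steps] = sum((e - mean) ** 2 for e in ends) / (n_paths - 1)
    # The sample variance should be close to t_final for every grid size.
    print(n_steps, endpoint_var[n_steps])
```

A noise term scaled by \(\Delta t\) instead of \(\sqrt{\Delta t}\) would have total variance \(T\,\Delta t\), which vanishes in the limit; the square-root scaling is what leaves a nontrivial diffusion term.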
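The DDPM Increment
Combining everything:
\[
\Delta x_k = -\frac{1}{2}\beta(t_k)\,x_k\,\Delta t + \sqrt{\beta(t_k)}\;\Delta w_k + O(\Delta t^2)
\]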
The Continuous Limit
In the limit \(\Delta t \to 0\), this converges to the Itô SDE:
\[
dx = -\frac{1}{2}\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dw
\]
This is exactly the variance-preserving SDE (VP-SDE).
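One way to see the convergence numerically is to run the DDPM chain and an Euler–Maruyama discretization of the VP-SDE side by side, driven by the same noise, and watch the gap between them shrink as the grid is refined. The sketch below does this for a scalar state. The linear `beta_rate`, the starting point, and the path counts are illustrative assumptions.

```python
import math
import random


def beta_rate(t, beta_min=0.1, beta_max=20.0):
    """Hypothetical linear noise rate beta(t) on [0, 1] (illustrative values)."""
    return beta_min + t * (beta_max - beta_min)


def mean_endpoint_gap(n_steps, n_paths=200, t_final=1.0, x0=1.0, seed=0):
    """Average |x_ddpm(T) - x_em(T)| when both chains share the same noise."""
    rng = random.Random(seed)
    dt = t_final / n_steps
    total = 0.0
    for _ in range(n_paths):
        x_ddpm = x_em = x0
        for k in range(n_steps):
            beta_k = beta_rate(k * dt) * dt
            noise = math.sqrt(beta_k) * rng.gauss(0.0, 1.0)  # shared by both updates
            x_ddpm = math.sqrt(1.0 - beta_k) * x_ddpm + noise  # DDPM forward step
            x_em = (1.0 - 0.5 * beta_k) * x_em + noise         # Euler-Maruyama step
        total += abs(x_ddpm - x_em)
    return total / n_paths


gap_coarse = mean_endpoint_gap(100)
gap_fine = mean_endpoint_gap(1000)
print("N=100: ", gap_coarse)
print("N=1000:", gap_fine)
```

The two updates differ only through the \(O(\Delta t^2)\) remainder of the square-root expansion, so the accumulated gap is expected to shrink roughly in proportion to \(\Delta t\).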
What We Proved
We can now state precisely:
DDPM's forward Markov chain is a discrete-time process whose small-step continuous-time limit is the VP-SDE, with drift \(-\frac{1}{2}\beta(t) x\) and diffusion \(\sqrt{\beta(t)}\).
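Key Insights
- The \(\sqrt{1-\beta_k}\) coefficient is a variance-preserving discretization whose Taylor expansion agrees with the SDE drift to first order
- DDPM is not arbitrary: it's a principled discretization of a continuous stochastic process
- The identification is unambiguous: in the small-step limit, matching the conditional mean and covariance determines the drift \(f\) and the squared diffusion \(g^2\) uniquely, although at any finite \(\Delta t\) the DDPM step and the SDE increment agree only up to \(O(\Delta t^2)\) terms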
Why "Variance-Preserving"?
The VP-SDE has a special property:
- The linear drift \(-\frac{1}{2}\beta(t) x\) shrinks \(x\) toward zero
- The noise \(\sqrt{\beta(t)}\,dw\) injects variance
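- The two rates are matched so that unit variance is a fixed point: data with identity covariance keeps it, and any other covariance relaxes toward \(I\)
When \(\int_0^T \beta(s)\,ds\) is large, as under typical schedules, \(x(T)\) is close to a standard Gaussian, with no variance explosion along the way.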
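A short worked equation makes the variance statement concrete. Writing \(\Sigma(t) = \mathrm{Cov}(x(t))\) for a process following \(dx = -\frac{1}{2}\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dw\), Itô's formula applied to this linear SDE gives a closed linear ODE for the covariance. This is a sketch of the standard argument, not a step taken from the derivation above:
\[
\frac{d\Sigma}{dt} = -\beta(t)\,\Sigma(t) + \beta(t)\,I
\quad\Longrightarrow\quad
\Sigma(t) = I + e^{-\int_0^t \beta(s)\,ds}\,\bigl(\Sigma(0) - I\bigr)
\]
The drift contributes \(-\beta\,\Sigma\) and the noise contributes \(+\beta\,I\), so \(\Sigma = I\) is the unique fixed point and the deviation \(\Sigma(t) - I\) decays exponentially in \(\int_0^t \beta(s)\,ds\).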
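Connection to Closed-Form Marginals
The VP-SDE has a closed-form solution:
\[
x(t) \mid x(0) \sim \mathcal{N}\!\left(e^{-\frac{1}{2}\int_0^t \beta(s)\,ds}\,x(0),\; \left(1 - e^{-\int_0^t \beta(s)\,ds}\right) I\right)
\]
In DDPM notation, \(\bar{\alpha}_t\) corresponds to:
\[
\bar{\alpha}_t \;\longleftrightarrow\; \exp\!\left(-\int_0^t \beta(s)\,ds\right)
\]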
This is the last piece that makes discrete and continuous notations line up perfectly.
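The correspondence can be checked numerically by comparing the discrete product \(\prod_k (1 - \beta_k)\) with \(\exp\!\left(-\int_0^T \beta(s)\,ds\right)\) on finer and finer grids. The sketch below again assumes a hypothetical linear \(\beta(t)\) on \([0, 1]\), for which the integral is available in closed form.

```python
import math


def beta_rate(t, beta_min=0.1, beta_max=20.0):
    """Hypothetical linear noise rate beta(t) on [0, 1] (illustrative values)."""
    return beta_min + t * (beta_max - beta_min)


def alpha_bar_discrete(n_steps, t_final=1.0):
    """DDPM-style product of (1 - beta_k) with beta_k = beta(t_k) * dt."""
    dt = t_final / n_steps
    prod = 1.0
    for k in range(n_steps):
        prod *= 1.0 - beta_rate(k * dt) * dt
    return prod


def alpha_bar_continuous(t, beta_min=0.1, beta_max=20.0):
    """exp(-integral of beta from 0 to t), using the closed form for the linear rate."""
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t * t
    return math.exp(-integral)


target = alpha_bar_continuous(1.0)
ratios = {n: alpha_bar_discrete(n) / target for n in (100, 1000, 10000)}
for n, ratio in ratios.items():
    # The ratio moves toward 1 as the grid is refined.
    print(n, ratio)
```

The agreement at finite \(N\) is only approximate, because \(\log(1 - \beta_k) = -\beta_k - \frac{1}{2}\beta_k^2 - \cdots\); the mismatch shrinks in proportion to \(\Delta t\).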
Summary
We derived the VP-SDE from DDPM through:
- Continuous-time scaling: \(\beta_k = \beta(t_k)\,\Delta t\)
- Taylor expansion: \(\sqrt{1-\beta_k} \approx 1 - \frac{1}{2}\beta_k\)
- Moment matching: Identify drift and diffusion from mean and covariance
- Brownian scaling: Recognize \(\sqrt{\Delta t}\,\varepsilon\) as Brownian increment
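The result: to first order in \(\Delta t\), the DDPM forward step coincides with the Euler–Maruyama discretization of the VP-SDE. It differs only in using \(\sqrt{1-\beta_k}\) in place of \(1 - \frac{1}{2}\beta_k\), which keeps unit variance exactly preserved at every step.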
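The difference between the two discretizations shows up in the exact variance recursion \(v_{k+1} = a_k^2\,v_k + \beta_k\) of a scalar chain \(x_{k+1} = a_k x_k + \sqrt{\beta_k}\,\varepsilon_k\). The sketch below compares \(a_k = \sqrt{1-\beta_k}\) (DDPM) with \(a_k = 1 - \frac{1}{2}\beta_k\) (plain Euler–Maruyama). The function name and the linear schedule values are illustrative assumptions.

```python
def final_variance(n_steps, step="ddpm", beta_min=0.1, beta_max=20.0, v0=1.0):
    """Propagate Var(x_k) through v_{k+1} = a_k^2 * v_k + beta_k on [0, 1]."""
    dt = 1.0 / n_steps
    v = v0
    for k in range(n_steps):
        beta_k = (beta_min + k * dt * (beta_max - beta_min)) * dt
        if step == "ddpm":
            a_sq = 1.0 - beta_k                # (sqrt(1 - beta_k))^2
        else:
            a_sq = (1.0 - 0.5 * beta_k) ** 2   # Euler-Maruyama coefficient squared
        v = a_sq * v + beta_k
    return v


v_ddpm = final_variance(1000, step="ddpm")
v_em = final_variance(1000, step="em")
print("DDPM variance at T:          ", v_ddpm)
print("Euler-Maruyama variance at T:", v_em)
```

Starting from unit variance, the DDPM coefficient gives \((1-\beta_k) + \beta_k = 1\) at every step, while the Euler–Maruyama coefficient adds an extra \(\frac{1}{4}\beta_k^2\) per step and so drifts slightly above 1.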
Related Documents
- Deriving DDPM from VP-SDE (the reverse direction)
- Taylor Expansions in Diffusion
- Fokker–Planck Equation
- SDE View Overview