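Where Do \(\alpha(t)\) and \(\bar{\alpha}_t\) Come From?
Overview
The coefficients \(\alpha(t)\) and \(\bar{\alpha}_t\) appear throughout diffusion model theory, but their definitions with integrals in the exponent can seem mysterious:
\[\alpha(t) = \exp\left(-\frac{1}{2}\int_0^t \beta(s)\,ds\right), \qquad \bar{\alpha}_t = \exp\left(-\int_0^t \beta(s)\,ds\right)\]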
Key insight: These are not arbitrary definitions. They emerge naturally from solving the forward SDE using the integrating factor technique.
This document explains where these definitions come from and why they have this specific form.
Referenced From
- docs/diffusion/noise_schedules.md: uses these definitions extensively
- docs/diffusion/forward_process_derivation.md: full derivation of the forward process
The Starting Point: The VP-SDE
The Variance-Preserving SDE (VP-SDE) describes how clean data \(x_0\) is corrupted over time:
\[dx = -\frac{1}{2}\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dw\]
where:
- \(x(t)\) is the state at time \(t\) (starts at \(x_0\))
- \(\beta(t) > 0\) is the noise schedule
- \(dw\) is the increment of a standard Brownian motion (Wiener process) \(w(t)\)
Question: What is the relationship between \(x(t)\) and \(x_0\)?
To answer this, we need to solve this SDE.
Solving the SDE via Integrating Factor
Step 1: Identify the SDE Structure
The VP-SDE has the form:
\[dx = a(t)\,x\,dt + b(t)\,dw\]
with \(a(t) = -\frac{1}{2}\beta(t)\) and \(b(t) = \sqrt{\beta(t)}\).
This is a linear SDE (the drift is linear in \(x\)), which can be solved using an integrating factor.
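Step 2: Define the Integrating Factor
For a linear SDE with drift coefficient \(a(t)\), the integrating factor is:
\[\mu(t) = \exp\left(-\int_0^t a(s)\,ds\right)\]
In our case, \(a(t) = -\frac{1}{2}\beta(t)\), so:
\[\mu(t) = \exp\left(\frac{1}{2}\int_0^t \beta(s)\,ds\right)\]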
Why this choice? The integrating factor is designed so that \(\frac{d\mu}{dt} = -a(t)\mu(t)\), which allows the drift term to cancel when we multiply through.
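As a quick numerical sanity check of the property \(\frac{d\mu}{dt} = -a(t)\mu(t) = \frac{1}{2}\beta(t)\mu(t)\), the sketch below uses a linear schedule \(\beta(t) = \beta_{\min} + t(\beta_{\max} - \beta_{\min})\). The schedule and the values `BETA_MIN` and `BETA_MAX` are assumptions chosen for illustration; they do not come from this document.

```python
import math

# Hypothetical linear schedule; these endpoint values are illustrative only.
BETA_MIN, BETA_MAX = 0.1, 20.0

def beta(t):
    """Noise schedule beta(t) for t in [0, 1]."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def integrated_beta(t):
    """Closed form of the integral of beta(s) from 0 to t (linear schedule)."""
    return BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t

def mu(t):
    """Integrating factor mu(t) = exp(0.5 * integral of beta from 0 to t)."""
    return math.exp(0.5 * integrated_beta(t))

def dmu_dt_numeric(t, h=1e-6):
    """Central finite-difference estimate of d(mu)/dt."""
    return (mu(t + h) - mu(t - h)) / (2.0 * h)

t = 0.4
lhs = dmu_dt_numeric(t)          # numerical derivative of mu
rhs = 0.5 * beta(t) * mu(t)      # -a(t) * mu(t) with a(t) = -beta(t) / 2
print(f"d(mu)/dt ~ {lhs:.6f}, 0.5 * beta * mu = {rhs:.6f}")
```

The two printed numbers should agree up to finite-difference error, which is the cancellation property the integrating factor is built to provide.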
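Step 3: Apply the Integrating Factor
Multiply both sides of the SDE by \(\mu(t)\):
\[\mu(t)\,dx = -\frac{1}{2}\beta(t)\mu(t)\,x\,dt + \mu(t)\sqrt{\beta(t)}\,dw\]
Using Itô's lemma on \(\mu(t)x(t)\) (no second-order correction term appears because \(\mu(t)\) is deterministic), we get:
\[d\big(\mu(t)x(t)\big) = \mu(t)\,dx + x(t)\,d\mu\]
Since \(d\mu = -a(t)\mu\,dt = \frac{1}{2}\beta(t)\mu\,dt\):
\[d\big(\mu(t)x(t)\big) = -\frac{1}{2}\beta(t)\mu(t)\,x\,dt + \mu(t)\sqrt{\beta(t)}\,dw + \frac{1}{2}\beta(t)\mu(t)\,x\,dt\]
The drift terms cancel:
\[d\big(\mu(t)x(t)\big) = \mu(t)\sqrt{\beta(t)}\,dw\]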
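Step 4: Integrate
Integrate from \(0\) to \(t\):
\[\mu(t)x(t) - \mu(0)x(0) = \int_0^t \mu(s)\sqrt{\beta(s)}\,dw(s)\]
Since \(\mu(0) = 1\) and \(x(0) = x_0\):
\[\mu(t)x(t) = x_0 + \int_0^t \mu(s)\sqrt{\beta(s)}\,dw(s)\]
Step 5: Solve for \(x(t)\)
Divide both sides by \(\mu(t)\):
\[x(t) = \frac{1}{\mu(t)}\,x_0 + \frac{1}{\mu(t)}\int_0^t \mu(s)\sqrt{\beta(s)}\,dw(s)\]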
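The Emergence of \(\alpha(t)\)
Define:
\[\alpha(t) = \frac{1}{\mu(t)} = \exp\left(-\frac{1}{2}\int_0^t \beta(s)\,ds\right)\]
This is where \(\alpha(t)\) comes from! It is the inverse of the integrating factor.
Now the solution becomes:
\[x(t) = \alpha(t)\,x_0 + \alpha(t)\int_0^t \mu(s)\sqrt{\beta(s)}\,dw(s)\]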
Physical meaning:
- \(\alpha(t)\) is the signal decay coefficient
- The term \(\alpha(t) x_0\) shows how the original signal scales over time
- Because \(\beta(s) > 0\), \(\alpha(t)\) decreases monotonically from \(\alpha(0) = 1\); it approaches 0 as \(\int_0^t \beta(s)\,ds\) grows large
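To make the decay concrete, here is a minimal sketch that evaluates \(\alpha(t) = \exp\left(-\frac{1}{2}\int_0^t \beta(s)\,ds\right)\) at a few time points. It assumes a linear schedule on \([0, 1]\) with illustrative endpoints `BETA_MIN` and `BETA_MAX`; these are assumptions, not values taken from this document.

```python
import math

# Hypothetical linear schedule beta(t) = BETA_MIN + t * (BETA_MAX - BETA_MIN).
BETA_MIN, BETA_MAX = 0.1, 20.0

def integrated_beta(t):
    """Closed form of the integral of beta(s) from 0 to t (linear schedule)."""
    return BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t

def alpha(t):
    """Signal decay coefficient alpha(t) = exp(-0.5 * integral of beta)."""
    return math.exp(-0.5 * integrated_beta(t))

times = [0.0, 0.25, 0.5, 0.75, 1.0]
alphas = [alpha(t) for t in times]
for t, a in zip(times, alphas):
    print(f"t = {t:.2f}   alpha(t) = {a:.4f}")
```

Under this assumed schedule the signal coefficient starts at exactly 1 and shrinks at every later time point, so almost none of \(x_0\) survives by \(t = 1\).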
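The Stochastic Integral: Computing the Variance
The stochastic integral \(\int_0^t \mu(s)\sqrt{\beta(s)}\,dw(s)\) is Gaussian with:
- Mean: 0
- Variance: \(\int_0^t \mu(s)^2 \beta(s)\,ds\) (by Itô isometry)
Computing the Variance
Since \(\mu(s) = \exp\left(\frac{1}{2}\int_0^s \beta(u)\,du\right)\):
\[\mu(s)^2 = \exp\left(\int_0^s \beta(u)\,du\right)\]
So:
\[\int_0^t \mu(s)^2 \beta(s)\,ds = \int_0^t \exp\left(\int_0^s \beta(u)\,du\right)\beta(s)\,ds\]
Trick: Let \(\Phi(s) = \int_0^s \beta(u)\,du\). Then \(\frac{d\Phi}{ds} = \beta(s)\), so the integrand is an exact derivative:
\[\int_0^t e^{\Phi(s)}\,\frac{d\Phi}{ds}\,ds = e^{\Phi(t)} - e^{\Phi(0)} = e^{\Phi(t)} - 1\]
Since \(\Phi(t) = \int_0^t \beta(s)\,ds\) and \(\mu(t) = e^{\Phi(t)/2}\):
\[\int_0^t \mu(s)^2 \beta(s)\,ds = \mu(t)^2 - 1\]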
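The identity \(\int_0^t \mu(s)^2 \beta(s)\,ds = \mu(t)^2 - 1\) can be checked numerically. The sketch below integrates the left-hand side with a plain trapezoidal rule and compares it to the closed form. The linear schedule and its endpoints are assumptions made for illustration only.

```python
import math

# Hypothetical linear schedule; endpoint values are illustrative only.
BETA_MIN, BETA_MAX = 0.1, 20.0

def beta(s):
    return BETA_MIN + s * (BETA_MAX - BETA_MIN)

def phi(s):
    """Phi(s) = integral of beta(u) from 0 to s, in closed form."""
    return BETA_MIN * s + 0.5 * (BETA_MAX - BETA_MIN) * s * s

def integrand(s):
    """mu(s)^2 * beta(s), using mu(s)^2 = exp(Phi(s))."""
    return math.exp(phi(s)) * beta(s)

def variance_integral(t, n=10000):
    """Trapezoidal approximation of the integral of mu(s)^2 * beta(s) over [0, t]."""
    h = t / n
    total = 0.5 * (integrand(0.0) + integrand(t))
    for k in range(1, n):
        total += integrand(k * h)
    return total * h

t = 0.5
numeric = variance_integral(t)
closed_form = math.exp(phi(t)) - 1.0   # mu(t)^2 - 1
print(f"quadrature = {numeric:.6f}, mu(t)^2 - 1 = {closed_form:.6f}")
```

The quadrature never uses the substitution trick, so agreement between the two values is an independent check on the algebra.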
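Variance of \(x(t)\)
The noise term in \(x(t)\) is:
\[\alpha(t)\int_0^t \mu(s)\sqrt{\beta(s)}\,dw(s)\]
Its variance is:
\[\alpha(t)^2 \int_0^t \mu(s)^2 \beta(s)\,ds = \alpha(t)^2\left(\mu(t)^2 - 1\right)\]
Since \(\alpha(t) = 1/\mu(t)\):
\[\alpha(t)^2\mu(t)^2 - \alpha(t)^2 = 1 - \alpha(t)^2\]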
The Emergence of \(\bar{\alpha}_t\)
Define:
\[\bar{\alpha}_t = \alpha(t)^2 = \exp\left(-\int_0^t \beta(s)\,ds\right)\]
This is where \(\bar{\alpha}_t\) comes from! It's the square of the signal coefficient.
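The Final Form
The solution becomes (as an equality in distribution):
\[x(t) = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)\]
Or equivalently:
\[x(t) \mid x_0 \sim \mathcal{N}\left(\sqrt{\bar{\alpha}_t}\,x_0,\; (1-\bar{\alpha}_t)\,I\right)\]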
where:
- \(\sqrt{\bar{\alpha}_t} = \alpha(t)\) is the signal coefficient
- \(\sqrt{1-\bar{\alpha}_t}\) is the noise coefficient
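One way to test the closed form is to simulate the VP-SDE directly and compare sample statistics with the mean \(\sqrt{\bar{\alpha}_t}\,x_0\) and variance \(1-\bar{\alpha}_t\). The sketch below uses an Euler-Maruyama discretization for a scalar state. The linear schedule, the step count, the sample count, and the values of `x0` and `t_end` are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical linear schedule; endpoint values are illustrative only.
BETA_MIN, BETA_MAX = 0.1, 20.0

def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def alpha_bar(t):
    """alpha_bar_t = exp(-integral of beta from 0 to t), closed form for the linear schedule."""
    return math.exp(-(BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t))

def simulate_vp_sde(x0, t_end, n_steps, rng):
    """Euler-Maruyama for dx = -0.5 * beta(t) * x dt + sqrt(beta(t)) dw (scalar state)."""
    dt = t_end / n_steps
    x = x0
    for k in range(n_steps):
        b = beta(k * dt)
        # The Brownian increment over dt has standard deviation sqrt(dt).
        x += -0.5 * b * x * dt + math.sqrt(b * dt) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)            # fixed seed so the run is repeatable
x0, t_end = 2.0, 0.3
samples = [simulate_vp_sde(x0, t_end, 200, rng) for _ in range(3000)]

emp_mean = sum(samples) / len(samples)
emp_var = sum((s - emp_mean) ** 2 for s in samples) / (len(samples) - 1)

closed_mean = math.sqrt(alpha_bar(t_end)) * x0
closed_var = 1.0 - alpha_bar(t_end)
print(f"mean: simulated {emp_mean:.3f} vs closed form {closed_mean:.3f}")
print(f"var:  simulated {emp_var:.3f} vs closed form {closed_var:.3f}")
```

The simulated values carry both Monte Carlo and time-discretization error, so they should match the closed form only approximately. In practice the closed form is what makes sampling \(x(t)\) cheap, since one Gaussian draw replaces the whole simulated path.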
Why These Definitions?
Not Arbitrary!
The definitions of \(\alpha(t)\) and \(\bar{\alpha}_t\) are not chosen arbitrarily. They emerge naturally from:
- The SDE structure: The VP-SDE \(dx = -\frac{1}{2}\beta(t)x\,dt + \sqrt{\beta(t)}\,dw\)
- The integrating factor technique: \(\mu(t) = \exp\left(\frac{1}{2}\int_0^t \beta(s)\,ds\right)\)
- The solution process: \(\alpha(t) = 1/\mu(t)\) and \(\bar{\alpha}_t = \alpha(t)^2\)
The Integrating Factor Connection
| Quantity | Definition | Origin |
|---|---|---|
| \(\mu(t)\) | \(\exp\left(\frac{1}{2}\int_0^t \beta(s)\,ds\right)\) | Integrating factor for SDE |
| \(\alpha(t)\) | \(1/\mu(t) = \exp\left(-\frac{1}{2}\int_0^t \beta(s)\,ds\right)\) | Inverse of integrating factor (signal coefficient) |
| \(\bar{\alpha}_t\) | \(\alpha(t)^2 = \exp\left(-\int_0^t \beta(s)\,ds\right)\) | Square of signal coefficient |
Key insight: The exponential with an integral in the exponent is exactly the integrating factor form from ODE/SDE theory.
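The three quantities in the table can be computed for any schedule by numerically integrating \(\beta\). The sketch below is one possible way to do this with a trapezoidal rule. The helper names `integrate` and `coefficients` and the schedule `example_beta` are hypothetical, introduced here only for illustration.

```python
import math

def integrate(f, a, b, n=2000):
    """Trapezoidal rule for the integral of f over [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

def coefficients(beta_fn, t):
    """Return (mu, alpha, alpha_bar) at time t for a schedule beta_fn."""
    phi = integrate(beta_fn, 0.0, t)   # integral of beta from 0 to t
    mu = math.exp(0.5 * phi)           # integrating factor
    alpha = math.exp(-0.5 * phi)       # inverse of the integrating factor
    alpha_bar = math.exp(-phi)         # square of the signal coefficient
    return mu, alpha, alpha_bar

def example_beta(t):
    """Hypothetical schedule; its integral over [0, 1] is 1.5."""
    return 0.5 + 3.0 * t * t

mu_t, alpha_t, alpha_bar_t = coefficients(example_beta, 1.0)
print(f"mu = {mu_t:.5f}, alpha = {alpha_t:.5f}, alpha_bar = {alpha_bar_t:.5f}")
```

Deriving all three values from the same integral keeps the relations \(\alpha(t) = 1/\mu(t)\) and \(\bar{\alpha}_t = \alpha(t)^2\) intact up to floating-point rounding.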
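Alternative: Starting from \(\bar{\alpha}_t\)
Some papers define \(\bar{\alpha}_t\) directly and derive \(\beta(t)\) from it.
Forward Approach (This Document)
\[\beta(t) \;\longrightarrow\; \bar{\alpha}_t = \exp\left(-\int_0^t \beta(s)\,ds\right)\]
Inverse Approach (Also Valid)
\[\bar{\alpha}_t \;\longrightarrow\; \beta(t) = -\frac{d}{dt}\log \bar{\alpha}_t\]
Derivation: From \(\bar{\alpha}_t = \exp\left(-\int_0^t \beta(s)\,ds\right)\), take the log:
\[\log \bar{\alpha}_t = -\int_0^t \beta(s)\,ds\]
Differentiate:
\[\frac{d}{dt}\log \bar{\alpha}_t = -\beta(t)\]
So:
\[\beta(t) = -\frac{d}{dt}\log \bar{\alpha}_t\]
Both approaches are equivalent: you can start with \(\beta(t)\) or with a differentiable, decreasing \(\bar{\alpha}_t\) satisfying \(\bar{\alpha}_0 = 1\).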
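The inverse relation \(\beta(t) = -\frac{d}{dt}\log \bar{\alpha}_t\) can be illustrated by differentiating \(\log \bar{\alpha}_t\) numerically and comparing with a known schedule. The sketch below assumes a linear schedule with illustrative endpoints. The helper `beta_from_alpha_bar` is a hypothetical name, not something defined elsewhere in these docs.

```python
import math

# Hypothetical linear schedule; endpoint values are illustrative only.
BETA_MIN, BETA_MAX = 0.1, 20.0

def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def alpha_bar(t):
    """alpha_bar_t = exp(-integral of beta from 0 to t) for the linear schedule."""
    return math.exp(-(BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t))

def beta_from_alpha_bar(alpha_bar_fn, t, h=1e-5):
    """Recover beta(t) = -d/dt log(alpha_bar_t) with a central finite difference."""
    log_hi = math.log(alpha_bar_fn(t + h))
    log_lo = math.log(alpha_bar_fn(t - h))
    return -(log_hi - log_lo) / (2.0 * h)

t = 0.5
recovered = beta_from_alpha_bar(alpha_bar, t)
print(f"recovered beta = {recovered:.6f}, true beta = {beta(t):.6f}")
```

When only \(\bar{\alpha}_t\) is specified, the same finite-difference idea gives an approximate \(\beta(t)\), with accuracy depending on the step `h` and on how smooth \(\bar{\alpha}_t\) is.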
Summary
| Definition | Formula | Origin |
|---|---|---|
| Integrating factor | \(\mu(t) = \exp\left(\frac{1}{2}\int_0^t \beta(s)\,ds\right)\) | Standard technique for linear SDEs |
| Signal coefficient | \(\alpha(t) = 1/\mu(t) = \exp\left(-\frac{1}{2}\int_0^t \beta(s)\,ds\right)\) | Inverse of integrating factor |
| Cumulative coefficient | \(\bar{\alpha}_t = \alpha(t)^2 = \exp\left(-\int_0^t \beta(s)\,ds\right)\) | Square of signal coefficient |
The key point: These definitions are not ad hoc. They arise naturally from solving the VP-SDE using the integrating factor technique, which is why they have integrals in the exponent.
References
- Forward Process Derivation: docs/diffusion/forward_process_derivation.md (complete derivation)
- Integrating Factor Technique: docs/diffusion/integrating_factor.md (general method)
- Øksendal (2003): "Stochastic Differential Equations", Chapter 5 on linear SDEs
- Ho et al. (2020): "Denoising Diffusion Probabilistic Models", which uses these coefficients throughout