Fokker-Planck Equation: Derivation and Intuition¶
Overview¶
The Fokker-Planck equation (also called the forward Kolmogorov equation) describes how the probability distribution of a stochastic process evolves over time. It is the bridge between individual particle dynamics (described by SDEs) and collective probability evolution.
This document derives the Fokker-Planck equation from first principles and explains why probability distributions must obey it.
Referenced From¶
- Main Document: docs/diffusion/reverse_process/reverse_process_derivation.md (uses the Fokker-Planck equation to derive the reverse SDE)
- Related: notebooks/diffusion/02_sde_formulation/supplements/07_fokker_planck_equation.md
Table of Contents¶
- The Setting
- Intuitive Picture
- Derivation via Infinitesimal Evolution
- Kramers-Moyal Expansion (Rigorous)
- Physical Interpretation
- Connection to Conservation Laws
- Example: Simple Diffusion
- Example: Ornstein-Uhlenbeck Process
- Why This Matters for Reverse SDEs
The Setting¶
Stochastic Differential Equation¶
Consider a \(d\)-dimensional stochastic process \(x(t) \in \mathbb{R}^d\) governed by:
$$
dx = f(x,t)\,dt + g(t)\,dw
$$
where:
- \(f(x,t) \in \mathbb{R}^d\) is the drift (deterministic force)
- \(g(t) \in \mathbb{R}\) is the diffusion coefficient (noise amplitude)
- \(w(t) \in \mathbb{R}^d\) is standard Brownian motion
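To make the setting concrete, here is a minimal Euler-Maruyama sketch for simulating many independent particles of this SDE. The function name `euler_maruyama`, the step count, and the zero-drift, unit-noise test case are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def euler_maruyama(f, g, x0, t0, t1, n_steps, rng):
    """Simulate dx = f(x,t) dt + g(t) dw for a batch of particles.

    x0 has shape (n_particles, d); f(x, t) returns an array shaped like x,
    and g(t) returns a scalar noise amplitude.
    """
    x = np.array(x0, dtype=float)
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        xi = rng.standard_normal(x.shape)              # xi ~ N(0, I)
        x = x + f(x, t) * dt + g(t) * np.sqrt(dt) * xi
        t += dt
    return x

rng = np.random.default_rng(0)
x0 = np.zeros((20_000, 2))

# Hypothetical test case: zero drift and g = 1, so x(1) should be close to N(0, I).
xT = euler_maruyama(lambda x, t: np.zeros_like(x), lambda t: 1.0,
                    x0, 0.0, 1.0, 100, rng)
print("per-coordinate variance at t = 1:", xT.var(axis=0))
```

A histogram of the simulated particles at a fixed time approximates \(p_t(x)\), which is the object the Fokker-Planck equation describes.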
Probability Distribution¶
At each time \(t\), the random variable \(x(t)\) has a probability density:
$$
x(t) \sim p_t(x)
$$
where \(\int p_t(x)\,dx = 1\).
The Question¶
How does \(p_t(x)\) evolve over time?
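The Fokker-Planck equation answers this:
$$
\frac{\partial p_t(x)}{\partial t} = -\nabla \cdot \big(f(x,t)\, p_t(x)\big) + \frac{1}{2} g(t)^2 \nabla^2 p_t(x)
$$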
Intuitive Picture¶
Before diving into the math, let's build intuition.
What Changes Probability at a Point?¶
Imagine a point \(x\) in space. The probability density \(p_t(x)\) at this point can change due to:
- Drift (advection): Particles drift from nearby points to \(x\), or from \(x\) to elsewhere
- Diffusion (spreading): Particles randomly wander in and out of \(x\)
These two mechanisms give rise to the two terms in the Fokker-Planck equation.
Analogy: Heat Equation¶
You can think of probability like heat:
- Drift term: Like a wind blowing heat from one place to another
- Diffusion term: Like heat spreading from hot to cold regions
The Fokker-Planck equation is essentially a heat equation with an additional drift/advection term.
Derivation via Infinitesimal Evolution¶
We'll derive the equation by considering how probability evolves over an infinitesimal time step \(\Delta t\).
Step 1: Chapman-Kolmogorov Equation¶
The probability at time \(t + \Delta t\) is related to the probability at time \(t\) by:
$$
p_{t+\Delta t}(x) = \int p(x, t+\Delta t \mid x', t)\, p_t(x')\, dx'
$$
where \(p(x, t+\Delta t \mid x', t)\) is the transition probability from \(x'\) at time \(t\) to \(x\) at time \(t + \Delta t\).
Interpretation: To find the probability of being at \(x\) at time \(t + \Delta t\), we sum over all possible starting points \(x'\) at time \(t\), weighted by their probability.
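As a sanity check, the Chapman-Kolmogorov integral can be evaluated numerically on a grid. The sketch below assumes the simplest case, one-dimensional Brownian motion \(dx = dw\) started at the origin, where both \(p_t\) and the transition kernel are Gaussians; the grid size and time values are arbitrary choices.

```python
import numpy as np

def gaussian(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Grid of positions, wide enough that the densities vanish at the edges.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

t, delta_t = 1.0, 0.5
p_t = gaussian(x, 0.0, t)                    # p_t(x') for dx = dw started at 0

# Transition kernel K[i, j] = p(x_i, t + delta_t | x_j, t).
kernel = gaussian(x[:, None], x[None, :], delta_t)

# Chapman-Kolmogorov: integrate over the starting point x' (Riemann sum).
p_next = kernel @ p_t * dx

exact = gaussian(x, 0.0, t + delta_t)        # known density at time t + delta_t
max_error = np.max(np.abs(p_next - exact))
print("max deviation from N(0, t + delta_t):", max_error)
```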
Step 2: Taylor Expand the Left Side¶
For small \(\Delta t\), expand the left side of the Chapman-Kolmogorov equation in time:
$$
p_{t+\Delta t}(x) = p_t(x) + \frac{\partial p_t(x)}{\partial t}\,\Delta t + O(\Delta t^2)
$$
Step 3: Understand the Transition Probability¶
For the SDE \(dx = f(x,t)\,dt + g(t)\,dw\), the change over \(\Delta t\) is:
$$
\Delta x = f(x,t)\,\Delta t + g(t)\sqrt{\Delta t}\,\xi
$$
where \(\xi \sim \mathcal{N}(0, I)\) is a standard normal random variable.
This means:
- Mean displacement: \(\mathbb{E}[\Delta x \mid x] = f(x,t) \Delta t\)
- Covariance: \(\text{Cov}[\Delta x \mid x] = g(t)^2 \Delta t \, I\)
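These two moments can be checked by sampling a single step many times. The following sketch fixes the drift at one point to a hypothetical vector `f` and uses an arbitrary noise amplitude `g`; it only illustrates the scaling and is not tied to any particular model.

```python
import numpy as np

rng = np.random.default_rng(1)

dt, g = 1e-2, 0.7
f = np.array([1.0, -2.0])          # hypothetical drift f(x, t) at a fixed point x

# One Euler-Maruyama step, repeated for many independent noise draws.
xi = rng.standard_normal((200_000, 2))
dx = f * dt + g * np.sqrt(dt) * xi

mean_dx = dx.mean(axis=0)          # should be close to f * dt
cov_dx = np.cov(dx, rowvar=False)  # should be close to g^2 * dt * I

print("mean displacement:", mean_dx, "expected:", f * dt)
print("covariance:\n", cov_dx, "\nexpected diagonal:", g**2 * dt)
```

Note that the mean scales like \(\Delta t\) while the standard deviation scales like \(\sqrt{\Delta t}\), which is why the noise contributes at first order in \(\Delta t\) only through its second moment.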
Step 4: Express Transition Probability¶
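For small \(\Delta t\), the transition from \(x'\) to \(x\) is approximately:
$$
p(x, t+\Delta t \mid x', t) \approx \delta\big(x - x' - f(x',t)\,\Delta t\big) * \mathcal{N}\big(0,\ g(t)^2 \Delta t\, I\big)
$$
where \(*\) denotes convolution: a deterministic shift by the drift, blurred by Gaussian noise.
More precisely, using the Gaussian approximation:
$$
p(x, t+\Delta t \mid x', t) \approx \mathcal{N}\big(x;\ x' + f(x',t)\,\Delta t,\ g(t)^2 \Delta t\, I\big)
$$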
Step 5: Substitute into Chapman-Kolmogorov¶
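Substituting the Gaussian transition density into the Chapman-Kolmogorov equation gives:
$$
p_{t+\Delta t}(x) = \int \mathcal{N}\big(x;\ x' + f(x',t)\,\Delta t,\ g(t)^2 \Delta t\, I\big)\, p_t(x')\, dx'
$$
For small \(\Delta t\) the Gaussian is sharply peaked, so only starting points \(x'\) near \(x\) contribute. We therefore expand around \(x' = x\).
Let \(\delta x = x - x'\), so \(x' = x - \delta x\):
$$
p_{t+\Delta t}(x) = \int p(x, t+\Delta t \mid x - \delta x, t)\, p_t(x - \delta x)\, d(\delta x)
$$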
Step 6: Taylor Expand \(p_t(x - \delta x)\)¶
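Using index notation:
$$
p_t(x - \delta x) = p_t(x) - \sum_i \delta x_i \frac{\partial p_t}{\partial x_i} + \frac{1}{2} \sum_{i,j} \delta x_i\, \delta x_j \frac{\partial^2 p_t}{\partial x_i \partial x_j} + O(|\delta x|^3)
$$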
Step 7: Compute Moments of \(\delta x\)¶
From the transition probability, conditioned on starting at \(x' = x - \delta x \approx x\):
First moment:
$$
\mathbb{E}[\delta x] = f(x,t) \Delta t + O(\Delta t^2) $$
Second moment (each component):
$$
\mathbb{E}[\delta x_i \delta x_j] = \begin{cases} g(t)^2 \Delta t + f_i f_j \Delta t^2 & \text{if } i = j \\ f_i f_j \Delta t^2 & \text{if } i \neq j \end{cases}
$$
For small \(\Delta t\), the \(\Delta t^2\) terms are negligible, so:
$$
\mathbb{E}[\delta x_i \delta x_j] = g(t)^2 \Delta t\, \delta_{ij} + O(\Delta t^2)
$$
where \(\delta_{ij}\) is the Kronecker delta.
Step 8: Substitute Moments¶
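Taking expectations over \(\delta x\). The moments are evaluated at the starting point \(x' = x - \delta x\) rather than at \(x\), so the Taylor expansion applies to the product of each moment with \(p_t\), and the derivatives act on both factors:
$$
p_{t+\Delta t}(x) = p_t(x) - \sum_i \frac{\partial}{\partial x_i}\Big(\mathbb{E}[\delta x_i]\, p_t(x)\Big) + \frac{1}{2} \sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j}\Big(\mathbb{E}[\delta x_i \delta x_j]\, p_t(x)\Big) + \cdots
$$
Substitute the moments:
$$
p_{t+\Delta t}(x) = p_t(x) - \Delta t \sum_i \frac{\partial}{\partial x_i}\big(f_i(x,t)\, p_t(x)\big) + \frac{1}{2} g(t)^2 \Delta t \sum_i \frac{\partial^2 p_t(x)}{\partial x_i^2} + O(\Delta t^2)
$$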
Step 9: Rearrange and Take Limit¶
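Subtract \(p_t(x)\), divide by \(\Delta t\) and take \(\Delta t \to 0\), so that the left side becomes the time derivative:
$$
\frac{\partial p_t}{\partial t} = -\sum_i \frac{\partial}{\partial x_i}\big(f_i(x,t)\, p_t\big) + \frac{1}{2} g(t)^2 \sum_i \frac{\partial^2 p_t}{\partial x_i^2}
$$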
Step 10: Vector Notation¶
Using \(\nabla \cdot\) for divergence and \(\nabla^2\) for Laplacian:
$$
\frac{\partial p_t}{\partial t} = -\nabla \cdot (f\, p_t) + \frac{1}{2} g(t)^2 \nabla^2 p_t
$$
This is the Fokker-Planck equation!
Kramers-Moyal Expansion (Rigorous)¶
The derivation above can be made rigorous using the Kramers-Moyal expansion.
Jump Moments¶
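Define the jump moments (written in one dimension for readability):
$$
M_n(x,t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, \mathbb{E}\big[(x(t+\Delta t) - x(t))^n \mid x(t) = x\big]
$$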
Kramers-Moyal Theorem¶
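The evolution of the probability distribution is:
$$
\frac{\partial p_t(x)}{\partial t} = \sum_{n=1}^{\infty} \frac{(-1)^n}{n!} \frac{\partial^n}{\partial x^n}\big[M_n(x,t)\, p_t(x)\big]
$$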
Fokker-Planck Approximation¶
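For continuous diffusion processes (like those generated by SDEs with smooth coefficients), the Kramers-Moyal expansion terminates at \(n=2\):
$$
M_n(x,t) = 0 \quad \text{for } n \geq 3
$$
This gives:
$$
\frac{\partial p_t}{\partial t} = -\frac{\partial}{\partial x}\big[M_1\, p_t\big] + \frac{1}{2} \frac{\partial^2}{\partial x^2}\big[M_2\, p_t\big]
$$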
For Our SDE¶
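From \(dx = f(x,t)\,dt + g(t)\,dw\):
$$
M_1(x,t) = f(x,t), \qquad M_2(x,t) = g(t)^2
$$
Substituting:
$$
\frac{\partial p_t}{\partial t} = -\frac{\partial}{\partial x}\big(f\, p_t\big) + \frac{1}{2} g(t)^2 \frac{\partial^2 p_t}{\partial x^2}
$$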
This is the Fokker-Planck equation.
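The truncation at second order can be illustrated with closed-form Gaussian moments. The sketch below assumes a single Euler step, so that the increment is exactly \(\mathcal{N}(f\Delta t, g^2 \Delta t)\) in one dimension, and evaluates \(\mathbb{E}[\Delta x^n]/\Delta t\) for shrinking \(\Delta t\). The helper `jump_moments` and the values of `f_val` and `g_val` are hypothetical.

```python
import numpy as np

def jump_moments(f, g, dt):
    """Return E[dx^n] / dt for n = 1..4 when dx ~ N(f*dt, g^2*dt)."""
    mu, var = f * dt, g**2 * dt
    raw = np.array([
        mu,                                    # E[dx]
        mu**2 + var,                           # E[dx^2]
        mu**3 + 3 * mu * var,                  # E[dx^3]
        mu**4 + 6 * mu**2 * var + 3 * var**2,  # E[dx^4]
    ])
    return raw / dt

f_val, g_val = 1.5, 0.8                        # hypothetical drift and noise level
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    print(dt, jump_moments(f_val, g_val, dt))

# As dt -> 0 the first two entries approach f and g^2, the rest approach 0.
limit = jump_moments(f_val, g_val, 1e-8)
```

Only the first two ratios survive the limit, since the third and fourth raw moments are of order \(\Delta t^2\).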
Physical Interpretation¶
Let's break down each term:
The Full Equation¶
$$
\frac{\partial p_t}{\partial t} = -\nabla \cdot (f\, p_t) + \frac{1}{2} g(t)^2 \nabla^2 p_t
$$
Term 1: Rate of Change¶
$$
\frac{\partial p_t(x)}{\partial t}
$$
Meaning: How fast probability density is changing at point \(x\) at time \(t\).
Term 2: Drift (Advection)¶
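$$
-\nabla \cdot \big(f(x,t)\, p_t(x)\big)
$$
Meaning: Net change in probability density caused by the flux \(f p_t\) of the deterministic drift.
Expanded form:
$$
-\nabla \cdot (f\, p_t) = -f \cdot \nabla p_t - p_t\, \nabla \cdot f
$$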
- \(-f \cdot \nabla p_t\): Advection of probability along the drift field
- \(-p_t \nabla \cdot f\): Change in probability due to compression/expansion of the drift field
Physical picture: Probability "flows" along the drift field \(f(x,t)\), like wind blowing particles.
Term 3: Diffusion (Spreading)¶
$$
\frac{1}{2} g(t)^2 \nabla^2 p_t(x)
$$
Meaning: Probability spreads from high-density to low-density regions.
Sign convention:
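- \(\nabla^2 p_t > 0\): The density at \(x\) is below the average of its neighbors (concave up, as at a local minimum) → probability flows in (increases)
- \(\nabla^2 p_t < 0\): The density at \(x\) is above the average of its neighbors (concave down, as at a peak) → probability flows out (decreases)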
Physical picture: Random motion causes probability to diffuse, like heat spreading in a metal rod.
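The sign convention can be seen on a one-dimensional standard normal density, where the second derivative is negative at the peak and positive in the tails. This is a small illustrative sketch; the helper names and the finite-difference step are assumptions.

```python
import numpy as np

def std_normal(x):
    """Density of N(0, 1)."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def laplacian_1d(p, x, h=1e-4):
    """Central-difference estimate of the second derivative p''(x)."""
    return (p(x + h) - 2 * p(x) + p(x - h)) / h**2

at_peak = laplacian_1d(std_normal, 0.0)   # peak: diffusion removes probability
in_tail = laplacian_1d(std_normal, 2.0)   # tail: diffusion adds probability

print("p''(0) =", at_peak)
print("p''(2) =", in_tail)
```

The exact value is \(p''(x) = (x^2 - 1)\,p(x)\), so the sign changes at \(|x| = 1\): under pure diffusion the density falls inside that band and rises outside it.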
Connection to Conservation Laws¶
Continuity Equation¶
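The Fokker-Planck equation can be written as a continuity equation:
$$
\frac{\partial p_t}{\partial t} + \nabla \cdot J = 0
$$
where \(J\) is the probability current (flux):
$$
J(x,t) = f(x,t)\, p_t(x) - \frac{1}{2} g(t)^2 \nabla p_t(x)
$$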
Components:
- \(f p_t\): Drift current (probability flowing along drift)
- \(-\frac{1}{2}g(t)^2 \nabla p_t\): Diffusion current (Fick's law, probability flowing down gradients)
Conservation of Probability¶
Integrating over all space:
$$
\frac{d}{dt} \int p_t(x)\, dx = -\int \nabla \cdot J\, dx = 0
$$
(using divergence theorem, assuming \(J \to 0\) at infinity).
Result: Total probability is conserved, as it must be!
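The flux form also suggests a simple conservative numerical scheme. Below is a minimal one-dimensional finite-volume sketch with explicit time stepping and zero flux at the domain edges. The function `fp_step`, the grid, the time step, and the test drift \(f(x) = -x\) with \(g = 1\) are all illustrative assumptions, and the explicit scheme is only stable for sufficiently small `dt`.

```python
import numpy as np

def fp_step(p, x, t, dt, f, g):
    """One explicit finite-volume step of dp/dt = -dJ/dx in 1D."""
    dx = x[1] - x[0]
    x_face = 0.5 * (x[:-1] + x[1:])              # interfaces between cells
    p_face = 0.5 * (p[:-1] + p[1:])              # density interpolated to faces
    # Probability current J = f p - (1/2) g^2 dp/dx at each interior face.
    flux = f(x_face, t) * p_face - 0.5 * g(t) ** 2 * (p[1:] - p[:-1]) / dx
    flux = np.concatenate(([0.0], flux, [0.0]))  # J = 0 at the domain edges
    return p - dt * (flux[1:] - flux[:-1]) / dx

x = np.linspace(-5.0, 5.0, 201)
dx = x[1] - x[0]

# Initial density: Gaussian with mean 1 and variance 0.25, normalized on the grid.
p = np.exp(-(x - 1.0) ** 2 / (2 * 0.25))
p /= p.sum() * dx

dt, n_steps = 1e-3, 1000                         # integrate up to t = 1
for k in range(n_steps):
    p = fp_step(p, x, k * dt, dt, f=lambda x, t: -x, g=lambda t: 1.0)

mass = p.sum() * dx
mean = (x * p).sum() * dx
var = ((x - mean) ** 2 * p).sum() * dx
print("mass:", mass, "mean:", mean, "variance:", var)
```

Because each face flux is added to one cell and subtracted from its neighbor, the total mass is preserved up to rounding, which mirrors the divergence-theorem argument above.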
Example: Simple Diffusion¶
Setup¶
Consider pure diffusion with no drift:
$$
dx = \sigma\, dw
$$
where \(\sigma\) is constant.
Fokker-Planck Equation¶
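With \(f = 0\) and \(g = \sigma\), the Fokker-Planck equation reduces to:
$$
\frac{\partial p_t}{\partial t} = \frac{\sigma^2}{2} \nabla^2 p_t
$$
This is the heat equation!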
Solution¶
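Starting from a point mass \(p_0(x) = \delta(x - x_0)\), the solution is:
$$
p_t(x) = \frac{1}{(2\pi \sigma^2 t)^{d/2}} \exp\left(-\frac{\|x - x_0\|^2}{2 \sigma^2 t}\right)
$$
This is a Gaussian with:
- Mean: \(x_0\) (stays at starting point)
- Variance: \(\sigma^2 t\) per coordinate (spreads linearly with time)

Verification: Differentiating the Gaussian gives \(\partial_t p_t = \left[\frac{\|x - x_0\|^2}{2\sigma^2 t^2} - \frac{d}{2t}\right] p_t\), and \(\frac{\sigma^2}{2}\nabla^2 p_t\) evaluates to the same expression, so the heat equation is satisfied.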
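The same verification can be done numerically with finite differences. This sketch assumes the one-dimensional case and arbitrary values for `sigma`, `t`, and the step `h`; it compares both sides of the heat equation at a handful of points.

```python
import numpy as np

def heat_kernel(x, t, x0=0.0, sigma=1.0):
    """1D Gaussian solution with mean x0 and variance sigma^2 * t."""
    var = sigma**2 * t
    return np.exp(-(x - x0) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

sigma, t, h = 1.3, 0.8, 1e-3
x = np.linspace(-3.0, 3.0, 13)

# Left side: time derivative by central difference.
dp_dt = (heat_kernel(x, t + h, sigma=sigma)
         - heat_kernel(x, t - h, sigma=sigma)) / (2 * h)

# Right side: (sigma^2 / 2) times the second space derivative.
d2p_dx2 = (heat_kernel(x + h, t, sigma=sigma)
           - 2 * heat_kernel(x, t, sigma=sigma)
           + heat_kernel(x - h, t, sigma=sigma)) / h**2

residual = dp_dt - 0.5 * sigma**2 * d2p_dx2
print("max |residual|:", np.max(np.abs(residual)))
```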
Example: Ornstein-Uhlenbeck Process¶
Setup¶
Consider the SDE:
$$
dx = -\theta x\, dt + \sigma\, dw
$$
where:
- \(-\theta x\): Drift toward origin (like a spring)
- \(\sigma\): Constant diffusion
Fokker-Planck Equation¶
In one dimension, with \(f(x) = -\theta x\) and \(g = \sigma\):
$$
\frac{\partial p_t}{\partial t} = \theta \frac{\partial}{\partial x}\big(x\, p_t\big) + \frac{\sigma^2}{2} \frac{\partial^2 p_t}{\partial x^2}
$$
Stationary Distribution¶
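At equilibrium (\(\partial p / \partial t = 0\)), in one dimension:
$$
0 = \theta \frac{d}{dx}\big(x\, p\big) + \frac{\sigma^2}{2} \frac{d^2 p}{dx^2}
$$
Solving this ODE (integrate once and use zero flux at infinity, giving \(\theta x\, p + \frac{\sigma^2}{2} p' = 0\)):
$$
p_\infty(x) = \sqrt{\frac{\theta}{\pi \sigma^2}}\, \exp\left(-\frac{\theta x^2}{\sigma^2}\right)
$$
This is a Gaussian with mean \(0\) and variance \(\sigma^2 / (2\theta)\).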
Interpretation: The drift pulls particles toward the origin, while diffusion spreads them out. The equilibrium is a balance between these forces.
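The balance between drift and diffusion can be observed by simulation. The sketch below uses an Euler-Maruyama discretization with hypothetical parameter values and checks that the long-run variance settles near \(\sigma^2 / (2\theta)\), the variance of the Gaussian that solves the stationary one-dimensional equation. The time step introduces a small bias, so agreement is approximate.

```python
import numpy as np

rng = np.random.default_rng(0)

theta, sigma = 1.0, 1.0
dt, n_steps, n_particles = 0.01, 1000, 50_000   # integrate up to t = 10

# All particles start far from equilibrium, at x = 3.
x = np.full(n_particles, 3.0)
for _ in range(n_steps):
    noise = rng.standard_normal(n_particles)
    x += -theta * x * dt + sigma * np.sqrt(dt) * noise

empirical_var = x.var()
predicted_var = sigma**2 / (2 * theta)
print("empirical variance:", empirical_var, "predicted:", predicted_var)
```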
Why This Matters for Reverse SDEs¶
Forward Process¶
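The forward SDE:
$$
dx = f(x,t)\,dt + g(t)\,dw
$$
generates a probability evolution governed by:
$$
\frac{\partial p_t}{\partial t} = -\nabla \cdot (f\, p_t) + \frac{1}{2} g(t)^2 \nabla^2 p_t
$$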
Reverse Process¶
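To reverse the process, we need to find an SDE whose Fokker-Planck equation gives backward evolution. In terms of the reversed time \(\tau = T - t\), the density \(p_{T-\tau}\) must satisfy:
$$
\frac{\partial p_{T-\tau}}{\partial \tau} = +\nabla \cdot (f\, p_{T-\tau}) - \frac{1}{2} g^2 \nabla^2 p_{T-\tau}
$$
(note the sign flips relative to the forward equation).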
Key Insight¶
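The Fokker-Planck equation can be rewritten using the score function \(\nabla \log p_t\). Since \(\nabla p_t = p_t \nabla \log p_t\), the Laplacian is \(\nabla^2 p_t = \nabla \cdot (p_t \nabla \log p_t)\), and therefore:
$$
\frac{\partial p_t}{\partial t} = -\nabla \cdot \left[\left(f - \frac{1}{2} g(t)^2 \nabla \log p_t\right) p_t\right]
$$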
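This allows us to express the diffusion term in terms of the score. The backward evolution has a negative diffusion coefficient, so it is not by itself the Fokker-Planck equation of any SDE. Adding and subtracting \(g^2 \nabla^2 p\) and writing one copy through the score restores a positive diffusion term, leading to the effective drift:
$$
\tilde{f}(x,t) = f(x,t) - g(t)^2 \nabla \log p_t(x)
$$
The reverse SDE \(dx = \tilde{f}(x,t)\,dt + g(t)\,d\bar{w}\), run backward in time with \(\bar{w}\) a reverse-time Brownian motion, uses this effective drift to reverse the probability evolution.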
See: reverse_process_derivation.md for the full derivation.
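As an illustration only, the reverse drift \(f - g^2 \nabla \log p_t\) can be tested in a case where the score is known in closed form. The sketch below assumes a one-dimensional forward process \(dx = \sigma\, dw\) started from \(\mathcal{N}(0, s_0^2)\), so that \(p_t = \mathcal{N}(0, s_0^2 + \sigma^2 t)\). All parameter values and the Euler discretization are assumptions; in practice the score is not available analytically.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma, s0_sq, T = 1.0, 0.25, 1.0     # noise level, initial variance, horizon
n_particles, n_steps = 50_000, 1000
dt = T / n_steps

def score(x, t):
    """Score of p_t = N(0, s0_sq + sigma^2 t), i.e. d/dx log p_t(x)."""
    return -x / (s0_sq + sigma**2 * t)

# Start from samples of the final forward distribution p_T.
x = np.sqrt(s0_sq + sigma**2 * T) * rng.standard_normal(n_particles)

# Integrate the reverse SDE from t = T down to t = 0.
for k in range(n_steps):
    t = T - k * dt
    drift = 0.0 - sigma**2 * score(x, t)          # f - g^2 * score, with f = 0
    x = x - drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_particles)

recovered_var = x.var()
print("recovered variance:", recovered_var, "initial variance:", s0_sq)
```

The particles start with variance \(s_0^2 + \sigma^2 T\) and contract back to roughly \(s_0^2\), even though fresh noise is injected at every step.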
Summary¶
What We Learned¶
- Fokker-Planck Equation: Describes how probability distributions evolve for stochastic processes
- Two Mechanisms: Drift (advection) and diffusion (spreading)
- Derivation: From infinitesimal evolution using Chapman-Kolmogorov and Taylor expansion
- Physical Meaning: Continuity equation for probability flux
- Connection to Reverse SDEs: The score function appears when rewriting the diffusion term
The Equation¶
$$
\frac{\partial p_t}{\partial t} = -\nabla \cdot (f\, p_t) + \frac{1}{2} g^2 \nabla^2 p_t
$$
Drift term: \(-\nabla \cdot (f p_t)\): Probability flows along \(f\)
Diffusion term: \(\frac{1}{2}g^2 \nabla^2 p_t\): Probability spreads
References¶
Classic Texts¶
- Risken (1989): "The Fokker-Planck Equation: Methods of Solution and Applications"
- Gardiner (2009): "Stochastic Methods: A Handbook for the Natural and Social Sciences"
- Øksendal (2003): "Stochastic Differential Equations: An Introduction with Applications"
Papers¶
- Fokker (1914): "Die mittlere Energie rotierender elektrischer Dipole im Strahlungsfeld" — Original work
- Planck (1917): "Über einen Satz der statistischen Dynamik und seine Erweiterung in der Quantentheorie"
- Kolmogorov (1931): "Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung" — Forward equation
Related Documents¶
- Reverse Process: reverse_process_derivation.md
- Forward Process: forward_process_derivation.md
- Supplement in Notebook: notebooks/diffusion/02_sde_formulation/supplements/07_fokker_planck_equation.md