Flow Matching: A Comprehensive Guide

This directory contains comprehensive documentation on flow matching methods for generative modeling — an alternative to score-based diffusion that learns velocity fields via simple regression.

Flow matching offers faster sampling (10-50 steps vs 100-1000 for diffusion), simpler training (direct regression), and flexible paths, making it particularly promising for biological data applications.


Core Documentation Series

This series mirrors the structure of the DDPM documentation, providing a complete foundation for understanding and implementing flow matching models.

  • 01_flow_matching_foundations.md — Foundations: mathematical theory, forward/backward processes, theoretical properties
  • 02_flow_matching_training.md — Training: loss functions, network architectures, training strategies, reflow
  • 03_flow_matching_sampling.md — Sampling: ODE solvers, sampling strategies, quality-speed tradeoffs
  • 04_flow_matching_landscape.md — Landscape: normalizing flows vs flow matching, variants comparison, historical context
  • rectifying_flow.md — Tutorial: rectified flow from first principles (original tutorial)

Quick Navigation

For Beginners

  1. Start with the Rectifying Flow Tutorial for an intuitive introduction
  2. Read Foundations for the mathematical details
  3. Review Training for implementation guidance

For Implementation

  1. Training Guide — Complete training loop with code
  2. Sampling Guide — ODE solvers and sampling strategies
  3. Foundations — Reference for equations

For Comparison with Diffusion

  1. Foundations — Conceptual differences
  2. Sampling — Speed and quality comparison
  3. See DDPM Documentation for diffusion model details

Key Concepts

Flow Matching Overview

Flow matching learns a velocity field \(v_\theta(x, t)\) that transports samples from a noise distribution to a data distribution:

\[ \frac{dx}{dt} = v_\theta(x, t) \]

Key advantages:

  • Simpler training: Direct regression instead of score matching
  • Faster sampling: 10-50 steps (vs 100-1000 for diffusion)
  • Deterministic: Same noise → same output
  • Flexible: Not restricted to Gaussian noise schedules
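Sampling amounts to integrating the ODE above from noise (t = 1) down to data (t = 0). A minimal fixed-step Euler sketch; the `model(x, t)` signature and batch-first shapes are assumptions, not part of any specific library API:

```python
import torch

def euler_sample(model, shape, num_steps=50):
    """Integrate dx/dt = v_theta(x, t) from noise (t = 1) to data (t = 0)
    with fixed-step Euler. `model(x, t)` is assumed to take a batch x and
    a per-sample time tensor t and return the predicted velocity."""
    x = torch.randn(shape)                       # start from Gaussian noise at t = 1
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0],), 1.0 - i * dt)
        x = x - dt * model(x, t)                 # step toward the data distribution
    return x
```

Because the integration is deterministic, rerunning with the same initial noise reproduces the same sample.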

Rectified Flow

Rectified flow is the simplest and most practical instantiation:

  • Path: Linear interpolation \(x_t = (1-t) x_0 + t x_1\)
  • Velocity: Constant \(v = x_1 - x_0\)
  • Loss: Simple MSE \(\|v_\theta(x_t, t) - (x_1 - x_0)\|^2\)
  • Sampling: Deterministic ODE integration

Reflow: Iteratively straighten paths for even faster sampling (5-10 steps).
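The reflow step above can be sketched as: transport fresh noise through the current model's ODE to get deterministic (data, noise) pairs, then retrain on those pairs with the same MSE loss. Function names and the `model(x, t)` interface here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def make_reflow_pairs(model, num_pairs, data_shape, num_steps=100):
    """Generate (x0, x1) couplings by running the current model's ODE:
    x1 is fresh noise, x0 is the sample the model transports it to."""
    x1 = torch.randn(num_pairs, *data_shape)
    x = x1.clone()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((num_pairs,), 1.0 - i * dt)
        x = x - dt * model(x, t)       # Euler step from t = 1 down to t = 0
    return x, x1                       # (generated "data", its paired noise)

def reflow_loss(model, x0, x1):
    """Same rectified-flow MSE, but on the model-induced coupling,
    so x0 and x1 are now deterministically paired."""
    t = torch.rand(x0.shape[0])
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))   # broadcast t over data dims
    xt = (1 - t_) * x0 + t_ * x1
    return F.mse_loss(model(xt, t), x1 - x0)
```

Training on these straighter couplings is what allows the 5-10 step sampling mentioned above.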


Comparison with Diffusion Models

Aspect by aspect, score matching (diffusion) vs flow matching:

  • What's learned: the score \(\nabla_x \log p_t(x)\) vs the velocity \(v_\theta(x, t)\)
  • Forward process: stochastic (add noise) vs deterministic (interpolate)
  • Reverse process: stochastic SDE or ODE vs deterministic ODE
  • Training: score matching (complex) vs simple regression
  • Sampling steps: 100-1000 (SDE), 50-100 (ODE) vs 10-50 (ODE)
  • Speed: flow matching is typically 2-5× faster
  • Noise schedule: critical design choice vs less critical

When to use flow matching:

  • Faster sampling is critical
  • Simpler training preferred
  • Exploring new domains (biology, molecules)
  • Need deterministic generation

Learning Path

Conceptual Understanding

  1. Rectifying Flow Tutorial — Intuitive introduction
     • Linear interpolation paths
     • Velocity fields
     • Why "rectified"?
  2. Foundations — Mathematical theory
     • Probability flows
     • Conditional flow matching
     • Optimal transport connection

Practical Implementation

  1. Training — How to train
     • Loss functions
     • Network architectures (U-Net, DiT)
     • Training strategies and best practices
     • Reflow for faster sampling
  2. Sampling — How to sample
     • ODE solvers (Euler, RK4, adaptive)
     • Quality-speed tradeoffs
     • Conditional generation and guidance

Advanced Topics

  1. Reflow — Iterative path straightening
  2. Conditional generation — Class/text conditioning
  3. Classifier-free guidance — Enhanced conditioning
  4. Domain adaptation — Biological data applications
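Classifier-free guidance carries over directly to velocity fields: at each ODE step, extrapolate from the unconditional prediction toward the conditional one. A minimal sketch, assuming the model accepts a conditioning input and was trained with condition dropout (the `model(x, t, cond)` signature and `null_cond` placeholder are assumptions):

```python
import torch

def guided_velocity(model, x, t, cond, null_cond, guidance_scale=2.0):
    """Classifier-free guidance for a conditional velocity field:
    extrapolate from the unconditional toward the conditional prediction.
    `null_cond` is assumed to be the learned "no condition" input."""
    v_cond = model(x, t, cond)
    v_uncond = model(x, t, null_cond)
    return v_uncond + guidance_scale * (v_cond - v_uncond)
```

A guidance scale of 1.0 recovers the plain conditional velocity; larger values trade diversity for adherence to the condition.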

Code Examples

Training

# Simple rectified flow training
# (assumes `model`, `dataloader`, and `optimizer` are already defined)
import torch
import torch.nn.functional as F

for batch in dataloader:
    x0 = batch                        # data
    x1 = torch.randn_like(x0)         # noise
    t = torch.rand(x0.shape[0])       # one time per sample
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))  # broadcast t over data dims

    xt = (1 - t_) * x0 + t_ * x1      # linear interpolation path
    target = x1 - x0                  # constant velocity along the path

    pred = model(xt, t)
    loss = F.mse_loss(pred, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Sampling

# RK4 sampling (10-20 steps), integrating from noise (t = 1) to data (t = 0)
# (assumes `model`, `batch_size`, `data_shape`, and `num_steps` are defined)
x = torch.randn(batch_size, *data_shape)  # start at t = 1
dt = 1.0 / num_steps

for i in range(num_steps):
    t = torch.full((batch_size,), 1.0 - i * dt)
    k1 = model(x, t)
    k2 = model(x - dt/2 * k1, t - dt/2)
    k3 = model(x - dt/2 * k2, t - dt/2)
    k4 = model(x - dt * k3, t - dt)
    x = x - dt/6 * (k1 + 2*k2 + 2*k3 + k4)  # classic RK4 step backward in t

Summary

Flow matching provides a simpler, faster alternative to diffusion models:

  • Training: Direct regression on velocity fields
  • Sampling: Deterministic ODE (10-50 steps)
  • Quality: Comparable to diffusion models
  • Speed: 2-5× faster than DDIM, 10-50× faster than DDPM

Rectified flow is the simplest instantiation, using linear interpolation paths and constant velocities. With reflow, sampling can be reduced to 5-10 steps while maintaining quality.

This makes flow matching particularly attractive for:

  • Real-time applications
  • Resource-constrained environments
  • Biological data generation (gene expression, molecules)
  • Domains requiring deterministic generation