Skip to content

Supplementary: The Flow Map and the Pushforward

This note supplements Flow Matching Foundations. It unpacks the flow map \(\phi_t\), the pushforward of a distribution, and how these two objects relate to the velocity field and the continuity equation.


The Flow Map

Given a velocity field \(v(x, t)\), each starting point \(x_0 \in \mathbb{R}^d\) traces a unique trajectory determined by the ODE:

\[ \dot{x}(t) = v(x(t),\, t), \qquad x(0) = x_0 \]

The flow map \(\phi_t : \mathbb{R}^d \to \mathbb{R}^d\) collects all of these trajectories into a single function:

\[ \phi_t(x_0) \;=\; x_0 + \int_0^t v\!\left(\phi_s(x_0),\, s\right) ds \]

In words: \(\phi_t(x_0)\) is the position at time \(t\) of a particle that started at \(x_0\) at time \(0\).

Rather than tracking one trajectory, \(\phi_t\) simultaneously deforms the entire space \(\mathbb{R}^d\) — it is a smooth map that warps every point forward in time.


Structural Properties

When \(v(x, t)\) is sufficiently smooth (Lipschitz in \(x\)), the Picard–Lindelöf theorem guarantees uniqueness of solutions, which implies:

Property Statement Intuition
Identity at \(t=0\) \(\phi_0 = \mathrm{id}\) No time has passed; nothing has moved
Composition \(\phi_{0 \to t} = \phi_{s \to t} \circ \phi_{0 \to s}\) Travel from \(0\) to \(t\) by going \(0 \to s\), then \(s \to t\)
Invertibility \(\phi_t^{-1}\) exists Run the ODE backward: \(\dot{x} = -v(x, t)\)
Diffeomorphism \(\phi_t\) is smooth and bijective Two distinct starting points can never collide

The no-crossing property deserves attention. Suppose two particles are at the same position \(x\) at time \(t\). From that moment on, both obey \(\dot{x} = v(x, t)\) with the same initial condition — so they must follow the same trajectory forever. Trajectories can only share a point if they are identical. This is why crossing trajectories in learned flows are costly: crossing indicates the marginal velocity field \(v_t(x)\) is averaging over conflicting directions at \(x\), which forces an ODE solver to take small, careful steps to resolve the ambiguity.


The Pushforward

The flow map acts on points. The pushforward extends this action to probability distributions.

If \(X_0 \sim p_0\), define \(X_t = \phi_t(X_0)\). The distribution of \(X_t\) is the pushforward of \(p_0\) under \(\phi_t\), written \({\phi_t}_{\,\#}\, p_0\):

\[ p_t \;=\; {\phi_t}_{\,\#}\, p_0 \qquad \Longleftrightarrow \qquad X_0 \sim p_0 \;\Rightarrow\; \phi_t(X_0) \sim p_t \]

The \(\#\) subscript reads "push forward through." The definition says: to sample from \(p_t\), draw from \(p_0\) and apply the flow map.

Concrete Formula via Change of Variables

When \(\phi_t\) is differentiable with an invertible Jacobian, the pushed-forward density can be written explicitly:

\[ p_t(y) \;=\; p_0\!\left(\phi_t^{-1}(y)\right) \cdot \left|\det \nabla_y \,\phi_t^{-1}(y)\right| \]

The two factors have clear roles:

  • \(p_0\!\left(\phi_t^{-1}(y)\right)\) — look up the probability of the source point that maps to \(y\)
  • \(\left|\det \nabla_y \,\phi_t^{-1}(y)\right|\) — correct for local volume change: if the flow compresses a region, density increases; if it stretches, density decreases

This is simply the standard change-of-variables formula from probability, applied to the flow map.


The Continuity Equation as the Infinitesimal Pushforward

The change-of-variables formula above is a finite-time statement: it tells you the density at time \(t\) given the density at time \(0\). The continuity equation is the infinitesimal version — it tells you how density changes over an infinitesimal step \(dt\):

\[ \frac{\partial p_t}{\partial t} + \nabla \cdot (p_t\, v) = 0 \]

To see the connection, differentiate the pushforward relation \(p_t = {\phi_t}_{\,\#}\, p_0\) with respect to \(t\). The result is the continuity equation. They encode exactly the same physics: probability is conserved as particles flow.


The Relationship Between All Three Objects

The velocity field, the flow map, and the continuity equation are three views of the same underlying structure:

Velocity field  v(x, t)
       │  Integrate the ODE:
       │  ẋ = v(x, t),  x(0) = x₀
Flow map  φ_t : ℝᵈ → ℝᵈ           ← acts on points
       │  Apply to every sample from p₀
       │  (change-of-variables formula)
Pushforward  (φ_t)# p₀ = p_t       ← acts on distributions
       │  Differentiate with respect to t
Continuity equation  ∂p_t/∂t + ∇·(p_t v) = 0

Each arrow is a precise mathematical operation:

  • ODE integration is the most computational step: given \(v\), numerically solve to get \(\phi_t\). An ODE solver does this at sampling time.
  • Pushforward is a distributional operation: no integration needed if you have \(\phi_t\); just transform samples.
  • Differentiation connects the finite-time picture (\(\phi_t\)) to the instantaneous picture (\(\partial_t p_t\)).

Flow matching operates at the top: learn \(v_\theta\). Everything else — the flow map, the transported distributions, the conserved probability — follows automatically by construction. The model never explicitly computes \(\phi_t\) during training; the ODE solver reconstructs it at inference.


Why This Matters for Flow Matching

The flow map perspective clarifies several design choices in flow matching:

Why straight paths are attractive. If the velocity field is constant, \(v(x, t) = c\), then \(\phi_t(x_0) = x_0 + tc\) — a straight line. Straight flow maps have identity-like Jacobians and are trivial for ODE solvers. Rectified flow's goal of "straightening paths" is precisely the goal of making \(\phi_t\) as close to a translation as possible.

Why reflow reduces sampling steps. Each reflow iteration makes trajectories more parallel and less curved. A less curved \(\phi_t\) requires fewer ODE solver steps to integrate accurately, because the velocity field changes less along each trajectory.

Why trajectory crossings are expensive. When two trajectories cross, the marginal velocity field \(v_t(x)\) at the crossing point is a compromise between two different directions. The flow map is still well-defined (trajectories only appear to cross in the marginal \(x\)-space, but come from different \(x_0\) values), but the network must learn a velocity that serves both — resulting in a curved effective trajectory that needs more solver steps.