Chapter 2: RKHS Basis, Kernel Amplitudes, and Why GRL Doesn't Need Normalization¶
Motivation¶
In Chapter 1a, we learned that quantum mechanics represents states in Hilbert space, with wavefunctions as coordinate representations in a chosen basis.
In Chapter 1, we stated that GRL's reinforcement field \(Q^+ \in \mathcal{H}_k\) is analogous to a quantum state, with the field value \(Q^+(z)\) analogous to a wavefunction \(\psi(x)\).
This raises three fundamental questions:
- Basis: What is the "basis" in RKHS, and how is it chosen?
- Amplitudes: How do kernel evaluations relate to probability amplitudes?
- Probabilities: Does GRL need normalized probabilities like quantum mechanics?
This chapter answers all three and reveals a surprising insight: GRL combines QM's amplitude geometry with energy-based models' unnormalized inference.
1. The RKHS Basis: Kernel Sections¶
Quantum Mechanics Review¶
In QM, the position basis is:
Each \(|x\rangle\) is a basis vector representing "particle definitely at position \(x\)."
To get coordinates: Project the state \(|\psi\rangle\) onto a basis vector:
Key insight: Choosing a specific \(x\) doesn't define the whole basis—it selects one basis vector from the continuum.
RKHS Analogue¶
In GRL, the RKHS \(\mathcal{H}_k\) has an implicit basis (technically, a frame) given by kernel sections:
where \(\mathcal{Z} = \mathcal{S} \times \Theta\) is the augmented state-action space.
What is \(k(z, \cdot)\)?
For each point \(z \in \mathcal{Z}\), the function \(k(z, \cdot): \mathcal{Z} \to \mathbb{R}\) is an element of \(\mathcal{H}_k\).
Analogy:
| Quantum Mechanics | GRL (RKHS) |
|---|---|
| \(\|x\rangle\) = basis vector for position \(x\) | \(k(z, \cdot)\) = basis vector (frame element) for point \(z\) |
| Position basis: \(\{\|x\rangle : x \in \mathbb{R}\}\) | Kernel basis: \(\{k(z, \cdot) : z \in \mathcal{Z}\}\) |
| State: \(\|\psi\rangle \in \mathcal{H}\) | State: \(Q^+ \in \mathcal{H}_k\) |
| Wavefunction: \(\psi(x) = \langle x \| \psi \rangle\) | Field value: \(Q^+(z) = \langle Q^+, k(z, \cdot) \rangle_{\mathcal{H}_k}\) |
To Get Coordinates: Evaluate the Inner Product¶
Given the state \(Q^+ \in \mathcal{H}_k\) (determined by particles), to find its "coordinate" at query point \(z\):
What's happening:
- The state \(Q^+\) is already fixed (by the particle memory)
- Choosing \(z\) selects one frame element \(k(z, \cdot)\)
- The inner product gives the amplitude of \(Q^+\) along that direction
This is structurally identical to \(\psi(x) = \langle x | \psi \rangle\)!
So Is the Basis Determined by Choosing \(z\)?¶
Short answer: Yes—but locally, not globally.
Long answer:
The kernel-induced basis \(\{k(z, \cdot)\}\) exists for all \(z \in \mathcal{Z}\), just like the position basis exists for all \(x \in \mathbb{R}\).
When you choose a specific \(z\):
- You're not "defining the basis"
- You're selecting which basis element to project onto
Analogy: Asking "What is \(\psi(3.2)\)?" doesn't define the position basis—it just asks for the amplitude along the \(|3.2\rangle\) direction.
Same here: \(Q^+(z_0)\) asks for the amplitude along the \(k(z_0, \cdot)\) direction.
2. Kernel Amplitudes: Overlap Without Probabilities¶
Quantum Mechanics: Amplitudes → Probabilities¶
In QM:
Primitive: Complex amplitude \(\psi(x)\)
Derived: Probability via Born rule
Normalization:
Key point: Amplitude is fundamental; probability is derived by squaring and normalizing.
RKHS: Amplitudes Without Probabilities¶
In RKHS, a kernel evaluation:
already looks like an overlap amplitude!
But important differences:
| Property | Quantum Mechanics | RKHS / GRL |
|---|---|---|
| Values | Complex: \(\psi(x) \in \mathbb{C}\) | Real: \(Q^+(z) \in \mathbb{R}\) |
| Normalization | Required: \(\int \|\psi(x)\|^2 dx = 1\) | Not required |
| Interpretation | Probability amplitude | Score / value / energy |
Could you normalize? Yes! You could define:
This is mathematically valid—and in fact, Born machines and quantum-inspired generative models do exactly this.
But GRL Doesn't Need This Step¶
Why not?
Because GRL uses \(Q^+(z)\) for decision-making and control, not for sampling or probability estimation.
Policy inference:
Action selection:
Both use relative values, not normalized probabilities.
This brings us to the third question...
3. Energy-Based Models: Unnormalized Scores Are Enough¶
The EBM Perspective¶
Energy-based models (EBMs) define a probability distribution:
where \(Z = \int \exp(-E(x')) dx'\) is the partition function.
The key insight:
In practice, we never compute \(Z\)!
Why? Because inference, optimization, and control only require relative energies:
Comparing two options:
The partition function \(Z\) cancels out!
Optimization:
No normalization needed—just find the minimum.
Control:
Again, relative values are sufficient.
GRL's Position in the Landscape¶
GRL does the same thing:
The field \(Q^+(z)\) acts as an unnormalized score / negative energy:
Policy (Boltzmann distribution):
No explicit normalization is computed—the policy is normalized implicitly when actions are sampled or probabilities are queried.
Optimization (greedy action):
Only relative values matter.
Where Does GRL Sit?¶
| Framework | Primitive | Normalized? | Use Case |
|---|---|---|---|
| Quantum Mechanics | Amplitude \(\psi(x)\) | Yes (Born rule) | Predictions about measurements |
| Born Machines | Amplitude \(\psi(x)\) | Yes | Generative modeling |
| Energy-Based Models | Energy \(E(x)\) | No | Optimization, inference |
| GRL | Field \(Q^+(z)\) | No | Control, decision-making |
The key insight:
GRL borrows the amplitude geometry from quantum mechanics
and the unnormalized inference logic from energy-based models.
This combination keeps GRL mathematically clean without forcing probability normalization.
4. Three Interpretations, One Object¶
The reinforcement field \(Q^+\) can be viewed from three complementary perspectives:
Perspective 1: Hilbert Space State¶
A vector in the RKHS, determined by particle memory:
This is the abstract object—basis-independent.
Perspective 2: Amplitude Field (QM-like)¶
The coordinate representation in the kernel-induced basis.
This gives the field value at each point—the "wavefunction" of the field.
Perspective 3: Energy / Score Function (EBM-like)¶
An unnormalized score used for inference and control.
This enables decision-making without normalization.
Nothing is inconsistent here—they're just different readouts of the same state, like:
- Abstract vector \(\mathbf{v}\)
- Cartesian coordinates \([v_x, v_y, v_z]\)
- Potential energy \(U(\mathbf{v})\)
All describe the same object from different angles.
5. What GRL Does (and Doesn't) Claim¶
GRL Does NOT Claim:¶
- ❌ \(Q^+(z)\) is a probability
- ❌ Probabilities are derived via Born's rule \(p(z) \propto |Q^+(z)|^2\)
- ❌ Normalization is required for inference
GRL DOES Claim:¶
- ✅ \(Q^+\) is an element of a Hilbert space (RKHS)
- ✅ Kernel evaluations act as overlap amplitudes
- ✅ Action selection uses unnormalized scores (like EBMs)
- ✅ This unifies RKHS geometry, QM-style state representation, and EBM inference
A Precise Statement¶
Although kernel evaluations resemble probability amplitudes, GRL does not require normalized probabilities. Inference and control are performed via unnormalized energy-like scores, consistent with modern energy-based models.
This keeps the framework mathematically clean and modern.
6. Could We Define Probabilities? (Optional Extension)¶
Natural question: Since kernels have inner product structure like QM, can we define probabilities from them?
Answer: Yes! Probabilities could be defined in RKHS. There are multiple options:
Option 1: Born Rule on Kernel Amplitudes¶
This works! Born machines use this approach.
For GRL:
- Pro: Direct QM analogy
- Con: Requires computing the normalization integral
- Con: Real-valued amplitudes don't give interference effects
Option 2: Boltzmann Distribution (What GRL Uses)¶
This is what GRL implicitly uses for policy!
Advantages:
- Standard in RL (Boltzmann exploration)
- Temperature parameter \(\beta\) controls exploration
- No need to compute normalization explicitly (ratio trick)
Option 3: Complex-Valued RKHS (Advanced)¶
If you extend RKHS to complex values, you get:
- Probability amplitudes: \(\psi(z) \in \mathbb{C}\)
- Born rule: \(p(z) = |\psi(z)|^2\)
- Interference effects from phase!
This is exactly what you explore in 02-complex-rkhs.md!
7. Three Roles of "Choosing a Point"¶
Let's unify the three concepts by understanding what "choosing \(z\)" means in each context:
| Concept | Question Answered | Example |
|---|---|---|
| Basis selection | "Along which direction am I looking?" | \(k(z, \cdot)\) is the direction |
| Amplitude | "How much does the state align with that direction?" | \(Q^+(z) = \langle Q^+, k(z, \cdot) \rangle\) |
| Probability | "How often would I observe this if I sampled?" | \(p(z) \propto \exp(\beta Q^+(z))\) (optional) |
GRL needs the first two, not the third.
8. Why This Matters for Your Framework¶
Conceptual Clarity¶
Understanding that GRL works with unnormalized amplitudes (like EBMs) rather than normalized probabilities (like QM):
-
Explains why no partition function is needed
-
Justifies the QM analogy (geometry) without overcommitting (Born rule)
- Positions GRL correctly in the modern ML landscape (alongside EBMs, score-based models)
Future Extensions¶
If you want to develop normalized probability formulations:
Option A: Born rule approach (for generative modeling)
Option B: Complex RKHS (for interference and phase semantics)
Both are natural extensions explored in later chapters!
Practical Implications¶
For implementation:
- No need to compute partition functions
- Use relative values for action selection
- Policy normalization handled implicitly
For theory:
- Clean mathematical framework
- Rigorous Hilbert space structure
- Flexible enough for future probability extensions
Summary¶
Key Insights¶
-
RKHS Basis:
-
Kernel sections \(\{k(z, \cdot)\}\) form a continuous frame
- Choosing \(z\) selects one frame element, like choosing \(x\) selects \(|x\rangle\) in QM
-
The state \(Q^+\) exists independently; \(Q^+(z)\) is its projection onto \(k(z, \cdot)\)
-
Kernel Amplitudes:
-
\(Q^+(z)\) acts like a wavefunction (coordinate representation)
- But GRL doesn't require normalization (unlike QM)
-
Similar to EBMs working with unnormalized energies
-
Three Interpretations:
-
Abstract: \(Q^+ \in \mathcal{H}_k\) (Hilbert space state)
- Coordinate: \(Q^+(z)\) (amplitude field)
-
Inference: \(-Q^+(z)\) (energy score)
-
Why No Normalization:
-
Decision-making uses relative values
- Partition function cancels in ratios
-
Same principle as modern EBMs
-
Could Define Probabilities:
-
Born rule: \(p(z) \propto |Q^+(z)|^2\) (possible, not required)
- Boltzmann: \(\pi(\theta|s) \propto \exp(\beta Q^+(s,\theta))\) (what GRL uses)
- Complex RKHS: \(p(z) = |\psi(z)|^2\) with interference (future extension)
Key Equations¶
RKHS basis (frame):
State in RKHS:
Coordinate representation (amplitude):
Energy interpretation:
Policy (Boltzmann):
Analogy to QM:
| Quantum | RKHS |
|---|---|
| \(\|\psi\rangle \in \mathcal{H}\) | \(Q^+ \in \mathcal{H}_k\) |
| \(\psi(x) = \langle x \| \psi \rangle\) | \(Q^+(z) = \langle Q^+, k(z, \cdot) \rangle\) |
| Basis: \(\{\|x\rangle\}\) | Frame: \(\{k(z, \cdot)\}\) |
Further Reading¶
Within This Tutorial Series¶
- Chapter 1: RKHS-Quantum Structural Parallel
- Chapter 1a: What Is a Wavefunction?
- Chapter 3 (complex): Complex-Valued RKHS and Probability Amplitudes
Back to GRL Tutorials¶
- Tutorial Chapter 2: RKHS Foundations
- Tutorial Chapter 4: Reinforcement Field
Related Concepts¶
- Energy-Based Models: LeCun et al. (2006). "A Tutorial on Energy-Based Learning"
- Born Machines: Cheng et al. (2018). "Quantum Generative Adversarial Learning"
- RKHS and Kernel Methods: Berlinet & Thomas-Agnan (2004). "Reproducing Kernel Hilbert Spaces"
Last Updated: January 14, 2026