Chapter 1: Core Concepts¶
Purpose: Introduce the fundamental building blocks of GRL
Prerequisites: Chapter 0 (Overview)
Key Concepts: Augmented state space, parametric actions, experience particles, kernel similarity
Introduction¶
In Chapter 0, we introduced the central idea of GRL: treating actions as parametric operators rather than fixed symbols. Now we'll formalize the core building blocks that make this possible:
- Parametric Actions: How actions are represented as parameter vectors
- Augmented State Space: The joint space of states and action parameters
- Experience Particles: How we represent and store experience
- Kernel Similarity: How we measure relationships between experiences
These concepts form the foundation on which the reinforcement field, algorithms, and policy inference are built.
1. Parametric Actions¶
From Symbols to Parameters¶
In traditional RL, an action \(a\) is either:
- A discrete symbol from a finite set: \(a \in \{1, 2, \ldots, K\}\)
- A continuous vector from a bounded region: \(a \in \mathcal{A} \subseteq \mathbb{R}^d\)
In GRL, we take the continuous view further. An action is represented by a parameter vector \(\theta \in \Theta\) that specifies an operator:

$$
a \equiv \hat{O}(\theta), \quad \theta \in \Theta
$$

The parameter space \(\Theta\) is typically \(\mathbb{R}^d\) for some dimension \(d\), though it could be a more structured manifold.
Examples¶
| Domain | Parameters \(\theta\) | Operator \(\hat{O}(\theta)\) |
|---|---|---|
| 2D Navigation | \((F_x, F_y)\) | Force vector applied to agent |
| Pendulum | \(\tau\) | Torque applied to joint |
| Portfolio | \((w_1, ..., w_n)\) | Asset allocation weights |
| Image transformation | \((r, \theta, s)\) | Rotation, angle, and scale parameters of the transform |
Why Parameters Matter¶
By treating actions as parameters, we gain:
- Continuity: Nearby parameters \(\theta\) and \(\theta'\) produce similar effects. This enables smooth generalization.
- Differentiability: We can compute gradients of outcomes with respect to \(\theta\).
- Compositionality: Parameters can be structured (e.g., hierarchically) to enable compositional actions.
- Interpretability: Parameters often have physical meaning (force magnitude, angle, etc.).
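To make this concrete, here is a minimal Python sketch of a parametric action driving a point-mass agent in 2D navigation. The dynamics and function names are illustrative assumptions, not part of GRL itself:

```python
import numpy as np

def apply_force_operator(state, theta, dt=0.1, mass=1.0):
    """Apply the operator O-hat(theta): a force vector acting on a point mass.

    state: (position, velocity), each a length-2 array.
    theta: action parameters (F_x, F_y), i.e., the force vector.
    """
    position, velocity = state
    acceleration = np.asarray(theta, dtype=float) / mass  # a = F / m
    velocity = velocity + acceleration * dt               # Euler integration step
    position = position + velocity * dt
    return position, velocity

# Continuity: nearby parameters produce nearby outcomes.
s0 = (np.zeros(2), np.zeros(2))
p1, _ = apply_force_operator(s0, theta=[1.0, 0.0])
p2, _ = apply_force_operator(s0, theta=[1.1, 0.0])
print(np.linalg.norm(p1 - p2))  # small displacement difference (0.001)
```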
2. Augmented State Space¶
Combining States and Actions¶
The key insight of GRL is to reason about states and actions together as points in a unified space. We define the augmented state-action point:

$$
z = (s, \theta) \in \mathcal{Z} = \mathcal{S} \times \Theta
$$

where:
- \(s \in \mathcal{S}\) is the environment state (possibly embedded/encoded)
- \(\theta \in \Theta\) is the action parameter vector
- \(\mathcal{Z}\) is the augmented space
Why Augment?¶
In standard RL, we might learn \(Q(s, a)\) — the value of taking action \(a\) in state \(s\). In GRL, we learn \(Q^+(z) = Q^+(s, \theta)\) — a value function over the entire augmented space.
This has several advantages:
- Smooth Value Landscape: The value function is smooth over the continuous augmented space, enabling generalization.
- Unified Representation: State and action are treated symmetrically, enabling richer representations.
- Gradient Information: We can compute \(\nabla_\theta Q^+(s, \theta)\), i.e., how value changes with the action parameters.
Embedding Functions¶
In practice, we often use embedding functions to transform raw states and actions into suitable representations:
$$
z = (x_s(s), x_a(\theta))
$$
where:
- \(x_s: \mathcal{S} \to \mathbb{R}^{d_s}\) embeds states
- \(x_a: \Theta \to \mathbb{R}^{d_a}\) embeds action parameters
These embeddings might be:
- Identity (raw features)
- Learned neural network encodings
- Hand-crafted features
The choice of embedding affects the geometry of the augmented space and thus how similarity and generalization work.
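As a minimal sketch of the simplest case, identity embeddings concatenated into an augmented point (the function names here are illustrative):

```python
import numpy as np

def embed_state(s):
    """x_s: identity embedding of the raw state features."""
    return np.asarray(s, dtype=float)

def embed_action(theta):
    """x_a: identity embedding of the action parameters."""
    return np.asarray(theta, dtype=float)

def augment(s, theta):
    """Form the augmented point z = (x_s(s), x_a(theta)) by concatenation."""
    return np.concatenate([embed_state(s), embed_action(theta)])

z = augment(s=[0.5, -1.2], theta=[1.0, 0.0])
print(z)  # [ 0.5 -1.2  1.   0. ]
```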
3. Experience Particles¶
What is a Particle?¶
In GRL, experience is stored not as a replay buffer of transitions, but as a collection of particles in augmented space. Each particle is a tuple:
$$
\omega_i = (z_i, w_i) = ((s_i, \theta_i), w_i)
$$
where:
- \(z_i = (s_i, \theta_i)\) is the location in augmented space
- \(w_i \in \mathbb{R}\) is the weight (typically related to value)
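In code, a particle is just a location paired with a weight. A minimal sketch (the class and field names are illustrative):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Particle:
    z: np.ndarray  # location in augmented space, z = (s, theta)
    w: float       # weight, typically related to observed value

particle = Particle(z=np.array([0.5, -1.2, 1.0, 0.0]), w=0.8)
```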
Particle Memory¶
The agent maintains a particle memory:
$$
\Omega = \{(z_1, w_1), (z_2, w_2), \ldots, (z_N, w_N)\}
$$
This collection of weighted particles represents the agent's accumulated experience. It's analogous to:
- A weighted sample approximation to a distribution
- A nonparametric function representation
- A memory of "what happened where and how good it was"
Particles vs. Replay Buffer¶
| Replay Buffer | Particle Memory |
|---|---|
| Stores transitions \((s, a, r, s')\) | Stores points \((z, w)\) in augmented space |
| Used for sampling and replaying | Used for function approximation and inference |
| Finite capacity, FIFO or priority | Dynamic, merging/pruning operations |
| Supports temporal learning | Supports spatial generalization |
Particle Operations¶
The particle memory supports several operations:
- Add: Insert a new particle \((z, w)\)
- Query: Evaluate the reinforcement field at a point \(z\) using nearby particles
- Merge: Combine similar particles to prevent unbounded growth
- Prune: Remove low-influence particles
- Update: Modify weights based on new reinforcement signals
These operations are formalized in Algorithm 1 (MemoryUpdate), covered in Chapter 6.
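The sketch below illustrates the flavor of these operations in Python. The merge rule, merge radius, and prune threshold are illustrative assumptions; the precise versions are specified by Algorithm 1 in Chapter 6:

```python
import numpy as np

def rbf(z1, z2, ell=1.0):
    """Gaussian kernel k(z, z') measuring similarity between particles."""
    return np.exp(-np.sum((z1 - z2) ** 2) / (2 * ell ** 2))

class ParticleMemory:
    def __init__(self, ell=1.0):
        self.particles = []  # list of (z, w) pairs
        self.ell = ell

    def add(self, z, w):
        """Add: insert a new particle (z, w)."""
        self.particles.append((np.asarray(z, dtype=float), float(w)))

    def query(self, z):
        """Query: evaluate f(z) = sum_i w_i k(z, z_i) from nearby particles."""
        z = np.asarray(z, dtype=float)
        return sum(w * rbf(z, zi, self.ell) for zi, w in self.particles)

    def update(self, z, delta):
        """Update: shift weights of nearby particles toward a new signal."""
        z = np.asarray(z, dtype=float)
        self.particles = [(zi, w + delta * rbf(z, zi, self.ell))
                          for zi, w in self.particles]

    def merge(self, radius=0.1):
        """Merge: combine particles closer than `radius` (one simple rule)."""
        merged = []
        for z, w in self.particles:
            for j, (zm, wm) in enumerate(merged):
                if np.linalg.norm(z - zm) < radius:
                    merged[j] = ((zm + z) / 2.0, wm + w)  # average location, sum weights
                    break
            else:
                merged.append((z, w))
        self.particles = merged

    def prune(self, min_weight=1e-3):
        """Prune: drop particles whose influence is negligible."""
        self.particles = [(z, w) for z, w in self.particles if abs(w) >= min_weight]
```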
4. Kernel Similarity¶
The Role of Kernels¶
How do we determine which particles are "nearby" or "similar"? GRL uses kernel functions to define similarity in augmented space.
A kernel \(k: \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}\) measures how similar two points \(z\) and \(z'\) are: higher values mean more similar.
Common Kernels¶
Radial Basis Function (RBF) / Gaussian:
$$
k(z, z') = \exp\left(-\frac{\|z - z'\|^2}{2\ell^2}\right)
$$
where \(\ell\) is the lengthscale controlling how quickly similarity decays with distance.
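A direct NumPy translation of the RBF kernel (a minimal sketch):

```python
import numpy as np

def rbf_kernel(z1, z2, lengthscale=1.0):
    """Gaussian / RBF kernel: similarity decays with squared distance."""
    sq_dist = np.sum((np.asarray(z1) - np.asarray(z2)) ** 2)
    return np.exp(-sq_dist / (2 * lengthscale ** 2))

print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0: identical points
print(rbf_kernel([0.0, 0.0], [3.0, 0.0]))  # ~0.011: distant points
```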
Automatic Relevance Determination (ARD):
$$
k(z, z') = \exp\left(-\sum_{d=1}^{D} \frac{(z_d - z'_d)^2}{2\ell_d^2}\right)
$$
Each dimension has its own lengthscale \(\ell_d\), allowing the kernel to learn which features matter most.
Composite Kernels:
For augmented space, we might use:
$$
k(z, z') = k_s(s, s') \cdot k_a(\theta, \theta')
$$
where \(k_s\) and \(k_a\) are separate kernels for state and action components.
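Both variants are small extensions of the RBF form. A sketch, where the per-dimension lengthscales and the state/action split are illustrative choices:

```python
import numpy as np

def ard_kernel(z1, z2, lengthscales):
    """ARD kernel: one lengthscale per dimension weights feature relevance."""
    diff = (np.asarray(z1) - np.asarray(z2)) / np.asarray(lengthscales)
    return np.exp(-0.5 * np.sum(diff ** 2))

def composite_kernel(s1, theta1, s2, theta2, ls_state=1.0, ls_action=0.5):
    """Product kernel over augmented space: k = k_s(s, s') * k_a(theta, theta')."""
    k_s = ard_kernel(s1, s2, [ls_state] * len(s1))
    k_a = ard_kernel(theta1, theta2, [ls_action] * len(theta1))
    return k_s * k_a

# A dimension with a large lengthscale barely affects similarity:
print(ard_kernel([0, 0], [1, 1], lengthscales=[0.5, 100.0]))  # ~ exp(-2)
```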
Why Kernels Matter¶
Kernels are central to GRL because they define:
- Generalization: How experience at one point informs predictions at other points
- Smoothness: How quickly the value function can change
- Feature Relevance: Which dimensions of state/action matter (via ARD)
- Geometry: The "shape" of the augmented space for learning
Kernel-Induced Function Representation¶
Given particles \(\Omega = \{(z_i, w_i)\}\) and kernel \(k\), we can define a function over the entire augmented space:
$$
f(z) = \sum_{i=1}^{N} w_i \, k(z, z_i)
$$
This is the reinforcement field — a smooth function that assigns values to every point in augmented space based on the weighted contributions of all particles.
This representation:
- Is nonparametric: No fixed neural network architecture
- Is smooth: Inherits smoothness from the kernel
- Generalizes: Points far from any particle receive values near zero, reflecting the absence of supporting experience
- Is adaptive: Adding particles reshapes the function
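Evaluating this kernel-weighted sum takes a few lines of NumPy; here is a minimal, vectorized sketch:

```python
import numpy as np

def field_value(z, particles, weights, lengthscale=1.0):
    """Evaluate f(z) = sum_i w_i k(z, z_i) with an RBF kernel, vectorized."""
    z = np.asarray(z, dtype=float)
    particles = np.asarray(particles, dtype=float)   # shape (N, D)
    sq_dists = np.sum((particles - z) ** 2, axis=1)  # squared distance to each particle
    return np.dot(weights, np.exp(-sq_dists / (2 * lengthscale ** 2)))

particles = [[0.0, 0.0], [1.0, 1.0]]
weights = [1.0, -0.5]
print(field_value([0.1, 0.0], particles, weights))  # dominated by the first bump
```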
5. Putting It Together¶
Let's trace how these concepts connect:
1. Agent in State \(s\)¶
The agent observes state \(s\) from the environment.
2. Consider Action Parameters¶
The agent considers action parameter \(\theta\), forming augmented point \(z = (s, \theta)\).
3. Query Particle Memory¶
Using the kernel, the agent computes the reinforcement field value:
$$
Q^+(z) = \sum_{i} w_i \, k(z, z_i)
$$
4. Select Action¶
The agent selects action parameters that maximize \(Q^+\) (or samples according to a policy).
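One simple way to maximize \(Q^+\) over \(\theta\) is local gradient ascent. The sketch below uses a finite-difference gradient and an RBF field; the setup and step sizes are illustrative assumptions:

```python
import numpy as np

def q_plus(theta, s, particles, weights, lengthscale=1.0):
    """Q+(s, theta) = sum_i w_i k((s, theta), z_i) with an RBF kernel."""
    z = np.concatenate([s, theta])
    sq = np.sum((particles - z) ** 2, axis=1)
    return np.dot(weights, np.exp(-sq / (2 * lengthscale ** 2)))

def select_action(s, particles, weights, theta0, lr=0.1, steps=50, eps=1e-5):
    """Gradient ascent on Q+ over theta, via central finite differences."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for d in range(theta.size):
            e = np.zeros_like(theta)
            e[d] = eps
            grad[d] = (q_plus(theta + e, s, particles, weights)
                       - q_plus(theta - e, s, particles, weights)) / (2 * eps)
        theta += lr * grad
    return theta

s = np.array([0.0])
particles = np.array([[0.0, 1.0], [0.0, -1.0]])  # z = (s, theta), 1-D each
weights = np.array([1.0, 0.2])                   # the theta = +1 region is better
print(select_action(s, particles, weights, theta0=[0.1]))  # climbs toward ~1.0
```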
5. Execute and Observe¶
The action is executed, reward \(r\) is received, next state \(s'\) is observed.
6. Update Particles¶
A new particle is added or existing particles are updated based on the experience.
7. Repeat¶
The cycle continues, with the reinforcement field evolving as particles accumulate.
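Tying the seven steps together, here is a minimal end-to-end sketch of the loop. `env_step` is an assumed environment hook, and the argmax-over-candidates action selection is one simple choice:

```python
import numpy as np

def rbf(z1, z2, ell=1.0):
    """Gaussian kernel over augmented points."""
    return np.exp(-np.sum((z1 - z2) ** 2) / (2 * ell ** 2))

def grl_loop(env_step, s0, candidate_thetas, n_steps=100):
    """One interaction loop: query the field, act, observe, store a particle.

    env_step(s, theta) -> (reward, next_state) is an assumed environment hook.
    """
    particles, weights = [], []  # the particle memory Omega
    s = np.asarray(s0, dtype=float)
    for _ in range(n_steps):
        # Steps 1-3: form z = (s, theta) for each candidate and query the field.
        scores = [sum(w * rbf(np.concatenate([s, th]), zi)
                      for zi, w in zip(particles, weights))
                  for th in candidate_thetas]
        # Step 4: select the highest-scoring action parameters.
        theta = candidate_thetas[int(np.argmax(scores))]
        # Step 5: execute and observe.
        reward, s_next = env_step(s, theta)
        # Step 6: add a particle weighted by the observed reward.
        particles.append(np.concatenate([s, theta]))
        weights.append(reward)
        s = np.asarray(s_next, dtype=float)
    return particles, weights
```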
Visual Intuition¶
Imagine a 2D augmented space where:
- The x-axis represents some aspect of state (e.g., position)
- The y-axis represents action parameter (e.g., force magnitude)
Each particle is a point in this 2D space with an associated weight (color/size indicating value).
The kernel defines how much each particle influences nearby points — like a Gaussian "bump" centered at each particle.
The reinforcement field is the sum of all these bumps — a smooth landscape over the entire space.
- High regions: Good state-action combinations (high expected return)
- Low regions: Poor combinations (low expected return)
- Sparse regions: Uncertainty (few particles nearby)
Policy learning = navigating and reshaping this landscape.
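To render this landscape yourself, evaluate the field on a grid and plot it. A sketch using matplotlib, with particle locations and weights made up for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

def field_on_grid(particles, weights, ell=0.3, res=100):
    """Evaluate the reinforcement field on a 2-D grid for plotting."""
    xs = ys = np.linspace(-2, 2, res)
    X, Y = np.meshgrid(xs, ys)
    F = np.zeros_like(X)
    for (px, py), w in zip(particles, weights):
        F += w * np.exp(-((X - px) ** 2 + (Y - py) ** 2) / (2 * ell ** 2))
    return X, Y, F

# Three particles: two good regions, one bad one.
particles = [(-1.0, 0.5), (1.0, -0.5), (0.0, 1.5)]
weights = [1.0, 0.8, -0.6]
X, Y, F = field_on_grid(particles, weights)
plt.contourf(X, Y, F, levels=30)  # smooth landscape of Gaussian bumps
plt.xlabel("state (position)")
plt.ylabel("action parameter (force)")
plt.colorbar(label="field value")
plt.show()
```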
Key Takeaways¶
- Parametric actions represent actions as parameter vectors \(\theta\) that specify operators
- Augmented state space \(\mathcal{Z} = \mathcal{S} \times \Theta\) combines state and action parameters
- Experience particles \((z_i, w_i)\) are weighted points in augmented space representing experience
- Kernel functions \(k(z, z')\) define similarity and enable smooth generalization
- The reinforcement field \(Q^+(z) = \sum_i w_i k(z, z_i)\) emerges from particles and kernel
- Together, these enable continuous action spaces, smooth generalization, and uncertainty quantification
Next Steps¶
In Chapter 2: RKHS Foundations, we'll explore:
- What is a Reproducing Kernel Hilbert Space (RKHS)?
- Why the reinforcement field lives in an RKHS
- The mathematical properties that make this useful
- Connection to Gaussian Processes
Related: Chapter 0: Overview, Chapter 2: RKHS Foundations
Last Updated: January 11, 2026