Chapter 5: Particle Memory

Purpose: Understand how GRL represents and stores experience
Prerequisites: Chapter 4 (Reinforcement Field)
Key Concepts: Particles as functional basis, memory as belief state, kernel association, energy landscape


Introduction

In Chapters 1-4, we've built up the mathematical foundation:

  • Augmented space \((s, \theta)\)
  • RKHS as the function space
  • Energy/fitness landscapes
  • The reinforcement field

Now we address a fundamental question: How does GRL store and use experience?

Traditional RL uses replay buffers — collections of transitions \((s, a, r, s')\) that are sampled for training. GRL takes a radically different approach: particle memory.

This chapter explains:

  • What particles represent mathematically
  • How particles create the reinforcement field
  • Why this is more than a replay buffer
  • The deep connection to belief states

1. What is a Particle?

The Basic Definition

A particle is a weighted point in augmented state-action space:

\[ \omega = (z, w) = ((s, \theta), w) \]

where:

  • \(z = (s, \theta)\): Location in augmented space
  • \(w \in \mathbb{R}\): Weight (influence on the field)

The Collection

The agent maintains a particle memory:

\[ \Omega = \{(z_1, w_1), (z_2, w_2), \ldots, (z_N, w_N)\} \]

This ensemble of weighted particles is the agent's complete experience representation.
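To make this concrete, here is a minimal sketch of a particle and its ensemble in Python. The `Particle` class, the numpy representation, and the example values are illustrative assumptions, not a GRL reference implementation:

```python
# Sketch: a particle is a location in augmented space plus a signed weight.
from dataclasses import dataclass
import numpy as np

@dataclass
class Particle:
    z: np.ndarray  # location (s, theta) in augmented space
    w: float       # weight: signed influence on the field

# The memory Omega is simply the ensemble of weighted particles.
memory: list[Particle] = [
    Particle(z=np.array([0.0, 0.0, 0.5]), w=2.0),
    Particle(z=np.array([1.0, 1.0, -0.5]), w=-1.0),
]
```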


2. Particles vs. Replay Buffer

Surface Similarity

Both store information about past experiences. But the similarities end there.

Deep Differences

| Replay Buffer | Particle Memory |
| --- | --- |
| Stores transitions \((s, a, r, s')\) | Stores weighted points \((z, w)\) |
| Used for sampling mini-batches | Used for function representation |
| Purpose: re-experience past data | Purpose: construct energy landscape |
| Size: Fixed capacity, FIFO or priority | Size: Dynamic, with merging/pruning |
| Query: Random sample | Query: Kernel-weighted evaluation |
| Semantics: "What happened?" | Semantics: "What do I believe?" |

Key insight: A replay buffer is a database. Particle memory is a belief state.


3. Particles as Basis Functions

The Mathematical Role

Recall the reinforcement field from Chapter 4:

\[ Q^+(z) = \sum_{i=1}^N w_i \, k(z, z_i) \]

Each particle \((z_i, w_i)\) contributes a "bump" to this landscape:

  • Location \(z_i\): Where the bump is centered
  • Weight \(w_i\): Amplitude and sign of the bump
  • Kernel \(k(\cdot, z_i)\): Shape of the bump

Particles ARE the Function

The particles don't just influence the value function — they define it completely. The function exists only through the particles.

The particle memory is the value function in its nonparametric representation.

This is like how:

  • A polynomial is defined by its coefficients
  • A Fourier series is defined by its frequency components
  • A neural network is defined by its weights

Particles are the "parameters" of the reinforcement field — but they're not learned in the usual sense. They're accumulated from experience.


4. Memory as Belief State

Beyond Data Storage

Particle memory represents more than "what happened." It represents what the agent believes about:

  • Which configurations are valuable
  • How experience generalizes
  • What uncertainties exist

Three Types of Belief

1. Value Belief

"I believe configuration \(z\) has value proportional to \(Q^+(z)\)."

This belief emerges from the weighted sum of particle contributions.

2. Structural Belief

"I believe similar configurations have similar values."

This belief is encoded in the kernel function \(k(\cdot, \cdot)\).

3. Uncertainty Belief

"I am uncertain about regions far from any particle."

Where particles are sparse, the field is weak — signaling uncertainty.


5. How Particles Create the Landscape

Local Influence

Each particle creates a local "energy well" or "hill":

For positive weight \(w_i > 0\) (good experience):

  • In fitness view: Creates a peak (desirable region)
  • In energy view: Creates a valley (attractor)

For negative weight \(w_i < 0\) (bad experience):

  • In fitness view: Creates a valley (undesirable)
  • In energy view: Creates a hill (repeller)

Kernel Control

The kernel determines the range of influence:

\[ k(z, z_i) = \exp\left(-\frac{\|z - z_i\|^2}{2\ell^2}\right) \]

  • Small lengthscale \(\ell\): Narrow, localized influence
  • Large lengthscale \(\ell\): Broad, far-reaching influence
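A short sketch of this kernel in Python (the function name and default lengthscale are illustrative):

```python
import numpy as np

def rbf_kernel(z: np.ndarray, z_i: np.ndarray, lengthscale: float = 1.0) -> float:
    """k(z, z_i) = exp(-||z - z_i||^2 / (2 * lengthscale^2))."""
    sq_dist = float(np.sum((z - z_i) ** 2))
    return float(np.exp(-sq_dist / (2.0 * lengthscale ** 2)))

z, z_i = np.zeros(4), np.ones(4)            # squared distance 4
print(rbf_kernel(z, z_i, lengthscale=0.5))  # e^-8 ~ 0.0003: negligible reach
print(rbf_kernel(z, z_i, lengthscale=5.0))  # e^-0.08 ~ 0.92: broad reach
```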

Superposition

The full landscape is the superposition of all particle contributions:

\[ Q^+(z) = \underbrace{w_1\, k(z, z_1)}_{\text{particle } 1} + \underbrace{w_2\, k(z, z_2)}_{\text{particle } 2} + \cdots + \underbrace{w_N\, k(z, z_N)}_{\text{particle } N} \]

Multiple particles can reinforce each other (constructive interference) or cancel (destructive interference).
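The cancellation is easy to see numerically. A sketch reusing `rbf_kernel` from above (the locations and weights are arbitrary):

```python
import numpy as np

# Two overlapping particles with opposite weights: at the midpoint their
# kernel values are equal, so the contributions cancel exactly.
z_pos, w_pos = np.array([0.0]), +1.0
z_neg, w_neg = np.array([1.0]), -1.0

z_mid = np.array([0.5])
field = w_pos * rbf_kernel(z_mid, z_pos) + w_neg * rbf_kernel(z_mid, z_neg)
print(field)  # 0.0: destructive interference at the midpoint
```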


6. Particle Operations

Particle memory is not static. It supports dynamic operations:

6.1 Add

Insert a new particle \((z_{\text{new}}, w_{\text{new}})\) based on recent experience.

When: After each interaction with the environment

Effect: Immediately reshapes the reinforcement field
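In code, adding is just an append to the ensemble (a sketch reusing the `Particle` class from Section 1; how `w_new` is chosen is the subject of Chapter 6):

```python
import numpy as np

def add_particle(particles: list, z_new, w_new: float) -> None:
    """Insert a new weighted particle; the field changes immediately,
    because the field *is* the particle set."""
    particles.append(Particle(z=np.asarray(z_new, dtype=float), w=float(w_new)))
```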

6.2 Query

Evaluate the reinforcement field at a point \(z\):

\[ Q^+(z) = \sum_i w_i \, k(z, z_i) \]

When: During action selection or policy evaluation

Complexity: \(O(N)\) — must sum over all particles
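A direct sketch of the query (self-contained apart from particles carrying `.z` and `.w` attributes, as in the Section 1 sketch):

```python
import numpy as np

def q_plus(z: np.ndarray, particles: list, lengthscale: float = 1.0) -> float:
    """Q^+(z) = sum_i w_i * k(z, z_i): one kernel evaluation per particle,
    hence O(N) per query."""
    total = 0.0
    for p in particles:
        sq_dist = float(np.sum((z - p.z) ** 2))
        total += p.w * np.exp(-sq_dist / (2.0 * lengthscale ** 2))
    return total
```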

6.3 Merge

Combine similar particles to prevent unbounded growth.

When: Periodically or when memory exceeds capacity

Criterion: If \(k(z_i, z_j) > \tau\), merge \(z_i\) and \(z_j\) into a single particle

Result: Reduces \(N\) while preserving approximate field shape
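One possible merge pass (a sketch reusing `rbf_kernel` and `Particle` from earlier; the weight-weighted mean location and summed weight form one reasonable rule, not the only one):

```python
def merge_similar(particles: list, tau: float = 0.95,
                  lengthscale: float = 1.0) -> list:
    """Greedy single pass: fold each particle into the first survivor
    whose kernel similarity exceeds tau; otherwise keep it as-is."""
    survivors = []
    for p in particles:
        for s in survivors:
            if rbf_kernel(p.z, s.z, lengthscale) > tau:
                denom = abs(s.w) + abs(p.w)
                if denom > 0:  # weight-weighted mean keeps the field shape
                    s.z = (abs(s.w) * s.z + abs(p.w) * p.z) / denom
                s.w += p.w
                break
        else:
            survivors.append(p)
    return survivors
```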

6.4 Prune

Remove particles with low influence.

When: During memory management

Criterion: If \(|w_i|\) is very small or particle is isolated

Result: Improves computational efficiency
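Pruning is a simple filter (a sketch; the threshold `eps` is a tuning choice, not a value from the GRL literature):

```python
def prune(particles: list, eps: float = 1e-3) -> list:
    """Drop particles whose weight is too small to shape the field."""
    return [p for p in particles if abs(p.w) >= eps]
```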

6.5 Update

Modify particle weights based on new reinforcement signals.

When: During TD-style updates (RF-SARSA)

Effect: Reshapes the energy landscape to reflect new evidence
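As a hedged sketch only (the actual RF-SARSA rule is Chapter 6 material), a kernel-weighted TD-style update might look like the following, with credit assigned to each particle in proportion to its similarity to the visited point:

```python
import numpy as np

def td_style_update(particles: list, z_t: np.ndarray, delta: float,
                    alpha: float = 0.1, lengthscale: float = 1.0) -> None:
    """Illustrative only: nudge nearby particles' weights toward the
    TD error delta, scaled by kernel similarity to z_t."""
    for p in particles:
        p.w += alpha * delta * rbf_kernel(z_t, p.z, lengthscale)
```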


7. Memory as a Functional Ensemble

Not Points in Space

Particles are not just "samples from a distribution." They are:

Basis elements of a function-space representation

The Ensemble View

The particle ensemble collectively defines:

  • A value function
  • A belief state
  • An energy landscape
  • An implicit policy

Particle Interactions

Particles don't exist in isolation. They interact through:

Kernel overlap: Nearby particles reinforce or interfere

Collective influence: Far-away particles still contribute (though weakly)

Emergent structure: Clusters of particles create "basins" in the landscape


8. Connection to POMDP Belief States

The POMDP Perspective

In a Partially Observable Markov Decision Process (POMDP), the agent maintains a belief state \(b(s)\) — a probability distribution over possible states.

GRL's particle memory plays an analogous role, but in function space rather than probability space.

The Mapping

| POMDP Concept | GRL Analog |
| --- | --- |
| Belief state \(b(s)\) | Particle ensemble \(\Omega\) |
| Probability distribution | Energy functional \(Q^+ \in \mathcal{H}_k\) |
| Support of belief | Regions with particles |
| Belief update | MemoryUpdate algorithm |
| Action from belief | Query field, navigate landscape |

Why This Matters

Viewing particle memory as a belief state explains why:

  • GRL naturally handles uncertainty
  • Exploration emerges from sparse particles
  • No explicit observation model is needed
  • The agent can reason under ambiguity

9. Memory Management

The Growth Problem

Without management, memory grows indefinitely:

  • New particles added at each step
  • \(N\) increases linearly with experience
  • Query cost \(O(N)\) becomes prohibitive

Management Strategies

1. Fixed Capacity

Maintain at most \(N_{\max}\) particles. When full:

  • Merge similar particles
  • Prune low-weight particles
  • Replace least-influential particles

2. Adaptive Precision

Keep high resolution (many particles) in important regions, low resolution elsewhere.

3. Hierarchical Memory

Organize particles into clusters or tree structures for efficient queries.

4. Inducing Points

Select a subset of "representative" particles and approximate the rest.
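A sketch of strategy 1, composing the merge and prune sketches from Section 6 (the ordering and the fallback rule are design choices, not prescribed by GRL):

```python
def enforce_capacity(particles: list, n_max: int, tau: float = 0.9,
                     eps: float = 1e-3, lengthscale: float = 1.0) -> list:
    """Keep the memory at or below n_max particles."""
    if len(particles) <= n_max:
        return particles
    particles = prune(merge_similar(particles, tau, lengthscale), eps)
    if len(particles) > n_max:  # fallback: keep the most influential
        particles = sorted(particles, key=lambda p: abs(p.w),
                           reverse=True)[:n_max]
    return particles
```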


10. Particle Semantics

What Does a Particle Mean?

A particle \((z_i, w_i)\) with:

  • \(z_i = (s_i, \theta_i)\): "In state \(s_i\), action \(\theta_i\) was relevant"
  • \(w_i > 0\): "This configuration led to positive reinforcement"
  • \(w_i < 0\): "This configuration led to negative reinforcement"

What Does Kernel Similarity Mean?

\(k(z, z_i) > 0.5\) means:

  • "Configuration \(z\) is similar enough to \(z_i\) that evidence from \(z_i\) applies to \(z\)"
  • "The experience at \(z_i\) generalizes to \(z\)"

What Does the Field Value Mean?

\(Q^+(z)\) is:

  • The weighted evidence from all particles about configuration \(z\)
  • The agent's best guess of value at \(z\)
  • The fitness/energy level at that point in augmented space

11. Example: 2D Navigation

Setup

  • State: \((x, y)\) position
  • Action: \((F_x, F_y)\) force
  • Augmented: \(z = (x, y, F_x, F_y) \in \mathbb{R}^4\)

Initial Particles

| Particle | Location | Weight | Meaning |
| --- | --- | --- | --- |
| \(\omega_1\) | \((5, 5, 1, 0)\) | \(+10\) | Good: move right near goal |
| \(\omega_2\) | \((2, 3, -1, 0)\) | \(-5\) | Bad: moving left hit obstacle |
| \(\omega_3\) | \((0, 0, 0, 0.5)\) | \(+2\) | OK: small upward force at start |

The Field

At a new configuration \(z = (4, 4, 0.8, 0.1)\):

\[ Q^+(z) = 10 \cdot k(z, z_1) - 5 \cdot k(z, z_2) + 2 \cdot k(z, z_3) \]

If \(z\) is close to \(z_1\) (near goal, rightward force), \(k(z, z_1)\) is high, so \(Q^+(z)\) will be positive (good).

If \(z\) is far from all particles, all kernel values are small, so \(Q^+(z) \approx 0\) (uncertain).

Policy Inference

To decide action in state \((4, 4)\):

  • Query field for various \(\theta\): \(Q^+((4, 4), \theta)\)
  • Find \(\theta\) that maximizes \(Q^+\)
  • Result: Choose action similar to \(\theta_1 = (1, 0)\) (move right)
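Putting the example together (a sketch reusing `Particle` and `q_plus` from the earlier sketches; the candidate grid and the lengthscale of 2.0 are illustrative choices):

```python
import numpy as np

# The three particles from the table above.
particles = [
    Particle(z=np.array([5.0, 5.0, 1.0, 0.0]), w=10.0),
    Particle(z=np.array([2.0, 3.0, -1.0, 0.0]), w=-5.0),
    Particle(z=np.array([0.0, 0.0, 0.0, 0.5]), w=2.0),
]

state = np.array([4.0, 4.0])
# Candidate force vectors theta on a coarse grid.
candidates = [np.array([fx, fy])
              for fx in (-1.0, 0.0, 1.0) for fy in (-1.0, 0.0, 1.0)]
best = max(candidates,
           key=lambda th: q_plus(np.concatenate([state, th]),
                                 particles, lengthscale=2.0))
print(best)  # ~ (1, 0): dominated by the positive particle near the goal
```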

12. Summary

Core Concepts

| Concept | Meaning |
| --- | --- |
| Particle \((z, w)\) | Weighted point in augmented space |
| Particle memory \(\Omega\) | Ensemble representing belief state |
| Basis function role | Each particle is a kernel-centered function |
| Reinforcement field | Superposition of particle contributions |
| Memory operations | Add, query, merge, prune, update |

Key Insights

  1. Particles ARE the value function — they define \(Q^+\) completely
  2. Memory IS a belief state — not just data storage
  3. Kernel similarity defines generalization — how experience spreads
  4. Particles interact — creating emergent landscape structure
  5. Dynamic memory — grows, shrinks, and reshapes over time

Why This Matters

  • Nonparametric: No fixed architecture, adapts to experience
  • Interpretable: Can visualize and understand particle contributions
  • Uncertainty-aware: Sparse particles = high uncertainty
  • Compositional: Particles can be organized hierarchically

Key Takeaways

  1. Particle \((z, w)\): Weighted point in augmented space, a basis function
  2. Memory \(\Omega\): Ensemble of particles, defines the reinforcement field
  3. Not a replay buffer: Memory is a functional representation, not data
  4. Belief state: Particles encode what the agent believes about value
  5. Dynamic operations: Add, query, merge, prune, update particles
  6. Landscape creation: Particles collectively shape the energy landscape
  7. POMDP connection: Memory plays the role of belief in function space

Next Steps

In Chapter 6: MemoryUpdate Algorithm, we'll explore:

  • How particles are added from new experiences
  • The four conceptual operations of MemoryUpdate
  • Why MemoryUpdate is a belief-state transition operator
  • Connection to Bayesian updating in RKHS

Related: Chapter 4: Reinforcement Field, Chapter 6: MemoryUpdate


Last Updated: January 11, 2026