Chapter 5: Particle Memory

Purpose: Understand how GRL represents and stores experience
Prerequisites: Chapter 4 (Reinforcement Field)
Key Concepts: Particles as functional basis, memory as belief state, kernel association, energy landscape


Introduction

In Chapters 1-4, we've built up the mathematical foundation:

  • Augmented space \((s, \theta)\)
  • RKHS as the function space
  • Energy/fitness landscapes
  • The reinforcement field

Now we address a fundamental question: How does GRL store and use experience?

Traditional RL uses replay buffers — collections of transitions \((s, a, r, s')\) that are sampled for training. GRL takes a radically different approach: particle memory.

This chapter explains:

  • What particles represent mathematically
  • How particles create the reinforcement field
  • Why this is more than a replay buffer
  • The deep connection to belief states

1. What is a Particle?

The Basic Definition

A particle is a weighted point in augmented state-action space:

\[ \omega = (z, w) = ((s, \theta), w) \]

where:

  • \(z = (s, \theta)\): Location in augmented space
  • \(w \in \mathbb{R}\): Weight (influence on the field)

The Collection

The agent maintains a particle memory:

\[ \Omega = \{(z_1, w_1), (z_2, w_2), \ldots, (z_N, w_N)\} \]

This ensemble of weighted particles is the agent's complete experience representation.
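To make this concrete, here is a minimal sketch of a particle and its ensemble in Python. The `Particle` class, the numpy representation, and the example values are illustrative assumptions, not a GRL reference implementation:

```python
# Sketch: a particle is a location in augmented space plus a signed weight.
from dataclasses import dataclass
import numpy as np

@dataclass
class Particle:
    z: np.ndarray  # location (s, theta) in augmented space
    w: float       # weight: signed influence on the field

# The memory Omega is simply the ensemble of weighted particles.
memory: list[Particle] = [
    Particle(z=np.array([0.0, 0.0, 0.5]), w=2.0),
    Particle(z=np.array([1.0, 1.0, -0.5]), w=-1.0),
]
```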


2. Particles vs. Replay Buffer

Surface Similarity

Both store information about past experiences. But the similarities end there.

Deep Differences

| Replay Buffer | Particle Memory |
| --- | --- |
| Stores transitions \((s, a, r, s')\) | Stores weighted points \((z, w)\) |
| Used for sampling mini-batches | Used for function representation |
| Purpose: re-experience past data | Purpose: construct energy landscape |
| Size: Fixed capacity, FIFO or priority | Size: Dynamic, with merging/pruning |
| Query: Random sample | Query: Kernel-weighted evaluation |
| Semantics: "What happened?" | Semantics: "What do I believe?" |

Key insight: A replay buffer is a database. Particle memory is a belief state.


3. Particles as Basis Functions

The Mathematical Role

Recall the reinforcement field from Chapter 4:

\[ Q^+(z) = \sum_{i=1}^N w_i \, k(z, z_i) \]

Each particle \((z_i, w_i)\) contributes a "bump" to this landscape:

  • Location \(z_i\): Where the bump is centered
  • Weight \(w_i\): Amplitude and sign of the bump
  • Kernel \(k(\cdot, z_i)\): Shape of the bump

Particles ARE the Function

The particles don't just influence the value function — they define it completely. The function exists only through the particles.

The particle memory is the value function in its nonparametric representation.

This is like how:

  • A polynomial is defined by its coefficients
  • A Fourier series is defined by its frequency components
  • A neural network is defined by its weights

Particles are the "parameters" of the reinforcement field — but they're not learned in the usual sense. They're accumulated from experience.


4. Memory as Belief State

Beyond Data Storage

Particle memory represents more than "what happened." It represents what the agent believes about:

  • Which configurations are valuable
  • How experience generalizes
  • What uncertainties exist

Three Types of Belief

1. Value Belief

"I believe configuration \(z\) has value proportional to \(Q^+(z)\)."

This belief emerges from the weighted sum of particle contributions.

2. Structural Belief

"I believe similar configurations have similar values."

This belief is encoded in the kernel function \(k(\cdot, \cdot)\).

3. Uncertainty Belief

"I am uncertain about regions far from any particle."

Where particles are sparse, the field is weak — signaling uncertainty.


5. How Particles Create the Landscape

Local Influence

Each particle creates a local "energy well" or "hill":

For positive weight \(w_i > 0\) (good experience):

  • In fitness view: Creates a peak (desirable region)
  • In energy view: Creates a valley (attractor)

For negative weight \(w_i < 0\) (bad experience):

  • In fitness view: Creates a valley (undesirable)
  • In energy view: Creates a hill (repeller)

Kernel Control

The kernel determines the range of influence:

\[ k(z, z_i) = \exp\left(-\frac{\|z - z_i\|^2}{2\ell^2}\right) \]

  • Small lengthscale \(\ell\): Narrow, localized influence
  • Large lengthscale \(\ell\): Broad, far-reaching influence
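A short sketch of this kernel in Python (the function name and default lengthscale are illustrative):

```python
import numpy as np

def rbf_kernel(z: np.ndarray, z_i: np.ndarray, lengthscale: float = 1.0) -> float:
    """k(z, z_i) = exp(-||z - z_i||^2 / (2 * lengthscale^2))."""
    sq_dist = float(np.sum((z - z_i) ** 2))
    return float(np.exp(-sq_dist / (2.0 * lengthscale ** 2)))

z, z_i = np.zeros(4), np.ones(4)            # squared distance 4
print(rbf_kernel(z, z_i, lengthscale=0.5))  # e^-8 ~ 0.0003: negligible reach
print(rbf_kernel(z, z_i, lengthscale=5.0))  # e^-0.08 ~ 0.92: broad reach
```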

Superposition

The full landscape is the superposition of all particle contributions:

\[ Q^+(z) = \underbrace{w_1\, k(z, z_1)}_{\text{particle } 1} + \underbrace{w_2\, k(z, z_2)}_{\text{particle } 2} + \cdots + \underbrace{w_N\, k(z, z_N)}_{\text{particle } N} \]

Multiple particles can reinforce each other (constructive interference) or cancel (destructive interference).
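The cancellation is easy to see numerically. A sketch reusing `rbf_kernel` from above (the locations and weights are arbitrary):

```python
import numpy as np

# Two overlapping particles with opposite weights: at the midpoint their
# kernel values are equal, so the contributions cancel exactly.
z_pos, w_pos = np.array([0.0]), +1.0
z_neg, w_neg = np.array([1.0]), -1.0

z_mid = np.array([0.5])
field = w_pos * rbf_kernel(z_mid, z_pos) + w_neg * rbf_kernel(z_mid, z_neg)
print(field)  # 0.0: destructive interference at the midpoint
```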


6. Particle Operations

Particle memory is not static. It supports dynamic operations:

6.1 Add

Insert a new particle \((z_{\text{new}}, w_{\text{new}})\) based on recent experience.

When: After each interaction with the environment

Effect: Immediately reshapes the reinforcement field
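In code, adding is just an append to the ensemble (a sketch reusing the `Particle` class from Section 1; how `w_new` is chosen is the subject of Chapter 6):

```python
import numpy as np

def add_particle(particles: list, z_new, w_new: float) -> None:
    """Insert a new weighted particle; the field changes immediately,
    because the field *is* the particle set."""
    particles.append(Particle(z=np.asarray(z_new, dtype=float), w=float(w_new)))
```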

6.2 Query

Evaluate the reinforcement field at a point \(z\):

\[ Q^+(z) = \sum_i w_i \, k(z, z_i) \]

When: During action selection or policy evaluation

Complexity: \(O(N)\) — must sum over all particles
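A direct sketch of the query (self-contained apart from particles carrying `.z` and `.w` attributes, as in the Section 1 sketch):

```python
import numpy as np

def q_plus(z: np.ndarray, particles: list, lengthscale: float = 1.0) -> float:
    """Q^+(z) = sum_i w_i * k(z, z_i): one kernel evaluation per particle,
    hence O(N) per query."""
    total = 0.0
    for p in particles:
        sq_dist = float(np.sum((z - p.z) ** 2))
        total += p.w * np.exp(-sq_dist / (2.0 * lengthscale ** 2))
    return total
```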

6.3 Merge

Combine similar particles to prevent unbounded growth.

When: Periodically or when memory exceeds capacity

Criterion: If \(k(z_i, z_j) > \tau\), merge \(z_i\) and \(z_j\) into a single particle

Result: Reduces \(N\) while preserving approximate field shape
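One possible merge pass (a sketch reusing `rbf_kernel` and `Particle` from earlier; the weight-weighted mean location and summed weight form one reasonable rule, not the only one):

```python
def merge_similar(particles: list, tau: float = 0.95,
                  lengthscale: float = 1.0) -> list:
    """Greedy single pass: fold each particle into the first survivor
    whose kernel similarity exceeds tau; otherwise keep it as-is."""
    survivors = []
    for p in particles:
        for s in survivors:
            if rbf_kernel(p.z, s.z, lengthscale) > tau:
                denom = abs(s.w) + abs(p.w)
                if denom > 0:  # weight-weighted mean keeps the field shape
                    s.z = (abs(s.w) * s.z + abs(p.w) * p.z) / denom
                s.w += p.w
                break
        else:
            survivors.append(p)
    return survivors
```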

6.4 Prune

Remove particles with low influence.

When: During memory management

Criterion: If \(|w_i|\) is very small or particle is isolated

Result: Improves computational efficiency
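Pruning is a simple filter (a sketch; the threshold `eps` is a tuning choice, not a value from the GRL literature):

```python
def prune(particles: list, eps: float = 1e-3) -> list:
    """Drop particles whose weight is too small to shape the field."""
    return [p for p in particles if abs(p.w) >= eps]
```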

6.5 Update

Modify particle weights based on new reinforcement signals.

When: During TD-style updates (RF-SARSA)

Effect: Reshapes the energy landscape to reflect new evidence
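As a hedged sketch only (the actual RF-SARSA rule is Chapter 6 material), a kernel-weighted TD-style update might look like the following, with credit assigned to each particle in proportion to its similarity to the visited point:

```python
import numpy as np

def td_style_update(particles: list, z_t: np.ndarray, delta: float,
                    alpha: float = 0.1, lengthscale: float = 1.0) -> None:
    """Illustrative only: nudge nearby particles' weights toward the
    TD error delta, scaled by kernel similarity to z_t."""
    for p in particles:
        p.w += alpha * delta * rbf_kernel(z_t, p.z, lengthscale)
```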


7. Memory as a Functional Ensemble

Not Points in Space

Particles are not just "samples from a distribution." They are:

Basis elements of a function-space representation

The Ensemble View

The particle ensemble collectively defines:

  • A value function
  • A belief state
  • An energy landscape
  • An implicit policy

Particle Interactions

Particles don't exist in isolation. They interact through:

Kernel overlap: Nearby particles reinforce or interfere

Collective influence: Far-away particles still contribute (though weakly)

Emergent structure: Clusters of particles create "basins" in the landscape


8. Connection to POMDP Belief States

The POMDP Perspective

In a Partially Observable Markov Decision Process (POMDP), the agent maintains a belief state \(b(s)\) — a probability distribution over possible states.

GRL's particle memory plays an analogous role, but in function space rather than probability space.

The Mapping

| POMDP Concept | GRL Analog |
| --- | --- |
| Belief state \(b(s)\) | Particle ensemble \(\Omega\) |
| Probability distribution | Energy functional \(Q^+ \in \mathcal{H}_k\) |
| Support of belief | Regions with particles |
| Belief update | MemoryUpdate algorithm |
| Action from belief | Query field, navigate landscape |

Why This Matters

Viewing particle memory as a belief state explains why:

  • GRL naturally handles uncertainty
  • Exploration emerges from sparse particles
  • No explicit observation model is needed
  • The agent can reason under ambiguity

9. Memory Management

The Growth Problem

Without management, memory grows indefinitely:

  • New particles added at each step
  • \(N\) increases linearly with experience
  • Query cost \(O(N)\) becomes prohibitive

Management Strategies

1. Fixed Capacity

Maintain at most \(N_{\max}\) particles. When full:

  • Merge similar particles
  • Prune low-weight particles
  • Replace least-influential particles

2. Adaptive Precision

Keep high resolution (many particles) in important regions, low resolution elsewhere.

3. Hierarchical Memory

Organize particles into clusters or tree structures for efficient queries.

4. Inducing Points

Select a subset of "representative" particles and approximate the rest.
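A sketch of strategy 1, composing the merge and prune sketches from Section 6 (the ordering and the fallback rule are design choices, not prescribed by GRL):

```python
def enforce_capacity(particles: list, n_max: int, tau: float = 0.9,
                     eps: float = 1e-3, lengthscale: float = 1.0) -> list:
    """Keep the memory at or below n_max particles."""
    if len(particles) <= n_max:
        return particles
    particles = prune(merge_similar(particles, tau, lengthscale), eps)
    if len(particles) > n_max:  # fallback: keep the most influential
        particles = sorted(particles, key=lambda p: abs(p.w),
                           reverse=True)[:n_max]
    return particles
```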


10. Particle Semantics

What Does a Particle Mean?

A particle \((z_i, w_i)\) with:

  • \(z_i = (s_i, \theta_i)\): "In state \(s_i\), action \(\theta_i\) was relevant"
  • \(w_i > 0\): "This configuration led to positive reinforcement"
  • \(w_i < 0\): "This configuration led to negative reinforcement"

What Does Kernel Similarity Mean?

\(k(z, z_i) > 0.5\) means:

  • "Configuration \(z\) is similar enough to \(z_i\) that evidence from \(z_i\) applies to \(z\)"
  • "The experience at \(z_i\) generalizes to \(z\)"

What Does the Field Value Mean?

\(Q^+(z)\) is:

  • The weighted evidence from all particles about configuration \(z\)
  • The agent's best guess of value at \(z\)
  • The fitness/energy level at that point in augmented space

11. Example: 2D Navigation

Setup

  • State: \((x, y)\) position
  • Action: \((F_x, F_y)\) force
  • Augmented: \(z = (x, y, F_x, F_y) \in \mathbb{R}^4\)

Initial Particles

| Particle | Location | Weight | Meaning |
| --- | --- | --- | --- |
| \(\omega_1\) | \((5, 5, 1, 0)\) | \(+10\) | Good: move right near goal |
| \(\omega_2\) | \((2, 3, -1, 0)\) | \(-5\) | Bad: moving left hit obstacle |
| \(\omega_3\) | \((0, 0, 0, 0.5)\) | \(+2\) | OK: small upward force at start |

The Field

At a new configuration \(z = (4, 4, 0.8, 0.1)\):

\[ Q^+(z) = 10 \cdot k(z, z_1) - 5 \cdot k(z, z_2) + 2 \cdot k(z, z_3) \]

If \(z\) is close to \(z_1\) (near goal, rightward force), \(k(z, z_1)\) is high, so \(Q^+(z)\) will be positive (good).

If \(z\) is far from all particles, all kernel values are small, so \(Q^+(z) \approx 0\) (uncertain).

Policy Inference

To decide action in state \((4, 4)\):

  • Query field for various \(\theta\): \(Q^+((4, 4), \theta)\)
  • Find \(\theta\) that maximizes \(Q^+\)
  • Result: Choose action similar to \(\theta_1 = (1, 0)\) (move right)
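Putting the example together (a sketch reusing `Particle` and `q_plus` from the earlier sketches; the candidate grid and the lengthscale of 2.0 are illustrative choices):

```python
import numpy as np

# The three particles from the table above.
particles = [
    Particle(z=np.array([5.0, 5.0, 1.0, 0.0]), w=10.0),
    Particle(z=np.array([2.0, 3.0, -1.0, 0.0]), w=-5.0),
    Particle(z=np.array([0.0, 0.0, 0.0, 0.5]), w=2.0),
]

state = np.array([4.0, 4.0])
# Candidate force vectors theta on a coarse grid.
candidates = [np.array([fx, fy])
              for fx in (-1.0, 0.0, 1.0) for fy in (-1.0, 0.0, 1.0)]
best = max(candidates,
           key=lambda th: q_plus(np.concatenate([state, th]),
                                 particles, lengthscale=2.0))
print(best)  # ~ (1, 0): dominated by the positive particle near the goal
```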

12. Summary

Core Concepts

| Concept | Meaning |
| --- | --- |
| Particle \((z, w)\) | Weighted point in augmented space |
| Particle memory \(\Omega\) | Ensemble representing belief state |
| Basis function role | Each particle is a kernel-centered function |
| Reinforcement field | Superposition of particle contributions |
| Memory operations | Add, query, merge, prune, update |

Key Insights

  1. Particles ARE the value function — they define \(Q^+\) completely
  2. Memory IS a belief state — not just data storage
  3. Kernel similarity defines generalization — how experience spreads
  4. Particles interact — creating emergent landscape structure
  5. Dynamic memory — grows, shrinks, and reshapes over time

Why This Matters

  • Nonparametric: No fixed architecture, adapts to experience
  • Interpretable: Can visualize and understand particle contributions
  • Uncertainty-aware: Sparse particles = high uncertainty
  • Compositional: Particles can be organized hierarchically

Key Takeaways

  1. Particle \((z, w)\): Weighted point in augmented space, a basis function
  2. Memory \(\Omega\): Ensemble of particles, defines the reinforcement field
  3. Not a replay buffer: Memory is a functional representation, not data
  4. Belief state: Particles encode what the agent believes about value
  5. Dynamic operations: Add, query, merge, prune, update particles
  6. Landscape creation: Particles collectively shape the energy landscape
  7. POMDP connection: Memory plays the role of belief in function space

Next Steps

In Chapter 6: MemoryUpdate Algorithm, we'll explore:

  • How particles are added from new experiences
  • The four conceptual operations of MemoryUpdate
  • Why MemoryUpdate is a belief-state transition operator
  • Connection to Bayesian updating in RKHS

Related: Chapter 4: Reinforcement Field, Chapter 6: MemoryUpdate


Last Updated: January 11, 2026