Chapter 5: Particle Memory¶
Purpose: Understand how GRL represents and stores experience
Prerequisites: Chapter 4 (Reinforcement Field)
Key Concepts: Particles as functional basis, memory as belief state, kernel association, energy landscape
Introduction¶
In Chapters 1-4, we've built up the mathematical foundation:
- Augmented space \((s, \theta)\)
- RKHS as the function space
- Energy/fitness landscapes
- The reinforcement field
Now we address a fundamental question: How does GRL store and use experience?
Traditional RL uses replay buffers — collections of transitions \((s, a, r, s')\) that are sampled for training. GRL takes a radically different approach: particle memory.
This chapter explains:
- What particles represent mathematically
- How particles create the reinforcement field
- Why this is more than a replay buffer
- The deep connection to belief states
1. What is a Particle?¶
The Basic Definition¶
A particle is a weighted point in augmented state-action space:

$$
\omega = (z, w)
$$

where:
- \(z = (s, \theta)\): Location in augmented space
- \(w \in \mathbb{R}\): Weight (influence on the field)
The Collection¶
The agent maintains a particle memory:
$$
\Omega = \{(z_1, w_1), (z_2, w_2), \ldots, (z_N, w_N)\}
$$
This ensemble of weighted particles is the agent's complete experience representation.
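To make the data structure concrete, here is a minimal Python sketch of a particle memory. The names `Particle` and `ParticleMemory` are illustrative choices, not taken from a reference implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Particle:
    z: np.ndarray  # location in augmented space, z = (s, theta)
    w: float       # weight: amplitude and sign of influence on the field


class ParticleMemory:
    """Ensemble Omega = {(z_1, w_1), ..., (z_N, w_N)} of weighted particles."""

    def __init__(self) -> None:
        self.particles: list[Particle] = []

    def add(self, z, w: float) -> None:
        """Insert a new particle derived from recent experience."""
        self.particles.append(Particle(np.asarray(z, dtype=float), float(w)))
```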
2. Particles vs. Replay Buffer¶
Surface Similarity¶
Both store information about past experiences. But the similarities end there.
Deep Differences¶
| Replay Buffer | Particle Memory |
|---|---|
| Stores transitions \((s, a, r, s')\) | Stores weighted points \((z, w)\) |
| Used for sampling mini-batches | Used for function representation |
| Purpose: re-experience past data | Purpose: construct energy landscape |
| Size: Fixed capacity, FIFO or priority | Size: Dynamic, with merging/pruning |
| Query: Random sample | Query: Kernel-weighted evaluation |
| Semantics: "What happened?" | Semantics: "What do I believe?" |
Key insight: A replay buffer is a database. Particle memory is a belief state.
3. Particles as Basis Functions¶
The Mathematical Role¶
Recall the reinforcement field from Chapter 4:

$$
Q^+(z) = \sum_{i=1}^{N} w_i \, k(z, z_i)
$$
Each particle \((z_i, w_i)\) contributes a "bump" to this landscape:
- Location \(z_i\): Where the bump is centered
- Weight \(w_i\): Amplitude and sign of the bump
- Kernel \(k(\cdot, z_i)\): Shape of the bump
Particles ARE the Function¶
The particles don't just influence the value function — they define it completely. The function exists only through the particles.
The particle memory is the value function in its nonparametric representation.
This is like how:
- A polynomial is defined by its coefficients
- A Fourier series is defined by its frequency components
- A neural network is defined by its weights
Particles are the "parameters" of the reinforcement field — but they're not learned in the usual sense. They're accumulated from experience.
4. Memory as Belief State¶
Beyond Data Storage¶
Particle memory represents more than "what happened." It represents what the agent believes about:
- Which configurations are valuable
- How experience generalizes
- What uncertainties exist
Three Types of Belief¶
1. Value Belief
"I believe configuration \(z\) has value proportional to \(Q^+(z)\)."
This belief emerges from the weighted sum of particle contributions.
2. Structural Belief
"I believe similar configurations have similar values."
This belief is encoded in the kernel function \(k(\cdot, \cdot)\).
3. Uncertainty Belief
"I am uncertain about regions far from any particle."
Where particles are sparse, the field is weak — signaling uncertainty.
5. How Particles Create the Landscape¶
Local Influence¶
Each particle creates a local "energy well" or "hill":
For positive weight \(w_i > 0\) (good experience):
- In fitness view: Creates a peak (desirable region)
- In energy view: Creates a valley (attractor)
For negative weight \(w_i < 0\) (bad experience):
- In fitness view: Creates a valley (undesirable)
- In energy view: Creates a hill (repeller)
Kernel Control¶
The kernel determines the range of influence:
- Small lengthscale \(\ell\): Narrow, localized influence
- Large lengthscale \(\ell\): Broad, far-reaching influence
Superposition¶
The full landscape is the superposition of all particle contributions:
$$
Q^+(z) = \underbrace{w_1 k(z, z_1)}_{\text{particle 1}} + \underbrace{w_2 k(z, z_2)}_{\text{particle 2}} + \cdots + \underbrace{w_N k(z, z_N)}_{\text{particle N}}
$$
Multiple particles can reinforce each other (constructive interference) or cancel (destructive interference).
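A minimal sketch of this superposition, assuming a Gaussian (RBF) kernel; the lengthscale value and helper names are illustrative.

```python
import numpy as np


def gaussian_kernel(z, zi, lengthscale=1.0):
    """k(z, z_i) = exp(-||z - z_i||^2 / (2 l^2)): the shape of one bump."""
    return np.exp(-np.sum((z - zi) ** 2) / (2.0 * lengthscale ** 2))


def field_value(z, locations, weights, lengthscale=1.0):
    """Q+(z) = sum_i w_i k(z, z_i): superposition of all particle bumps."""
    return sum(w * gaussian_kernel(z, zi, lengthscale)
               for zi, w in zip(locations, weights))


# Two positive particles reinforce each other near z = (0.25, 0);
# the negative particle partially cancels them (destructive interference).
locs = [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([0.25, 0.3])]
ws = [1.0, 1.0, -0.5]
print(field_value(np.array([0.25, 0.0]), locs, ws))
```

Shrinking `lengthscale` narrows each bump, which localizes influence exactly as the kernel-control discussion above describes.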
6. Particle Operations¶
Particle memory is not static. It supports dynamic operations:
6.1 Add¶
Insert a new particle \((z_{\text{new}}, w_{\text{new}})\) based on recent experience.
When: After each interaction with the environment
Effect: Immediately reshapes the reinforcement field
6.2 Query¶
Evaluate the reinforcement field at a point \(z\):
$$
Q^+(z) = \sum_{i=1}^{N} w_i \, k(z, z_i)
$$
When: During action selection or policy evaluation
Complexity: \(O(N)\) — must sum over all particles
6.3 Merge¶
Combine similar particles to prevent unbounded growth.
When: Periodically or when memory exceeds capacity
Criterion: If \(k(z_i, z_j) > \tau\), merge the pair into a single particle
Result: Reduces \(N\) while preserving approximate field shape
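One possible merge rule, as a sketch: sum the weights and take a magnitude-weighted average of the locations. The actual merge criterion may differ; the threshold and the centroid rule here are assumptions.

```python
import numpy as np


def gaussian_kernel(zi, zj, lengthscale=1.0):
    return np.exp(-np.sum((zi - zj) ** 2) / (2.0 * lengthscale ** 2))


def merge_pass(locations, weights, tau=0.95, lengthscale=1.0):
    """Greedily merge any pair with k(z_i, z_j) > tau into one particle."""
    merged_locs, merged_ws = [], []
    for zi, wi in zip(locations, weights):
        for idx, zj in enumerate(merged_locs):
            if gaussian_kernel(zi, zj, lengthscale) > tau:
                wj = merged_ws[idx]
                denom = abs(wi) + abs(wj) or 1.0
                # Illustrative choice: |w|-weighted centroid, summed weight.
                merged_locs[idx] = (abs(wi) * zi + abs(wj) * zj) / denom
                merged_ws[idx] = wi + wj
                break
        else:
            merged_locs.append(zi)
            merged_ws.append(wi)
    return merged_locs, merged_ws
```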
6.4 Prune¶
Remove particles with low influence.
When: During memory management
Criterion: If \(|w_i|\) is very small or the particle is isolated
Result: Improves computational efficiency
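A corresponding prune sketch; the cutoff value is an illustrative assumption.

```python
def prune(locations, weights, eps=1e-3):
    """Drop particles whose weight magnitude is too small to matter."""
    kept = [(z, w) for z, w in zip(locations, weights) if abs(w) > eps]
    return [z for z, _ in kept], [w for _, w in kept]
```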
6.5 Update¶
Modify particle weights based on new reinforcement signals.
When: During TD-style updates (RF-SARSA)
Effect: Reshapes the energy landscape to reflect new evidence
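RF-SARSA itself is covered in later chapters. As a rough sketch of the idea, not the published rule, a TD-style error at a visited configuration \(z\) can be spread over nearby particle weights in proportion to kernel similarity:

```python
import numpy as np


def gaussian_kernel(z, zi, lengthscale=1.0):
    return np.exp(-np.sum((z - zi) ** 2) / (2.0 * lengthscale ** 2))


def td_weight_update(z, td_error, locations, weights, lr=0.1, lengthscale=1.0):
    """Nudge each w_i by lr * td_error * k(z, z_i): nearby particles move most."""
    return [w + lr * td_error * gaussian_kernel(z, zi, lengthscale)
            for zi, w in zip(locations, weights)]
```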
7. Memory as a Functional Ensemble¶
Not Points in Space¶
Particles are not just "samples from a distribution." They are:
Basis elements of a function-space representation
The Ensemble View¶
The particle ensemble collectively defines:
- A value function
- A belief state
- An energy landscape
- An implicit policy
Particle Interactions¶
Particles don't exist in isolation. They interact through:
Kernel overlap: Nearby particles reinforce or interfere
Collective influence: Far-away particles still contribute (though weakly)
Emergent structure: Clusters of particles create "basins" in the landscape
8. Connection to POMDP Belief States¶
The POMDP Perspective¶
In a Partially Observable Markov Decision Process (POMDP), the agent maintains a belief state \(b(s)\) — a probability distribution over possible states.
GRL's particle memory plays an analogous role, but in function space rather than probability space.
The Mapping¶
| POMDP Concept | GRL Analog |
|---|---|
| Belief state \(b(s)\) | Particle ensemble \(\Omega\) |
| Probability distribution | Energy functional \(Q^+ \in \mathcal{H}_k\) |
| Support of belief | Regions with particles |
| Belief update | MemoryUpdate algorithm |
| Action from belief | Query field, navigate landscape |
Why This Matters¶
Viewing particle memory as a belief state explains why:
- GRL naturally handles uncertainty
- Exploration emerges from sparse particles
- No explicit observation model is needed
- The agent can reason under ambiguity
9. Memory Management¶
The Growth Problem¶
Without management, memory grows indefinitely:
- New particles added at each step
- \(N\) increases linearly with experience
- Query cost \(O(N)\) becomes prohibitive
Management Strategies¶
1. Fixed Capacity
Maintain at most \(N_{\max}\) particles. When full (see the sketch after this list):
- Merge similar particles
- Prune low-weight particles
- Replace least-influential particles
2. Adaptive Precision
Keep high resolution (many particles) in important regions, low resolution elsewhere.
3. Hierarchical Memory
Organize particles into clusters or tree structures for efficient queries.
4. Inducing Points
Select a subset of "representative" particles and approximate the rest through them.
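As an illustration of strategy 1, a fixed-capacity policy might rank particles by influence and drop the weakest. The ranking rule here is an assumption.

```python
def enforce_capacity(locations, weights, n_max=1000):
    """Keep at most n_max particles, dropping the least influential ones."""
    if len(weights) <= n_max:
        return locations, weights
    # Illustrative policy: rank by |w_i| and keep the n_max strongest.
    # Merging similar particles first would better preserve the field shape.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]),
                   reverse=True)
    keep = sorted(order[:n_max])
    return [locations[i] for i in keep], [weights[i] for i in keep]
```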
10. Particle Semantics¶
What Does a Particle Mean?¶
A particle \((z_i, w_i)\) with:
- \(z_i = (s_i, \theta_i)\): "In state \(s_i\), action \(\theta_i\) was relevant"
- \(w_i > 0\): "This configuration led to positive reinforcement"
- \(w_i < 0\): "This configuration led to negative reinforcement"
What Does Kernel Similarity Mean?¶
\(k(z, z_i) > 0.5\) means:
- "Configuration \(z\) is similar enough to \(z_i\) that evidence from \(z_i\) applies to \(z\)"
- "The experience at \(z_i\) generalizes to \(z\)"
What Does the Field Value Mean?¶
\(Q^+(z)\) is:
- The weighted evidence from all particles about configuration \(z\)
- The agent's best guess of value at \(z\)
- The fitness/energy level at that point in augmented space
11. Example: 2D Navigation¶
Setup¶
- State: \((x, y)\) position
- Action: \((F_x, F_y)\) force
- Augmented: \(z = (x, y, F_x, F_y) \in \mathbb{R}^4\)
Initial Particles¶
| Particle | Location | Weight | Meaning |
|---|---|---|---|
| \(\omega_1\) | \((5, 5, 1, 0)\) | \(+10\) | Good: move right near goal |
| \(\omega_2\) | \((2, 3, -1, 0)\) | \(-5\) | Bad: moving left hit obstacle |
| \(\omega_3\) | \((0, 0, 0, 0.5)\) | \(+2\) | OK: small upward force at start |
The Field¶
At a new configuration \(z = (4, 4, 0.8, 0.1)\):
If \(z\) is close to \(z_1\) (near goal, rightward force), \(k(z, z_1)\) is high, so \(Q^+(z)\) will be positive (good).
If \(z\) is far from all particles, all kernel values are small, so \(Q^+(z) \approx 0\) (uncertain).
Policy Inference¶
To choose an action in state \((4, 4)\):
- Query field for various \(\theta\): \(Q^+((4, 4), \theta)\)
- Find \(\theta\) that maximizes \(Q^+\)
- Result: Choose action similar to \(\theta_1 = (1, 0)\) (move right)
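The whole worked example fits in a short script. The Gaussian kernel with \(\ell = 1\) and the candidate-action grid are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(z, zi, lengthscale=1.0):
    return np.exp(-np.sum((z - zi) ** 2) / (2.0 * lengthscale ** 2))


# The three particles from the table: locations (x, y, Fx, Fy) and weights.
locations = [np.array([5.0, 5.0, 1.0, 0.0]),   # omega_1: +10
             np.array([2.0, 3.0, -1.0, 0.0]),  # omega_2: -5
             np.array([0.0, 0.0, 0.0, 0.5])]   # omega_3: +2
weights = [10.0, -5.0, 2.0]


def q_plus(z):
    """Q+(z) = sum_i w_i k(z, z_i)."""
    return sum(w * gaussian_kernel(z, zi) for zi, w in zip(locations, weights))


# Field value at the new configuration z = (4, 4, 0.8, 0.1): positive,
# because z is close to omega_1 and far from omega_2 and omega_3.
print(q_plus(np.array([4.0, 4.0, 0.8, 0.1])))

# Policy inference in state (4, 4): grid-search the theta that maximizes Q+.
candidates = [np.array([fx, fy])
              for fx in np.linspace(-1.0, 1.0, 21)
              for fy in np.linspace(-1.0, 1.0, 21)]
best = max(candidates,
           key=lambda th: q_plus(np.concatenate(([4.0, 4.0], th))))
print(best)  # approximately (1, 0): move right, following omega_1
```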
12. Summary¶
Core Concepts¶
| Concept | Meaning |
|---|---|
| Particle \((z, w)\) | Weighted point in augmented space |
| Particle memory \(\Omega\) | Ensemble representing belief state |
| Basis function role | Each particle is a kernel-centered function |
| Reinforcement field | Superposition of particle contributions |
| Memory operations | Add, query, merge, prune, update |
Key Insights¶
- Particles ARE the value function — they define \(Q^+\) completely
- Memory IS a belief state — not just data storage
- Kernel similarity defines generalization — how experience spreads
- Particles interact — creating emergent landscape structure
- Dynamic memory — grows, shrinks, and reshapes over time
Why This Matters¶
- Nonparametric: No fixed architecture, adapts to experience
- Interpretable: Can visualize and understand particle contributions
- Uncertainty-aware: Sparse particles = high uncertainty
- Compositional: Particles can be organized hierarchically
Key Takeaways¶
- Particle \((z, w)\): Weighted point in augmented space, a basis function
- Memory \(\Omega\): Ensemble of particles, defines the reinforcement field
- Not a replay buffer: Memory is a functional representation, not data
- Belief state: Particles encode what the agent believes about value
- Dynamic operations: Add, query, merge, prune, update particles
- Landscape creation: Particles collectively shape the energy landscape
- POMDP connection: Memory plays the role of belief in function space
Next Steps¶
In Chapter 6: MemoryUpdate Algorithm, we'll explore:
- How particles are added from new experiences
- The four conceptual operations of MemoryUpdate
- Why MemoryUpdate is a belief-state transition operator
- Connection to Bayesian updating in RKHS
Related: Chapter 4: Reinforcement Field, Chapter 6: MemoryUpdate
Last Updated: January 11, 2026