GRL: Generalized Reinforcement Learning¶
Actions as Operators on State Space
What is GRL?¶
Generalized Reinforcement Learning (GRL) redefines the concept of "action" in reinforcement learning. Instead of treating actions as discrete indices or fixed-dimensional vectors, GRL models actions as parametric operators that transform the state space.
```mermaid
flowchart TB
    subgraph TRL["Traditional RL"]
        direction LR
        S1["<b>State</b><br/>s"] --> P1["<b>Policy</b><br/>π"]
        P1 --> A1["<b>Action Symbol</b><br/>a ∈ A"]
        A1 --> NS1["<b>Next State</b><br/>s'"]
    end
    TRL --> GRL
    subgraph GRL["Generalized RL"]
        direction LR
        S2["<b>State</b><br/>s"] --> P2["<b>Policy</b><br/>π"]
        P2 --> AP["<b>Operator Params</b><br/>θ"]
        AP --> OP["<b>Operator</b><br/>Ô<sub>θ</sub>"]
        OP --> ST["<b>State Transform</b><br/>s' = Ô<sub>θ</sub>(s)"]
    end
    style S1 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
    style NS1 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
    style A1 fill:#fff9c4,stroke:#f57c00,stroke-width:3px,color:#000
    style P1 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000
    style S2 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
    style ST fill:#c8e6c9,stroke:#388e3c,stroke-width:3px,color:#000
    style AP fill:#fff59d,stroke:#fbc02d,stroke-width:3px,color:#000
    style OP fill:#ffcc80,stroke:#f57c00,stroke-width:3px,color:#000
    style P2 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000
    style TRL fill:#fafafa,stroke:#666,stroke-width:2px
    style GRL fill:#fafafa,stroke:#666,stroke-width:2px
    linkStyle 4 stroke:#666,stroke-width:2px
```
This formulation, inspired by the least-action principle in physics, leads to policies that are not only optimal but also physically grounded, preferring smooth, efficient transformations over abrupt changes.
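To make the operator view concrete, here is a toy sketch in which the policy outputs an operator parameter θ and the action itself is the transformation Ô<sub>θ</sub> applied to the state. The `RotationOperator` class and its parameterization are illustrative assumptions, not part of the GRL codebase.

```python
import numpy as np

# Classical RL: the action is just a symbol; the environment owns the dynamics.
discrete_action = 2  # e.g., an index into {left, right, up, down}

# GRL view (illustrative sketch): the action is an operator O_theta that
# transforms the state directly. Here, a planar rotation parameterized by theta.
class RotationOperator:
    """Hypothetical parametric operator: rotates a 2-D state by angle theta."""
    def __init__(self, theta: float):
        self.theta = theta

    def __call__(self, state: np.ndarray) -> np.ndarray:
        c, s = np.cos(self.theta), np.sin(self.theta)
        return np.array([[c, -s], [s, c]]) @ state

state = np.array([1.0, 0.0])
op = RotationOperator(theta=np.pi / 4)   # the policy outputs theta, not an index
next_state = op(state)                   # s' = O_theta(s)
```

A discrete action set corresponds to fixing a handful of such operators in advance; GRL instead lets the policy move continuously over θ.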
Tutorial Papers¶
Part I: Reinforcement Fields – Particle-Based Learning¶
Status: In progress (9/10 chapters complete)
Particle-based belief representation, energy landscapes, and functional learning over augmented state-action space.
Start Learning → | Research Roadmap →
| Section | Chapters | Topics |
|---|---|---|
| Foundations | 0, 1, 2, 3 | Augmented space, particles, RKHS, energy |
| Field & Memory | 4, 4a, 5, 6, 6a | Functional fields, Riesz theorem, belief states, MemoryUpdate, advanced memory |
| Algorithms | 7 | RF-SARSA (next) |
| Interpretation | 8-10 | Soft transitions, POMDP, synthesis |
Part II: Reinforcement Fields – Emergent Structure & Spectral Abstraction¶
Status: Planned (after Part I)
Spectral discovery of hierarchical concepts through functional clustering in RKHS.
| Section | Chapters | Topics |
|---|---|---|
| Functional Clustering | 11 | Clustering in function space |
| Spectral Concepts | 12 | Concepts as eigenmodes |
| Hierarchical Control | 13 | Multi-level abstraction |
Based on: Section V of the original paper
Reading time: ~10 hours total (both parts)
Quantum-Inspired Extensions¶
Status: Advanced topics (9 chapters complete)
Mathematical connections to quantum mechanics and novel probability formulations for ML.
| Theme | Chapters | Topics |
|---|---|---|
| Foundations | 01, 01a, 02 | RKHS-QM parallel, state vs. wavefunction, amplitude interpretation |
| Complex RKHS | 03 | Complex-valued kernels, interference, phase semantics |
| Projections | 04, 05, 06 | Action/state fields, concept subspaces, belief dynamics |
| Learning & Memory | 07, 08 | Beyond GP, memory dynamics, principled consolidation |
Novel Contributions:
- Amplitude-based RL: Complex-valued value functions with phase semantics
- MDL consolidation: Information-theoretic memory management
- Concept-based MoE: Hierarchical RL via subspace projections
Key Innovations¶
| Aspect | Classical RL | GRL |
|---|---|---|
| Action | Discrete index or vector | Parametric operator \(\hat{O}(\theta)\) |
| Action Space | Finite or bounded | Continuous manifold |
| Value Function | \(Q(s, a)\) | Reinforcement field \(Q^+(s, \theta)\) over augmented space |
| Experience | Replay buffer | Particle memory in RKHS |
| Policy | Learned function | Inferred from energy landscape |
| Uncertainty | External (dropout, ensembles) | Emergent from particle sparsity |
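To ground the reinforcement-field row above, here is a minimal sketch, assuming each experience particle stores an augmented point \(x_i = (s_i, \theta_i)\) together with a scalar value \(q_i\). The function names and the Nadaraya-Watson-style weighting are illustrative choices, not the library's API.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) similarity between augmented points x = (s, theta)."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * lengthscale ** 2))

def field_value(particles, s, theta, lengthscale=1.0):
    """Kernel-weighted estimate of Q+(s, theta) from particle memory.

    `particles` is assumed to be a list of (x_i, q_i) pairs, where
    x_i = concatenate(s_i, theta_i) and q_i is the stored value estimate.
    """
    x = np.concatenate([s, theta])
    weights = np.array([rbf_kernel(x, x_i, lengthscale) for x_i, _ in particles])
    values = np.array([q_i for _, q_i in particles])
    if weights.sum() < 1e-12:            # far from every particle: no evidence
        return 0.0
    return float(weights @ values / weights.sum())   # kernel-smoothed estimate
```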
GRL as a Unifying Framework¶
Key Insight: Traditional RL algorithms (Q-learning, DQN, PPO, SAC, RLHF for LLMs) are special cases of GRL!
When you:
- Discretize actions → GRL recovers Q-learning (see the sketch below)
- Use neural networks → GRL recovers DQN
- Apply Boltzmann policies → GRL recovers REINFORCE/Actor-Critic
- Fine-tune LLMs → GRL generalizes RLHF
See: Recovering Classical RL from GRL →
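As a hedged sketch of the first bullet above: restricting θ to a finite grid collapses the field Q⁺(s, θ) into a table-like Q(s, a), and greedy selection over that grid is the familiar Q-learning argmax. The snippet reuses the hypothetical `field_value` helper from the sketch above; the grid itself is an arbitrary example.

```python
import numpy as np

# Assumes `field_value(particles, s, theta)` from the earlier sketch.
def greedy_discrete_action(particles, s, theta_grid):
    """Restrict theta to a finite grid -> recover a Q-learning-style argmax."""
    q_values = [field_value(particles, s, theta) for theta in theta_grid]
    return theta_grid[int(np.argmax(q_values))]

# Example: four canned operator parameters playing the role of discrete actions.
theta_grid = [np.array([0.0]), np.array([0.5]), np.array([1.0]), np.array([1.5])]
```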
Why GRL?¶
- Generalization: Subsumes existing methods as special cases
- Continuous actions: No discretization, full precision
- Smooth interpolation: Nearby parameters → similar behavior
- Compositional: Operators can be composed (operator algebra)
- Uncertainty: Sparse particles = high uncertainty (no ensembles needed; see the sketch after this list)
- Interpretability: Energy landscapes, particle inspection
- Modern applications: Applies to RLHF, prompt optimization, neural architecture search
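A small sketch of the uncertainty point, under the same illustrative particle representation as above (not the library's implementation): the total kernel mass around a query point measures how much evidence the memory holds there, so sparse regions report high uncertainty with no ensembles or dropout.

```python
import numpy as np

def epistemic_uncertainty(particles, s, theta, lengthscale=1.0):
    """Sparse evidence near (s, theta) => low total kernel mass => high uncertainty.

    `particles` holds (x_i, q_i) pairs over the augmented space, as sketched above.
    """
    x = np.concatenate([s, theta])
    mass = sum(np.exp(-np.sum((x - x_i) ** 2) / (2.0 * lengthscale ** 2))
               for x_i, _ in particles)
    return 1.0 / (1.0 + mass)     # in (0, 1]; approaches 1 when no particles are nearby
```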
Quick Start¶
Installation¶
```bash
# Clone the repository
git clone https://github.com/pleiadian53/GRL.git
cd GRL

# Create environment with mamba/conda
mamba env create -f environment.yml
mamba activate grl

# Install in development mode
pip install -e .

# Verify installation (auto-detects CPU/GPU/MPS)
python scripts/verify_installation.py
```
First Steps¶
- Read the tutorial: Start with Chapter 0: Overview
- Explore concepts: Work through Chapter 1: Core Concepts
- Understand algorithms: See the algorithm chapters (coming soon)
- Implement: Follow the implementation guide
Project Structure¶
```text
GRL/
├── src/grl/                # Core library
│   ├── core/               # Particle memory, kernels
│   ├── algorithms/         # MemoryUpdate, RF-SARSA
│   ├── envs/               # Environments
│   └── visualization/      # Plotting tools
├── docs/                   # Public documentation
│   ├── GRL0/               # Tutorial paper (Reinforcement Fields)
│   ├── tutorials/          # Tutorial chapters (6/10 complete)
│   ├── paper/              # Paper-ready sections
│   └── implementation/     # Implementation specs
├── notebooks/              # Jupyter notebooks
│   └── vector_field.ipynb  # Vector field demonstrations
├── examples/               # Runnable examples
├── scripts/                # Utility scripts
├── tests/                  # Unit tests
└── configs/                # Configuration files
```
Documentation¶
Tutorial Papers: Reinforcement Fields (Two Parts)¶
Part I: Particle-Based Learning (6/10 chapters complete)
- Start Here → Overview
- Tutorials → Chapter-by-chapter learning
- Implementation → Technical specifications
Part II: Emergent Structure & Spectral Abstraction (Planned)
Additional Resources¶
- Implementation Guide → Technical specifications
- Research Roadmap → Future directions
Research Papers¶
Original Paper (arXiv 2022)¶
Po-Hsiang Chiu, Manfred Huber
arXiv:2208.04822 (2022), 37 pages, 15 figures
The foundational work introducing particle-based belief states, reinforcement fields, and concept-driven learning.
Tutorial Papers (This Repository)¶
Reinforcement Fields Framework – Enhanced exposition with modern formalization
Part I: Particle-Based Learning
- Functional fields over augmented state-action space
- Particle memory as belief state in RKHS
- MemoryUpdate and RF-SARSA algorithms
- Emergent soft state transitions, POMDP interpretation
Status: Tutorial in progress (6/10 chapters complete)
Part II: Emergent Structure & Spectral Abstraction
- Functional clustering (clustering functions, not points)
- Spectral methods on kernel matrices
- Concepts as coherent subspaces of the reinforcement field
- Hierarchical policy organization
Status: Planned (after Part I)
Planned Extensions¶
| Paper | Title | Topics | Status | Progress |
|---|---|---|---|---|
| Paper A | Generalized Reinforcement Learning: Actions as Operators | Operator algebra, generalized Bellman equation, energy regularization | Draft complete (3/7 figures, proofs outlined) | ~70% |
| Paper B | Operator Policies: Learning State-Space Operators with Neural Operator Networks (tentative) | Neural operators, scalable training, operator-actor-critic | Planned (after Paper A) | ~0% |
| Paper C | Applications of GRL to Physics, Robotics, and Differentiable Control (tentative) | Physics-based control, compositional behaviors, transfer learning | Planned (after Paper B) | ~0% |
Timeline:
- Paper A: Target submission April 2026 (NeurIPS/ICML)
- Paper B: Target submission June 2026 (ICML/NeurIPS)
- Paper C: Target submission July 2026 (CoRL)
See: Research Roadmap for detailed timeline and additional research directions.
How GRL Works: Particle-Based Learning¶
```mermaid
flowchart LR
    A["<b>State</b><br/>s"] --> B["<b>Query</b><br/>Memory Ω"]
    B --> C["<b>Compute</b><br/>Field Q⁺"]
    C --> D["<b>Infer</b><br/>Action θ"]
    D --> E["<b>Execute</b><br/>Operator"]
    E --> F["<b>Observe</b><br/>s', r"]
    F --> G["<b>Create</b><br/>Particle"]
    G --> H["<b>Update</b><br/>Memory"]
    H -->|Loop| B
    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
    style B fill:#fff9c4,stroke:#f57c00,stroke-width:3px,color:#000
    style C fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000
    style D fill:#fff59d,stroke:#fbc02d,stroke-width:3px,color:#000
    style E fill:#ffcc80,stroke:#f57c00,stroke-width:3px,color:#000
    style F fill:#c8e6c9,stroke:#388e3c,stroke-width:3px,color:#000
    style G fill:#f8bbd0,stroke:#c2185b,stroke-width:3px,color:#000
    style H fill:#b2dfdb,stroke:#00796b,stroke-width:3px,color:#000
```
Code Example¶
```python
from grl.core import ParticleMemory, RBFKernel
from grl.algorithms import MemoryUpdate, RFSarsa

# Create particle memory (the agent's belief state)
memory = ParticleMemory()

# Define similarity kernel
kernel = RBFKernel(lengthscale=1.0)

# Learning loop
# (env, num_episodes, max_steps, infer_action, and memory_update are assumed to
#  be defined elsewhere; a toy stand-in for the last two is sketched below)
for episode in range(num_episodes):
    state = env.reset()
    for step in range(max_steps):
        # Infer action from particle memory
        action = infer_action(memory, state, kernel)

        # Execute and observe
        next_state, reward, done = env.step(action)

        # Update particle memory (belief transition)
        memory = memory_update(memory, state, action, reward, kernel)
        state = next_state
        if done:
            break
```
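The loop above treats `infer_action` and `memory_update` as given. A self-contained toy version, with a plain Python list standing in for `ParticleMemory` and any callable `kernel(x, y)` standing in for the kernel object, might look like the following; every name, signature, and update rule here is an illustrative assumption rather than the grl API.

```python
import numpy as np

# Toy stand-ins for the two helpers used in the loop above.
# `memory` is a plain list of (x_i, q_i) particles over the augmented space.

def infer_action(memory, state, kernel, action_dim=2, n_candidates=64):
    """Score random candidate operator parameters against the particle-induced
    field (a kernel-weighted average of stored values) and pick the best one."""
    candidates = np.random.uniform(-1.0, 1.0, size=(n_candidates, action_dim))
    if not memory:                                    # no evidence yet: explore
        return candidates[0]

    def field(theta):
        x = np.concatenate([state, theta])
        w = np.array([kernel(x, x_i) for x_i, _ in memory])
        q = np.array([q_i for _, q_i in memory])
        return float(w @ q / w.sum()) if w.sum() > 1e-12 else 0.0

    scores = [field(theta) for theta in candidates]
    return candidates[int(np.argmax(scores))]

def memory_update(memory, state, action, reward, kernel):
    """Append a new experience particle; the real MemoryUpdate rule also
    consolidates and prunes particles rather than growing without bound."""
    x = np.concatenate([state, action])
    memory.append((x, float(reward)))                 # crude: reward as the value estimate
    return memory
```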
Citation¶
Original arXiv Paper¶
The foundational work is available on arXiv:
Chiu, P.-H., & Huber, M. (2022). Generalized Reinforcement Learning: Experience Particles, Action Operator, Reinforcement Field, Memory Association, and Decision Concepts. arXiv:2208.04822.
@article{chiu2022generalized,
title={Generalized Reinforcement Learning: Experience Particles, Action Operator,
Reinforcement Field, Memory Association, and Decision Concepts},
author={Chiu, Po-Hsiang and Huber, Manfred},
journal={arXiv preprint arXiv:2208.04822},
year={2022},
url={https://arxiv.org/abs/2208.04822}
}
Tutorial Papers (This Repository)¶
The tutorial series provides enhanced exposition and modern formalization:
Part I: Particle-Based Learning (In progress)
@article{chiu2026part1,
title={Reinforcement Fields: Particle-Based Learning},
author={Chiu, Po-Hsiang and Huber, Manfred},
journal={In preparation},
year={2026}
}
Part II: Emergent Structure & Spectral Abstraction (Planned)
@article{chiu2026part2,
title={Reinforcement Fields: Emergent Structure and Spectral Abstraction},
author={Chiu, Po-Hsiang and Huber, Manfred},
journal={In preparation},
year={2026}
}
Operator Extensions (Future Work)¶
@article{chiu2026operators,
title={Generalized Reinforcement Learning: Actions as Operators},
author={Chiu, Po-Hsiang},
journal={In preparation},
year={2026+}
}
License¶
This project is licensed under the MIT License - see the LICENSE file for details.
The GRL Framework¶
GRL (Generalized Reinforcement Learning) is a family of methods that rethink how actions are represented and learned.
Original paper: arXiv:2208.04822 (Chiu & Huber, 2022)
Reinforcement Fields (This Repository)¶
Two-Part Tutorial Series:
Part I: Particle-Based Learning
- Actions as continuous parameters in augmented state-action space
- Particle memory as belief state, kernel-induced value functions
- Learning through energy landscape navigation
Part II: Emergent Structure & Spectral Abstraction
- Concepts emerge from functional clustering in RKHS
- Spectral methods discover hierarchical structure
- Multi-level policy organization
Key Innovation: Learning emerges from particle dynamics in function space, not explicit policy optimization.
Actions as Operators (Paper A – In Development)¶
Core Idea: Actions as parametric operators that transform state space, with operator algebra providing compositional structure.
Key Innovation: Operator manifolds replace fixed action spaces, enabling compositional behaviors and physical interpretability.
Acknowledgments¶
Mathematical Foundations¶
Core Framework:
- Formulated in Reproducing Kernel Hilbert Spaces (RKHS), the functional framework for particle-based belief states
- Kernel methods define the geometry and similarity structure of augmented state-action space
- Inspired by the least-action principle in classical mechanics
Quantum-Inspired Probability:
- Probability amplitudes instead of direct probabilities: RKHS inner products as amplitude overlaps
- Complex-valued RKHS enabling interference effects and phase semantics for temporal/contextual dynamics (toy sketch below)
- Wave-function analogy: the reinforcement field as a superposition of particle basis states
- This formulation is new to mainstream ML and opens new directions for probabilistic reasoning
See: Quantum-Inspired Extensions for technical details.
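As a toy illustration of the amplitude idea (a sketch built on assumed choices, not the framework's actual construction): with a complex-valued kernel, particle contributions carry phases and can reinforce or cancel when summed, and squaring the magnitude of the superposed amplitude gives a Born-rule-style weight. The kernel form and phase rule below are arbitrary examples.

```python
import numpy as np

def complex_rbf(x, y, lengthscale=1.0, omega=1.0):
    """Toy complex kernel: RBF magnitude with a phase set by a linear feature."""
    mag = np.exp(-np.sum((x - y) ** 2) / (2.0 * lengthscale ** 2))
    phase = omega * np.sum(x - y)               # illustrative phase rule
    return mag * np.exp(1j * phase)

# The amplitude at a query point is a superposition of particle contributions;
# nearby particles with aligned phases reinforce, misaligned phases cancel.
particles = [np.array([0.0]), np.array([0.1]), np.array([3.0])]
query = np.array([0.05])
amplitude = sum(complex_rbf(query, p) for p in particles)
weight = np.abs(amplitude) ** 2                 # Born-rule-style weight
```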
Conceptual Connections¶
- Energy-based models (EBMs): Control as energy landscape navigation
- POMDPs and belief-based control: Particle ensembles as implicit belief states
- Score-based methods: Energy gradients guide policy inference
Implementation Tools¶
- Gaussian process regression can model scalar energy fields (but is not essential to the framework; see the sketch below)
- Neural operators for learning parametric action transformations
- Diffusion models share the gradient-field perspective
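To illustrate the first bullet, here is a brief sketch of fitting a Gaussian process to scalar energy values over augmented points, using scikit-learn purely as an example tool (not implied to be a project dependency); the data and the energy function are made up for the demonstration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy data: augmented points x = (s, theta) with scalar "energy" targets.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 3))        # 2-D state + 1-D operator parameter
y = np.sin(X[:, 0]) + 0.5 * X[:, 2] ** 2        # made-up energy landscape

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-3)
gp.fit(X, y)

# The posterior mean gives an energy estimate; the posterior std gives a
# built-in uncertainty measure at each query point.
mean, std = gp.predict(X[:5], return_std=True)
```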