Field Series Roadmap

Building GRL Understanding Systematically

This document tracks the progression from foundational concepts to complete GRL algorithms.


✅ Completed: Foundation (Notebooks 0-3)

Notebook 0: Introduction to Vector Fields

Status: Complete
Topics: Real-world examples, basic intuition
Time: ~10-15 minutes

Notebook 1: Classical Vector Fields

Status: Complete
Topics:

  • Vector field definition and visualization
  • Gradient fields (connection to optimization)
  • Rotational fields and curl
  • Superposition of fields
  • Trajectories following gradients

Time: ~20-25 minutes
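
Code sketch (a minimal illustration of the gradient-field idea; the quadratic potential and function names are illustrative, not the notebook's code):

import numpy as np

def potential(x, y):
    """Toy quadratic bowl V(x, y) = x^2 + y^2 (illustrative choice)."""
    return x**2 + y**2

def gradient_field(x, y):
    """Analytic gradient of V; the descent field used for trajectories is -grad V."""
    return 2 * x, 2 * y

# Evaluate the descent field on a grid (this is what a quiver plot would display).
X, Y = np.meshgrid(np.linspace(-2, 2, 9), np.linspace(-2, 2, 9))
U, V = gradient_field(X, Y)
U, V = -U, -V  # follow the negative gradient, as in optimization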

Notebook 1a: Vector Fields and ODEs

Status: Complete
Topics:

  • ODEs as following vector fields (\(\dot{x} = F(x)\))
  • Numerical solvers (Euler, RK4)
  • Phase portraits and fixed points
  • Gradient flow (optimization as ODE)
  • Connection to flow matching (genai-lab)

Time: ~25-30 minutes
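
Code sketch (a minimal forward-Euler step for \(\dot{x} = F(x)\); names are illustrative, and RK4 refines the same idea with intermediate slope evaluations):

import numpy as np

def euler_trajectory(F, x0, dt=0.05, steps=200):
    """Integrate x' = F(x) with forward Euler, starting from x0."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + dt * F(xs[-1]))
    return np.array(xs)

# Gradient flow on V(x) = ||x||^2, i.e. F(x) = -2x: the trajectory decays to the origin.
traj = euler_trajectory(lambda x: -2 * x, x0=[1.5, -1.0])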

Notebook 2: Functional Fields

Status: Complete
Topics:

  • Functions as infinite-dimensional vectors
  • Kernel functions and similarity
  • RKHS intuition
  • Functional gradients
  • Superposition in function space

Time: ~20-25 minutes
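
Code sketch (kernel similarity and superposition in function space, using an RBF kernel; names and values are illustrative):

import numpy as np

def rbf_kernel(z1, z2, lengthscale=0.5):
    """Gaussian (RBF) similarity between two points."""
    d2 = np.sum((np.asarray(z1, dtype=float) - np.asarray(z2, dtype=float)) ** 2)
    return np.exp(-d2 / (2 * lengthscale**2))

# A function built by superposing kernel bumps: f(x) = sum_i w_i k(x, x_i)
centers = [np.array([0.0]), np.array([1.0])]
weights = [1.0, -0.5]

def f(x):
    return sum(w * rbf_kernel(x, c) for w, c in zip(weights, centers))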

Notebook 3: Reinforcement Fields

Status: Complete
Topics:

  • Augmented state-action space: \(z = (s, \theta)\)
  • Particle memory: \(\{(z_i, w_i)\}\)
  • Field emergence: \(Q^+(z) = \sum_i w_i k(z, z_i)\)
  • Basic policy inference: \(\theta^* = \arg\max_\theta Q^+(s, \theta)\) (discrete search)
  • Obstacles via negative particles

Time: ~30 minutes
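
Code sketch: the core construction above, \(Q^+(z) = \sum_i w_i k(z, z_i)\), fits in a few lines. This is an illustrative sketch (the notebook's own implementation may differ), with particles stored as dicts holding a flat \(z = (s, \theta)\) vector and a weight:

import numpy as np

def rbf_kernel(z1, z2, lengthscale=0.5):
    d2 = np.sum((np.asarray(z1, dtype=float) - np.asarray(z2, dtype=float)) ** 2)
    return np.exp(-d2 / (2 * lengthscale**2))

def Q_plus(z, particles):
    """Field value at z = (s, theta), from particles stored as {'z': ..., 'w': ...}."""
    return sum(p['w'] * rbf_kernel(z, p['z']) for p in particles)

def greedy_action(s, particles, n_angles=36):
    """Discrete search over headings: theta* = argmax_theta Q+(s, theta)."""
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    values = [Q_plus(np.append(s, th), particles) for th in angles]
    return angles[int(np.argmax(values))]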

Supplementary:

  • 03a_particle_coverage_effects.ipynb – visual proof of particle coverage effects
  • particle_vs_gradient_fields.md – theory comparison

🚧 In Progress / Planned: Learning Algorithms

Notebook 4: Policy Inference (Planned)

Goal: Deep dive into how agents extract policies from the Q⁺ field

Topics to Cover:

  1. Greedy Policy (already introduced in Notebook 3)
    • Discrete action search: \(\theta^* = \arg\max_\theta Q^+(s, \theta)\)
    • Computational considerations (number of angles)
    • Limitations of discrete search
  2. Gradient-Based Policy (new)
    • Continuous optimization: \(\nabla_\theta Q^+(s, \theta) = 0\)
    • Gradient ascent on action space
    • Connection to policy gradient methods
  3. Boltzmann (Soft) Policy (new)
    • Exploration via softmax: \(\pi(\theta|s) \propto \exp(\beta Q^+(s, \theta))\) (see the sampling sketch after this list)
    • Temperature parameter \(\beta\)
    • Entropy regularization
  4. Action Landscapes (expand from Notebook 3)
    • Visualizing \(Q^+(s, \cdot)\) for fixed states
    • Multi-modal action distributions
    • Local vs. global optima
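
Code sketch of the Boltzmann sampling step (assumes a Q_plus(z, particles) helper like the one sketched under Notebook 3; names are illustrative):

import numpy as np

def sample_boltzmann_action(s, particles, beta=5.0, n_angles=36, rng=None):
    """Sample theta with probability proportional to exp(beta * Q+(s, theta))."""
    rng = np.random.default_rng() if rng is None else rng
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    q = np.array([Q_plus(np.append(s, th), particles) for th in angles])
    logits = beta * (q - q.max())              # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(angles, p=probs)

As \(\beta \to 0\) the policy approaches uniform exploration; as \(\beta \to \infty\) it recovers the greedy policy.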

Visualizations:

  • Polar plots of action landscapes at different temperatures (sketched below)
  • Comparison: greedy vs. Boltzmann sampling
  • Interactive sliders for temperature \(\beta\)
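
One possible way to render the polar plots above, with a toy multi-modal landscape standing in for \(Q^+(s, \cdot)\) (purely illustrative data):

import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0, 2 * np.pi, 360)
Q = np.cos(theta - 1.0) + 0.4 * np.cos(3 * theta)   # toy multi-modal action landscape

fig, axes = plt.subplots(1, 3, subplot_kw={'projection': 'polar'}, figsize=(12, 4))
for ax, beta in zip(axes, [1.0, 5.0, 20.0]):
    p = np.exp(beta * (Q - Q.max()))
    p /= p.sum()                                    # Boltzmann probabilities over theta
    ax.plot(theta, p)
    ax.set_title(f'beta = {beta}')
plt.tight_layout()
plt.show()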

Time Estimate: ~25-30 minutes

Prerequisites: Notebook 3


Notebook 5: Memory Update – Learning from Experience (Planned)

Goal: Understand how the field evolves as the agent learns

Topics to Cover:

  1. Single Particle Addition
    • New experience: \((s, a, r)\)
    • Creating a particle: \((z_{new}, w_{new})\) where \(z_{new} = (s, a)\)
    • Weight assignment: \(w_{new} = f(r, \gamma, ...)\)
  2. Field Evolution
    • Before/after comparison
    • Difference map: \(\Delta Q^+ = Q^+_{after} - Q^+_{before}\)
    • "Ripple" effect from the new particle
    • Kernel lengthscale controls the influence radius
  3. MemoryUpdate Algorithm
    • Pseudocode walkthrough
    • When to add positive vs. negative particles
    • Memory management (capacity limits; see the capped-memory sketch under Code Examples below)
  4. Interactive Demonstration
    • Click to add particles
    • See the field update in real time
    • Observe policy changes

Visualizations:

  • Side-by-side: Q⁺ before/after adding a particle
  • Heatmap of \(\Delta Q^+\)
  • Animated field evolution over multiple updates
  • Policy vector field changes

Code Examples:

def add_particle(particles, z_new, w_new):
    """Add a new particle to memory."""
    particles.append({'z': z_new, 'w': w_new})
    return particles

def compute_field_difference(X, Y, particles_before, particles_after):
    """Compute how the field changed on a grid after a memory update."""
    # compute_Q_field is assumed to evaluate Q+ over the (X, Y) grid
    # (the field construction from Notebook 3).
    Q_before = compute_Q_field(X, Y, particles_before)
    Q_after = compute_Q_field(X, Y, particles_after)
    return Q_after - Q_before
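
For the memory-management item above, a capacity-capped variant might look like this (dropping the smallest-\(|w|\) particles is one plausible pruning rule, not a prescribed strategy):

def add_particle_capped(particles, z_new, w_new, max_size=500):
    """Add a particle, then prune the weakest (smallest |w|) entries if over capacity."""
    particles.append({'z': z_new, 'w': w_new})
    if len(particles) > max_size:
        particles.sort(key=lambda p: abs(p['w']), reverse=True)
        del particles[max_size:]
    return particles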

Time Estimate: ~30-35 minutes

Prerequisites: Notebooks 3, 4


Notebook 6: RF-SARSA – Complete Learning Algorithm (Planned)

Goal: Implement and understand the full GRL learning algorithm

Topics to Cover:

  1. SARSA Recap
    • Classical SARSA: \(Q(s,a) \leftarrow Q(s,a) + \alpha[r + \gamma Q(s',a') - Q(s,a)]\)
    • TD error and bootstrapping
  2. RF-SARSA Adaptation
    • No explicit Q-table: the field represents the Q-function
    • TD error in RKHS: \(\delta = r + \gamma Q^+(s', a') - Q^+(s, a)\)
    • Particle weight from TD error: \(w_{new} = \alpha \delta\)
  3. Algorithm Walkthrough (implemented in the code example below)
    • Initialize: empty particle memory
    • Episode loop:
      • Select action via policy (Boltzmann or greedy)
      • Execute, observe \((s', r)\)
      • Compute TD error
      • Add particle if \(|\delta| > \epsilon\)
      • Field emerges from accumulated particles
  4. Convergence and Stability
    • When does the field stabilize?
    • Memory growth over time
    • Particle pruning strategies

Visualizations:

  • Episode-by-episode field evolution (animated)
  • TD error over time
  • Number of particles vs. episodes
  • Final learned policy vs. optimal policy

Code Examples:

def rf_sarsa_episode(env, particles, alpha, gamma, beta,
                     epsilon=0.01, max_steps=200):
    """Run one episode of RF-SARSA, growing the particle memory in place."""
    s = env.reset()
    a = sample_boltzmann_policy(s, particles, beta)

    for t in range(max_steps):
        s_next, r, done = env.step(a)
        a_next = sample_boltzmann_policy(s_next, particles, beta)

        # TD error in the field: delta = r + gamma * Q+(s', a') - Q+(s, a)
        Q_sa = compute_Q_plus(s, a, particles)
        Q_next = compute_Q_plus(s_next, a_next, particles)
        delta = r + gamma * Q_next - Q_sa

        # Add a particle only if the TD error is significant
        if abs(delta) > epsilon:
            z_new = (s, a)
            w_new = alpha * delta
            particles.append({'z': z_new, 'w': w_new})

        s, a = s_next, a_next
        if done:
            break

    return particles
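
The episode loop above relies on two helpers that are not shown. Possible sketches, consistent with the field definition from Notebook 3 (illustrative, not a reference implementation; particles store z as an (s, a) tuple, matching the loop above):

import numpy as np

def compute_Q_plus(s, a, particles, lengthscale=0.5):
    """Q+(s, a) = sum_i w_i k((s, a), z_i) with an RBF kernel."""
    z = np.append(s, a)
    total = 0.0
    for p in particles:
        zi = np.append(p['z'][0], p['z'][1])     # particles store z as an (s, a) tuple
        total += p['w'] * np.exp(-np.sum((z - zi) ** 2) / (2 * lengthscale**2))
    return total

def sample_boltzmann_policy(s, particles, beta, n_angles=36, rng=None):
    """Sample a heading with probability proportional to exp(beta * Q+(s, theta))."""
    rng = np.random.default_rng() if rng is None else rng
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    q = np.array([compute_Q_plus(s, th, particles) for th in angles])
    logits = beta * (q - q.max())
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(angles, p=probs)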

Experiments:

  • 2D navigation (from Notebook 3)
  • Gridworld
  • Mountain car (continuous actions)

Time Estimate: ~40-45 minutes

Prerequisites: Notebooks 3, 4, 5


🔮 Future Topics (Beyond Core Series)

Advanced Topics (Potential Notebooks 7+)

  1. Kernel Design and Selection
    • RBF vs. other kernels
    • Adaptive lengthscales
    • State-action factorization
  2. Scalability and Approximations
    • Particle pruning
    • Sparse approximations
    • Nyström methods
  3. Multi-Task and Transfer Learning
    • Shared particle memories
    • Task-specific fields
    • Meta-learning
  4. Theoretical Foundations
    • Convergence proofs
    • Sample complexity
    • Relationship to kernel-based RL
  5. Comparison with Other Methods
    • GRL vs. DQN
    • GRL vs. SAC
    • GRL vs. PPO
    • When to use GRL?

Development Principles

Systematic Progression:

  1. ✅ Build intuition (Notebooks 0-3)
  2. 🚧 Understand components (Notebooks 4-5)
  3. 🔮 Implement algorithms (Notebook 6)
  4. 🔮 Explore advanced topics (Notebooks 7+)

Each Notebook Should:

  • Build on previous concepts
  • Include professional visualizations
  • Provide working code examples
  • Connect theory to practice
  • Take 20-45 minutes to complete

Pedagogical Goals:

  • Visual > Mathematical (when possible)
  • Interactive > Static (when useful)
  • Synthetic > Real (for clarity, then real for validation)
  • Incremental > Comprehensive (build up systematically)

Timeline and Priorities

High Priority (Core Understanding)

  • Notebook 4: Policy Inference
  • Notebook 5: Memory Update
  • Notebook 6: RF-SARSA

Medium Priority (Practical Application)

  • Integration with real RL environments
  • Performance benchmarks
  • Hyperparameter tuning guide

Low Priority (Advanced Topics)

  • Theoretical deep dives
  • Comparison studies
  • Extensions and variants

Within GRL Project:

  • Tutorial series: docs/GRL0/tutorials/
  • Theory documents: docs/theory/
  • Implementation: src/ (when available)

External Projects:


Last Updated: January 15, 2026

Status: Foundation complete (Notebooks 0-3), planning next phase (Notebooks 4-6)