# JEPA Documentation

Joint Embedding Predictive Architecture (JEPA) – a self-supervised learning paradigm that learns by predicting in embedding space rather than reconstructing in data space.
This documentation series covers JEPA from first principles through computational biology applications, with a focus on perturbation prediction and trajectory modeling.
## Core Documentation Series

### 1. Overview

00_jepa_overview.md – What is JEPA and why it matters

- Core concepts: predict embeddings, not pixels
- JEPA vs generative models vs contrastive learning
- Joint latent spaces (Goku insight)
- Why JEPA for biology
- When to use JEPA vs generative models

### 2. Foundations

01_jepa_foundations.md – Architecture and components

- Encoder architecture
- Predictor design
- VICReg regularization (variance, invariance, covariance)
- Masking strategies
- Complete PyTorch implementation

### 3. Training

02_jepa_training.md – Training strategies and best practices

- Training loop
- Loss computation
- Hyperparameters
- Optimization strategies
- Debugging and monitoring
- Advanced techniques

### 4. Applications

03_jepa_applications.md – From vision to biology

- I-JEPA (image masking)
- V-JEPA (video prediction)
- Bio-JEPA (perturbation prediction)
- Multi-omics integration
- Trajectory inference

### 5. Perturb-seq Application

04_jepa_perturbseq.md – Detailed Perturb-seq implementation

- Dataset preparation
- Perturbation conditioning
- Model architecture
- Training pipeline
- Evaluation metrics
- Comparison with scGen/CPA

### 6. Joint Latent Spaces

05_joint_latent_spaces.md – Unified approach for static and dynamic data

- Goku model insights: joint image-video generation
- Joint latent spaces for bulk + time-series biology
- Patch n' Pack for variable-length sequences
- JEPA architecture for Perturb-seq
- When (not) to use joint latent spaces
## Supplementary Documents

### Open Research

open_research_joint_latent.md – Joint latent spaces (legacy)
Note: Content graduated to 05_joint_latent_spaces.md
## Quick Navigation

### For Different Audiences

New to JEPA?

1. Start with Overview
2. Read Foundations for architecture
3. Try toy examples from Training

Coming from Generative Models?

1. Read Overview comparison section
2. Understand why prediction ≠ generation
3. Learn when to combine JEPA + diffusion

Interested in Biology Applications?

1. Read Overview biology section
2. Jump to Applications
3. Deep dive into Perturb-seq

Ready to Implement?

1. Review Foundations architecture
2. Follow Training pipeline
3. Adapt Perturb-seq code
## Key Concepts
### What Makes JEPA Different

Traditional Generative Models (VAE, Diffusion):

```
Input → Encoder → z → Decoder → x̂

Loss: ||x - x̂||² (pixel-level)
```

JEPA:

```
Context → Encoder → z_context
                        ↓
                   Predictor → ẑ_target
                        ↓
Target  → Encoder → z_target

Loss: ||z_target - ẑ_target||² (embedding-level)
```
Key advantages:
- No decoder (10-100× faster)
- Semantic prediction (robust to noise)
- No contrastive negatives (simpler than SimCLR)
- Compositional reasoning (combine perturbations)
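The embedding-level objective above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the reference implementation from this series: the MLP encoders, the EMA coefficient, and all tensor sizes are placeholder choices; the gradient-free EMA target encoder follows common JEPA practice (I-JEPA/V-JEPA) for avoiding collapse.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 64

# Context encoder (trained) and target encoder (EMA copy, no gradients).
encoder = nn.Sequential(nn.Linear(128, dim), nn.GELU(), nn.Linear(dim, dim))
target_encoder = nn.Sequential(nn.Linear(128, dim), nn.GELU(), nn.Linear(dim, dim))
target_encoder.load_state_dict(encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad_(False)

predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

x_context = torch.randn(32, 128)  # context view (e.g., visible patches)
x_target = torch.randn(32, 128)   # target view (e.g., masked patches)

z_context = encoder(x_context)
with torch.no_grad():
    z_target = target_encoder(x_target)
z_pred = predictor(z_context)

# Embedding-level loss: ||z_target - ẑ_target||², no decoder involved.
loss = nn.functional.mse_loss(z_pred, z_target)
loss.backward()

# EMA update keeps the target encoder a slow-moving copy of the encoder.
tau = 0.996
with torch.no_grad():
    for p, tp in zip(encoder.parameters(), target_encoder.parameters()):
        tp.mul_(tau).add_((1.0 - tau) * p)
```

Note that gradients flow only through the context branch; the target branch stays fixed within each step, which is one standard way to keep the objective from collapsing to a constant embedding.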
### Core Components

1. Encoder: Maps inputs to embeddings
   - Shared across all inputs
   - Vision Transformer (ViT) for images
   - MLP/Transformer for gene expression
2. Predictor: Predicts target embedding from context
   - Transformer-based
   - Conditioned on context (time, perturbation, etc.)
   - Learns relationships in embedding space
3. VICReg Loss: Prevents collapse
   - Variance: keep embeddings spread out
   - Invariance: predictions match targets
   - Covariance: decorrelate dimensions
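The three VICReg terms just listed can be written down directly. A hedged sketch: `vicreg_loss` and its default weights are illustrative, and for brevity the variance and covariance penalties are applied to one branch only (the canonical formulation in Bardes et al., 2022 applies them to both):

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_pred, z_target, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Simplified VICReg: invariance + variance + covariance penalties."""
    n, d = z_pred.shape
    # Invariance: predictions should match targets.
    inv = F.mse_loss(z_pred, z_target)
    # Variance: hinge keeps each dimension's std above 1 (anti-collapse).
    std = torch.sqrt(z_pred.var(dim=0) + eps)
    var = F.relu(1.0 - std).mean()
    # Covariance: penalize squared off-diagonal covariance (decorrelation).
    zc = z_pred - z_pred.mean(dim=0)
    cov = (zc.T @ zc) / (n - 1)
    off_diag = cov.pow(2).sum() - cov.diagonal().pow(2).sum()
    return sim_w * inv + var_w * var + cov_w * off_diag / d

z_pred = torch.randn(32, 16)
z_target = torch.randn(32, 16)
loss = vicreg_loss(z_pred, z_target)
```

All three terms are non-negative, so a collapsed encoder (zero variance) is penalized even when the invariance term is perfectly minimized.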
### Joint Latent Spaces
Insight from Goku (ByteDance, 2024):
If two data types differ only by dimensionality or observation density, they want the same latent space.
For biology:
- Bulk RNA-seq (static) + Time-series (dynamic) → Same latent space
- Static data teaches spatial priors (cell types, pathways)
- Dynamic data teaches temporal dynamics
- Both inform the same representation
## JEPA Variants

### I-JEPA (Image)

Task: Predict masked image regions in embedding space

Key innovation: prediction targets are embeddings of the masked regions, not pixels
Papers: Assran et al. (2023)
### V-JEPA (Video)

Task: Predict masked spatio-temporal regions of video in embedding space
Key innovation: Temporal prediction without generation
Papers: Bardes et al. (2024), Meta AI (2025)
### Bio-JEPA (Proposed)
Task: Predict perturbed/future cell states in embedding space
Key innovation: Perturbation operators in latent space
Applications:
- Perturb-seq prediction
- Trajectory inference
- Multi-omics translation
- Drug response prediction
## Biology Applications

### 1. Perturbation Prediction (Perturb-seq)

Problem: Predict cellular response to genetic/chemical perturbations

JEPA approach:

```
z_baseline = encoder(x_baseline)
z_pert = perturbation_encoder(perturbation_info)
z_pred = predictor(z_baseline, z_pert)
loss = ||z_pred - encoder(x_perturbed)||²
```
Advantages:
- No need to reconstruct all 20K genes
- Learn perturbation operators
- Compositional (combine perturbations)
- Efficient (no decoder)
Datasets: Norman et al. (2019), Replogle et al. (2022)
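The pseudocode above can be made concrete with a perturbation-conditioned predictor. A minimal sketch only: the module names, toy dimensions, and the learned `nn.Embedding` per perturbation are illustrative assumptions, not the implementation from 04_jepa_perturbseq.md:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_genes, dim, n_perts = 2000, 128, 50  # toy sizes for illustration

encoder = nn.Sequential(nn.Linear(n_genes, dim), nn.GELU(), nn.Linear(dim, dim))
pert_embedding = nn.Embedding(n_perts, dim)  # one learned operator per perturbation
predictor = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))

x_baseline = torch.randn(16, n_genes)   # unperturbed expression profiles
x_perturbed = torch.randn(16, n_genes)  # matched perturbed profiles
pert_ids = torch.randint(0, n_perts, (16,))

z_baseline = encoder(x_baseline)
z_pert = pert_embedding(pert_ids)
z_pred = predictor(torch.cat([z_baseline, z_pert], dim=-1))

# Target is an embedding, so no 20K-gene reconstruction is needed.
loss = nn.functional.mse_loss(z_pred, encoder(x_perturbed))
```

Composition then has a natural hook: for a double perturbation one could, for example, sum the two perturbation embeddings before the predictor, the kind of operator arithmetic that reconstruction-based models handle less directly.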
### 2. Trajectory Inference

Problem: Predict developmental or disease trajectories

JEPA approach:

```
z_t = encoder(x_t)
z_t1_pred = predictor(z_t, time_embedding)
loss = ||z_t1_pred - encoder(x_t1)||²
```
Applications:
- Developmental biology
- Disease progression
- Drug response over time
### 3. Multi-omics Integration

Problem: Predict one modality from another

JEPA approach:

```
z_rna = encoder_rna(x_rna)
z_protein_pred = predictor(z_rna)
loss = ||z_protein_pred - encoder_protein(x_protein)||²
```

Applications:

- RNA → Protein prediction
- ATAC → RNA prediction
- Cross-species translation
### 4. Drug Response Prediction

Problem: Predict cellular response to drugs

JEPA approach:

```
z_baseline = encoder(x_baseline)
z_drug = drug_encoder(drug_features)
z_response = predictor(z_baseline, z_drug)
loss = ||z_response - encoder(x_treated)||²
```
Applications:
- Drug screening
- Combination therapy
- Patient stratification
## Comparison with Other Methods

### JEPA vs VAE
| Aspect | VAE | JEPA |
|---|---|---|
| Objective | Reconstruct input | Predict target embedding |
| Loss | Pixel-level + KL | Embedding-level + VICReg |
| Decoder | Required | Not needed |
| Speed | Slow | Fast (10-100×) |
| Generation | Yes | No (need wrapper) |
| Robustness | Moderate | High |
### JEPA vs Diffusion
| Aspect | Diffusion | JEPA |
|---|---|---|
| Objective | Denoise/predict velocity | Predict embedding |
| Loss | Pixel-level | Embedding-level |
| Sampling | ODE/SDE (slow) | Direct (fast) |
| Generation | Yes | No (need wrapper) |
| Uncertainty | Via sampling | Need wrapper |
| Efficiency | Moderate | High |
### JEPA vs Contrastive (SimCLR)
| Aspect | SimCLR | JEPA |
|---|---|---|
| Objective | Maximize agreement | Predict embedding |
| Negatives | Required | Not needed |
| Loss | Contrastive | MSE + VICReg |
| Complexity | High (negative sampling) | Low |
| Prediction | No | Yes |
### JEPA vs scGen/CPA (Perturbation Models)
| Aspect | scGen/CPA | JEPA |
|---|---|---|
| Architecture | VAE + arithmetic | Encoder + Predictor |
| Perturbation | Latent arithmetic | Learned operators |
| Reconstruction | Required | Not needed |
| Compositional | Limited | Natural |
| Efficiency | Moderate | High |
## When to Use JEPA

### ✅ Use JEPA When:

Prediction is the goal (not generation)

- Perturbation prediction
- Trajectory inference
- Multi-omics translation
Efficiency matters
- Large-scale datasets
- Limited compute
- Need fast training
Robustness is critical
- Noisy data
- Batch effects
- Technical variation
Compositional reasoning needed
- Combine perturbations
- Transfer across contexts
- Causal modeling
### ❌ Use Generative Models When:
Need actual samples
- Data augmentation
- Synthetic data generation
- Uncertainty quantification
Reconstruction quality matters
- Image generation
- High-fidelity synthesis
Distribution modeling is the goal
- Density estimation
- Anomaly detection
### Best: Hybrid JEPA + Generative

Combine both:

1. JEPA learns dynamics efficiently
2. Generative model handles sampling
3. Get prediction + generation + uncertainty
Example: JEPA + Diffusion
```
# JEPA predicts perturbed embedding
z_pred = jepa_predictor(z_baseline, perturbation)

# Diffusion generates samples from embedding
x_samples = diffusion_decoder(z_pred, num_samples=100)

# Get both prediction and uncertainty
```
## Implementation Roadmap

### Phase 1: Basic JEPA
- Encoder architecture (ViT or MLP)
- Predictor network (Transformer)
- VICReg loss implementation
- Training loop
- Toy examples (MNIST, synthetic)
### Phase 2: Bio-JEPA
- Gene expression encoder
- Perturbation conditioning
- Perturb-seq dataset loader
- Training on Norman et al. data
- Evaluation metrics
### Phase 3: Joint Latent Spaces
- Joint encoder for bulk + single-cell
- Static + dynamic data training
- Multi-omics integration
- Cross-dataset transfer
### Phase 4: JEPA + Generative
- Diffusion decoder
- Uncertainty quantification
- Sample generation
- Full predictive-generative system
## Learning Path

### Beginner Path

- Understand the concept → Overview
- Learn the architecture → Foundations
- Train on toy data → Training
- Explore applications → Applications

### Intermediate Path

- Review architecture → Foundations
- Implement training → Training
- Apply to Perturb-seq → Perturb-seq
- Compare with baselines → Evaluate against scGen/CPA

### Advanced Path

- Joint latent spaces → Open Research
- Hybrid JEPA + Diffusion → Combine prediction and generation
- Multi-omics integration → Cross-modality prediction
- Novel applications → Extend to new biology problems
## Related Documentation

### Within This Project
Generative Models:
- DDPM β Denoising diffusion
- SDE β Stochastic differential equations
- Flow Matching β Rectified flow
- DiT β Diffusion transformers
- VAE β Variational autoencoders
Architecture Choices:
- Gene Expression Architectures β Tokenization for biology
Incubation:
- Joint Latent Spaces β Goku insights
### External Resources
JEPA Papers:
- LeCun (2022): "A Path Towards Autonomous Machine Intelligence"
- Assran et al. (2023): "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture" (I-JEPA)
- Bardes et al. (2024): "Revisiting Feature Prediction for Learning Visual Representations from Video" (V-JEPA)
- Meta AI (2025): "V-JEPA 2: Understanding, Prediction, and Planning"
Joint Latent Spaces:
- ByteDance & HKU (2024): "Goku: Native Joint Image-Video Generation"
VICReg:
- Bardes et al. (2022): "VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning"
Biology Applications:
- Norman et al. (2019): "Exploring genetic interaction manifolds constructed from rich single-cell phenotypes" (Perturb-seq)
- Lotfollahi et al. (2019): "scGen predicts single-cell perturbation responses"
- Roohani et al. (2023): "Predicting transcriptional outcomes of novel multigene perturbations" (GEARS)
## Key Takeaways

### Conceptual

- Predict embeddings, not pixels → More efficient, more robust
- No reconstruction needed → Focus on semantic content
- No contrastive negatives → Simpler than SimCLR/MoCo
- World models without generation → Learn dynamics efficiently
- Joint latent spaces → Static and dynamic data train each other

### Practical

- JEPA is not generative → Predicts embeddings, not samples
- VICReg prevents collapse → Variance + covariance regularization
- Powerful predictor needed → Transformer-based works well
- Combine with generative → For sampling and uncertainty
- Well suited to biology → Perturbations, trajectories, multi-omics

### For Computational Biology

- Perturb-seq is ideal → Predict perturbed states efficiently
- Efficiency matters → 20K genes, millions of cells
- Robustness critical → Technical noise, batch effects
- Compositional reasoning → Combine perturbations naturally
- Hybrid approach best → JEPA + diffusion for full system
## Getting Started

Quick start:

```bash
# Read overview
cat docs/JEPA/00_jepa_overview.md

# Understand architecture
cat docs/JEPA/01_jepa_foundations.md

# See training examples
cat docs/JEPA/02_jepa_training.md
```

For biology applications:

```bash
# Jump to applications
cat docs/JEPA/03_jepa_applications.md

# Deep dive into Perturb-seq
cat docs/JEPA/04_jepa_perturbseq.md
```

For implementation:

```bash
# Check source code (when available)
ls src/genailab/jepa/

# Run notebooks (when available)
ls notebooks/jepa/
```
## Status

Documentation: 🚧 In Progress

- [x] Overview
- [ ] Foundations
- [ ] Training
- [ ] Applications
- [ ] Perturb-seq
- [ ] Open Research

Implementation: 🔲 Planned

- [ ] Core JEPA modules
- [ ] Training infrastructure
- [ ] Perturb-seq application
- [ ] Evaluation metrics

Notebooks: 🔲 Planned

- [ ] Toy examples
- [ ] Gene expression JEPA
- [ ] Perturb-seq prediction
- [ ] Comparison with baselines