# Foundation Model Adaptation: Implementation Guide
Quick reference for implementing the foundation model adaptation framework in your projects.
## 🎯 Quick Start

### 1. Check Your Hardware

```bash
# Activate environment
mamba activate genailab

# Check detected hardware
python -m genailab.foundation.configs.resource_profiles
```

Output example (M1 Mac):

```
Detected Profile: M1 MacBook Pro 16GB
Device: mps
Memory: 16.0 GB
Recommended model size: small
Max batch size: 8
```
### 2. Compare Model Configurations

```bash
python -m genailab.foundation.configs.model_configs
```

Output:
```
Model Configuration Comparison
================================================================================
Config                  Params (M)   Memory (GB)   Batch   Depth
--------------------------------------------------------------------------------
Small (M1 16GB)               50.2           8.1      32       6
Medium (RunPod 24GB)         201.3          18.4      32      12
Large (Cloud 40GB+)          603.9          35.2      64      24
```
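As a rough cross-check on these numbers (a back-of-envelope sketch, not the package's `memory_estimate_gb()` logic): fp32 weights cost 4 bytes per parameter, gradients another 4, and AdamW's two moment buffers 8 more, so training state alone is roughly 16 bytes per parameter; the remainder of each figure above is largely activation memory.

```python
def training_state_gb(params_millions: float, bytes_per_param: int = 16) -> float:
    """Rough floor: fp32 weights (4 B) + gradients (4 B) + AdamW moments (8 B)."""
    return params_millions * 1e6 * bytes_per_param / 1e9

# Small config: ~0.8 GB of weights/gradients/optimizer state;
# activations account for most of the rest of the 8.1 GB above.
print(f"{training_state_gb(50.2):.1f} GB")
```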
### 3. Test LoRA Implementation

```bash
python -m genailab.foundation.tuning.lora
```
## 📦 Package Structure Created

```
src/genailab/foundation/
├── __init__.py                 ✅ Created
├── README.md                   ✅ Created
├── configs/
│   ├── __init__.py             ✅ Created
│   ├── model_configs.py        ✅ Created (SMALL/MEDIUM/LARGE)
│   └── resource_profiles.py    ✅ Created (M1/RunPod/Cloud)
└── tuning/
    ├── __init__.py             ✅ Created
    └── lora.py                 ✅ Created (Full implementation)
```
Still to create:

- `tuning/adapters.py`
- `tuning/freeze.py`
- `conditioning/film.py`
- `conditioning/cross_attention.py`
- `conditioning/cfg.py`
- `backbones/dit.py`
- `recipes/latent_diffusion.py`
## 🔧 Usage Patterns

### Pattern 1: Auto-Configure for Your Hardware

```python
from genailab.foundation.configs import get_resource_profile, get_model_config

# Auto-detect
profile = get_resource_profile()
config = get_model_config(profile.recommended_model_size)

print(f"Using {config.embed_dim}d model with {config.depth} layers")
print(f"Batch size: {config.batch_size} (×{config.gradient_accumulation_steps} accum)")
print(f"Memory estimate: ~{config.memory_estimate_gb()}GB")
```
### Pattern 2: Apply LoRA to Any Model

```python
import torch
from genailab.foundation.tuning import apply_lora_to_model

# Your model
model = YourTransformer(embed_dim=256, depth=6)

# Apply LoRA (trains only ~1% of parameters!)
model = apply_lora_to_model(
    model,
    target_modules=["attention.query", "attention.key", "attention.value"],
    rank=8,
    alpha=16,
)

# Train only the LoRA parameters
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-4,
)
```
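A note on the `rank=8, alpha=16` choice: LoRA scales its low-rank update by `alpha / rank`, and setting alpha to about 2× rank is a common convention that keeps the effective update scale constant if you later sweep over different ranks.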
### Pattern 3: Resource-Aware Training Loop

```python
import torch
from genailab.foundation.configs import get_resource_profile, get_model_config

# Configure
profile = get_resource_profile()
config = get_model_config(profile.recommended_model_size)

# Setup
device = torch.device(profile.device)
model = model.to(device)

# Training with gradient accumulation
accumulation_steps = config.gradient_accumulation_steps

for step, batch in enumerate(dataloader):
    x = batch["data"].to(device)

    # Forward
    loss = model(x) / accumulation_steps

    # Backward
    loss.backward()

    # Update every N steps
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```
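One caveat with this pattern: if the number of batches is not a multiple of `accumulation_steps`, the final partial accumulation never triggers an update, so either size the dataloader accordingly or issue a final `optimizer.step()` after the loop.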
## 💾 Saving and Loading

### Save Only LoRA Weights (Tiny File!)

```python
from genailab.foundation.tuning import LoRA

# After training
LoRA.save_lora_only(model, "lora_weights.pt")
# File size: ~1MB (vs ~200MB for full model)

# Load later
base_model = YourTransformer()
base_model = apply_lora_to_model(base_model, rank=8)
LoRA.load_lora_only(base_model, "lora_weights.pt")
```
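When reloading, `apply_lora_to_model` must use the same `rank` (and `target_modules`) as at training time, or the saved low-rank matrices will not match the rebuilt model.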
### Merge for Inference
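Merging folds the low-rank update into the base weights, `W ← W + (alpha / rank) · BA`, so inference pays no extra matmul or memory cost. The exact API lives in `tuning/lora.py`; the sketch below assumes a class-level helper named `merge_lora` (a hypothetical name, check the `LoRA` class for the real one):

```python
import torch
from genailab.foundation.tuning import LoRA

# Hypothetical helper -- see tuning/lora.py for the actual merge API.
# Conceptually: W <- W + (alpha / rank) * B @ A for each wrapped layer.
LoRA.merge_lora(model)

# After merging, the model is an ordinary checkpoint again
torch.save(model.state_dict(), "merged_model.pt")
```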
## 🎓 Learning Path

### For M1 Mac Users (16GB)

Start here:

1. Run `python -m genailab.foundation.configs.resource_profiles`
2. Verify you get the `SMALL_CONFIG` recommendation
3. Open `notebooks/foundation_models/01_model_sizes_and_resources.ipynb`
4. Try `notebooks/foundation_models/02_lora_basics.ipynb`
Key settings for M1:

- Batch size: 8
- Gradient accumulation: 4 (effective batch = 32)
- Gradient checkpointing: ON
- Mixed precision: ON (fp16)
- Device: `mps`
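These settings map directly onto the `SMALL_CONFIG` fields listed in the Model Size Reference below, so you can sanity-check them in code:

```python
from genailab.foundation.configs import SMALL_CONFIG

assert SMALL_CONFIG.batch_size == 8
assert SMALL_CONFIG.gradient_accumulation_steps == 4  # effective batch = 32
assert SMALL_CONFIG.use_checkpoint                    # gradient checkpointing ON
```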
### For RunPod/Cloud Users (24GB+)

Start here:

1. Verify CUDA setup: `python -c "import torch; print(torch.cuda.is_available())"`
2. Run resource detection
3. Jump to `notebooks/foundation_models/03_adapters_vs_lora.ipynb`
4. Experiment with `MEDIUM_CONFIG` or `LARGE_CONFIG`
Key settings for GPU:
- Batch size: 32-64
- Gradient accumulation: 1
- Flash attention: ON
- Torch compile: ON
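Flash attention is switched on through the config's `use_flash_attention` flag (see `MEDIUM_CONFIG` below), while `torch.compile` is a standard PyTorch 2.x call applied to the model itself. A minimal sketch:

```python
import torch
from genailab.foundation.configs import MEDIUM_CONFIG

config = MEDIUM_CONFIG
print(config.use_flash_attention)  # True

model = model.to("cuda")
model = torch.compile(model)  # PyTorch 2.x; first iteration is slower while compiling
```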
## 🔬 Next Steps

### Immediate (This Session)

- Test the framework (the commands are listed in the Verification Checklist below)
- Create first notebook: `01_model_sizes_and_resources.ipynb`
  - Interactive hardware detection
  - Model size comparison
  - Memory estimation examples
- Implement remaining tuning modules:
  - `adapters.py`: bottleneck adapter implementation
  - `freeze.py`: layer freezing utilities
### Short-term (Next Sessions)

- Conditioning mechanisms:
  - `film.py`: FiLM layers for perturbation conditioning
  - `cross_attention.py`: multi-modal conditioning
  - `cfg.py`: classifier-free guidance
- Complete notebooks:
  - `02_lora_basics.ipynb`
  - `03_adapters_vs_lora.ipynb`
  - `07_end_to_end_gene_expression.ipynb`
- Recipes:
  - `latent_diffusion.py`: complete pipeline
  - `perturbation.py`: perturbation prediction
### Long-term

- Advanced patterns:
  - Mixture of Experts (MoE)
  - Progressive unfreezing
  - Multi-task learning
- Production deployment:
  - Model serving
  - Batch inference
  - API endpoints
## 📊 Model Size Reference

### Small Config (M1 Mac 16GB)

```python
from genailab.foundation.configs import SMALL_CONFIG

print(SMALL_CONFIG.embed_dim)                    # 256
print(SMALL_CONFIG.depth)                        # 6
print(SMALL_CONFIG.num_heads)                    # 8
print(SMALL_CONFIG.batch_size)                   # 8
print(SMALL_CONFIG.gradient_accumulation_steps)  # 4
print(SMALL_CONFIG.use_checkpoint)               # True
```
Use for: Development, prototyping, testing on M1 Mac
### Medium Config (RunPod 24GB)

```python
from genailab.foundation.configs import MEDIUM_CONFIG

print(MEDIUM_CONFIG.embed_dim)            # 512
print(MEDIUM_CONFIG.depth)                # 12
print(MEDIUM_CONFIG.num_heads)            # 8
print(MEDIUM_CONFIG.batch_size)           # 32
print(MEDIUM_CONFIG.use_flash_attention)  # True
```
Use for: Training, experimentation, RunPod instances
### Large Config (Cloud 40GB+)

```python
from genailab.foundation.configs import LARGE_CONFIG

print(LARGE_CONFIG.embed_dim)   # 768
print(LARGE_CONFIG.depth)       # 24
print(LARGE_CONFIG.num_heads)   # 12
print(LARGE_CONFIG.num_tokens)  # 128
print(LARGE_CONFIG.batch_size)  # 64
```
Use for: Production, large-scale training, cloud instances
## 🐛 Troubleshooting

### "Out of Memory" on M1 Mac

Solution 1: Reduce batch size

```python
from genailab.foundation.configs import SMALL_CONFIG

config = SMALL_CONFIG
config.batch_size = 4                   # Reduce from 8
config.gradient_accumulation_steps = 8  # Increase to maintain effective batch
```
Solution 2: Enable gradient checkpointing
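Assuming the model reads the `use_checkpoint` flag shown in the `SMALL_CONFIG` reference (where it already defaults to `True`):

```python
config.use_checkpoint = True  # recompute activations in backward instead of storing them
```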
Solution 3: Reduce model size
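As in Solution 1, the config fields can be overridden before the model is built; shrinking width and depth below `SMALL_CONFIG` reduces both weight and activation memory:

```python
config.embed_dim = 128  # down from 256
config.depth = 4        # down from 6
```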
### LoRA Not Reducing Parameters

Check: Verify LoRA was applied correctly

```python
from genailab.foundation.tuning import LoRA

LoRA.print_trainable_parameters(model)
# Should show ~1-2% trainable
```

Fix: Ensure target modules match your model

```python
# Print all module names
for name, _ in model.named_modules():
    print(name)

# Apply to correct modules
model = apply_lora_to_model(
    model,
    target_modules=["your.actual.module.names"],
    rank=8,
)
```
### MPS (M1) Performance Issues

Tip 1: Use mixed precision

```python
import torch

# Requires a recent PyTorch 2.x build with MPS autocast support
with torch.autocast(device_type="mps", dtype=torch.float16):
    output = model(input)
```
Tip 2: Avoid frequent CPU-GPU transfers

```python
# Bad: Transfer every step
for x in data:
    x = x.to('mps')

# Good: Transfer batch once
batch = batch.to('mps')
for x in batch:
    ...
```
## 📚 Related Documentation

- Foundation Models Overview
- Transformer Data Shapes
- Latent Diffusion Series
- Package README: `src/genailab/foundation/README.md`
- Notebook Tutorials: `notebooks/foundation_models/README.md`
## ✅ Verification Checklist

Before moving to notebooks:

- Resource detection works: `python -m genailab.foundation.configs.resource_profiles`
- Model configs print correctly: `python -m genailab.foundation.configs.model_configs`
- LoRA test runs: `python -m genailab.foundation.tuning.lora`
- Can import in Python: `from genailab.foundation import *`
- Memory estimates are reasonable for your hardware
## 🎯 Success Criteria
You'll know the framework is working when:
- Auto-detection works: Correctly identifies your hardware
- LoRA reduces parameters: From 100% to ~1-2% trainable
- Memory fits: Model + optimizer + activations < available memory
- Training runs: Can complete one epoch without OOM
- Saves/loads work: LoRA weights save and restore correctly
## Next Session Preview
In the next session, we'll create:
- First notebook: Interactive hardware detection and model sizing
- Adapter implementation: Alternative to LoRA
- Freeze utilities: Layer freezing strategies
- Comparison notebook: LoRA vs Adapters vs Full fine-tuning
This will give you a complete toolkit for parameter-efficient fine-tuning!