# Remote Training Options: Modal, SkyPilot, and RunPods

**Status:** Reference
**Audience:** Developers needing GPU compute beyond their local machine
**Last Updated:** March 2026
## Problem
Training genomic foundation models (Evo2 7B/40B) and their downstream classifiers often requires dedicated GPU compute beyond what's available on a typical development machine. The manual RunPods workflow works but involves SSH configuration, pod environment setup, GitHub SSH keys, and rsync-based data/code transfer for every new pod instance.
**Goal:** Minimize the friction between "code works locally on small data" and "train on a real GPU" while keeping costs predictable.
## Development Principle: Local-First
Always develop and validate the full training workflow locally before offloading to remote GPU:
- Build locally (small/synthetic data, MPS/CPU) until the pipeline runs end-to-end
- Test locally with a real data subset to confirm correctness
- Only then dispatch to remote GPU for production-scale training
Remote compute should be used only when the training pipeline is proven to work. This minimizes wasted GPU hours spent debugging serialization issues, data loading bugs, or training loop errors over SSH.
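As a concrete illustration of this pattern, a small device-selection helper (a sketch, not part of this repo) lets the same training script fall back from CUDA to Apple MPS to CPU, so the code validated locally is the code dispatched remotely:

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple MPS, then CPU for local validation runs."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

# The same training entrypoint then runs unchanged on a laptop or a GPU pod.
device = pick_device()
```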
## Option 1: Modal (Recommended for Programmatic Remote Training)
Website: modal.com
### How It Works
Modal lets you decorate a Python function with a GPU requirement and call it from your local machine. The function runs in a container on Modal's infrastructure with the specified GPU. No SSH, no pod setup, no environment scripts.
```python
import modal

app = modal.App("evo2-training")
image = modal.Image.debian_slim(python_version="3.12").pip_install(
    "torch", "transformers", "einops", "h5py", "peft", "scikit-learn"
)

@app.function(gpu="A100", image=image, timeout=3600)
def train_classifier(config: dict, embeddings_path: str) -> dict:
    """Runs on an A100 in Modal's cloud."""
    import torch
    # ... training code identical to the local version, producing `model` ...
    return model.state_dict()

# Called locally (e.g. via `modal run`); the weights are returned automatically
@app.local_entrypoint()
def main():
    import torch
    weights = train_classifier.remote(config, "/data/embeddings.h5")
    torch.save(weights, "output/classifier.pt")
```
### Setup (One-Time)
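A sketch of the one-time steps, matching the "pip install + browser auth" note under Strengths below (verify the exact commands against modal.com/docs):

```bash
pip install modal
modal setup   # one-time browser-based authentication
```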
### Data Transfer
- Small data (< 100 MB): Pass as function arguments (serialized automatically)
- Large data (genomic sequences, HDF5 embeddings): Use Modal Volumes — persistent cloud storage mounted into containers
```python
volume = modal.Volume.from_name("genomic-data", create_if_missing=True)

@app.function(gpu="A100", volumes={"/data": volume})
def train_with_volume(config):
    # /data/ contains your uploaded HDF5 files, parquet, etc.
    embeddings = load_embeddings("/data/embeddings.h5")
    ...
```
Upload the data once from your local machine (a sketch using Modal's CLI; the file name is illustrative):
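```bash
# Copies a local file into the "genomic-data" volume defined above
modal volume put genomic-data data/embeddings.h5 /embeddings.h5
```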
### Pricing (as of March 2026)
Billed per-second of compute. No idle costs.
| GPU | Approx. $/hr | VRAM | Use Case |
|---|---|---|---|
| T4 | ~$0.60 | 16 GB | Quick tests, small classifiers |
| A10G | ~$1.10 | 24 GB | Medium models, hyperparameter search |
| A100 40GB | ~$2.10 | 40 GB | Evo2 7B embeddings, classifier training |
| A100 80GB | ~$2.50 | 80 GB | Evo2 40B (INT8), large-batch training |
| H100 | ~$3.95 | 80 GB | Fastest inference/training |
Pricing changes frequently. See modal.com/pricing for current rates.
### Strengths
- Near-zero setup friction (pip install + browser auth)
- Same Python code locally and remotely (only the decorator/dispatch differs)
- Per-second billing, scales to zero (no idle pod costs)
- Container images are cached — cold start ~30-60s first time, warm after
- Volumes for persistent data (model weights, embeddings, training artifacts)
- Native Python SDK (no YAML, no shell scripts)
- Supports custom conda environments via `Image.from_conda`
### Limitations
- Cold-start latency (~30-60s if container hasn't been used recently)
- Slightly higher $/GPU-hour vs raw RunPod on-demand pods
- Data must be uploaded to Modal Volumes (can't mount local filesystem)
- Debugging remote failures requires reading Modal logs (no SSH into container)
- Modal is a managed service — less control than a raw VM/pod
## Option 2: SkyPilot (Automated Multi-Cloud Orchestration)

Website: skypilot.co | GitHub: github.com/skypilot-org/skypilot
### How It Works
SkyPilot is an open-source orchestrator that provisions GPU VMs/pods across cloud providers (RunPod, Lambda, AWS, GCP, Azure, and more). It automates the manual steps: finding cheapest GPU, provisioning, environment setup, running your script, and teardown.
```yaml
# train.yaml
resources:
  accelerators: A100:1
  cloud: runpod   # or: lambda, aws, gcp; omit the field to let SkyPilot choose

setup: |
  pip install -e .
  pip install -e ./foundation_models

run: |
  python foundation_models/examples/03_train_classifier.py \
    --config configs/evo2_7b_classifier.yaml
```

```bash
sky launch -c train train.yaml              # Provisions, sets up, runs
sky down train                              # Tears down the "train" cluster when done
sky launch -c train train.yaml --use-spot   # Spot/preemptible for ~3x savings
```
### Setup (One-Time Per Provider)
pip install "skypilot[runpod]" # or: skypilot[aws], skypilot[all]
sky check # Verifies cloud credentials
For RunPod: set RUNPOD_API_KEY environment variable.
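```bash
export RUNPOD_API_KEY="..."   # key from your RunPod account (value elided)
```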
### Data Transfer
SkyPilot uses rsync under the hood; your local `data/` directory can be synced to the VM automatically via `file_mounts` in the YAML (paths in the sketch below are illustrative):
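```yaml
# In train.yaml: remote path on the VM (left) maps to a local path (right)
file_mounts:
  /data: ./data
```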
Or use cloud storage (S3, GCS) for larger datasets.
### Pricing
SkyPilot itself is free (open-source). You pay the underlying cloud provider's rates:
| Provider | A40 48GB ($/hr) | A100 80GB ($/hr) | Spot Discount |
|---|---|---|---|
| RunPod | ~$0.69 | ~$1.64 | N/A (on-demand only) |
| Lambda | ~$0.80 | ~$1.29 | N/A |
| AWS (p4d) | — | ~$3.10 | ~60-70% off |
| GCP (a2) | — | ~$3.67 | ~60-70% off |
SkyPilot can auto-select the cheapest available provider when the `cloud` field is omitted.
### Strengths
- Uses your existing RunPod account (and any other cloud accounts)
- Automatic price-shopping across providers
- Spot/preemptible instance support for significant cost savings
- SSH access to the VM (full debugging capability)
- File mounts sync local directories automatically
- Open-source, no vendor lock-in
- YAML-based — good for reproducible experiment configs
- Official RunPod integration
### Limitations
- Still VM/pod-based (boot + setup time: 2-5 minutes per launch)
- Requires cloud credentials configured per provider
- More infrastructure to manage than Modal (YAML configs, credential setup)
- No "call a Python function remotely" pattern (batch job model instead)
- Each launch is a full VM lifecycle (provision → setup → run → teardown)
- Less elegant for rapid iteration vs Modal's function-call pattern
## Option 3: RunPods (Current Approach — Manual Pods)
Website: runpod.io
### How It Works
Manually provision a GPU pod via the RunPod web UI or CLI, SSH in, configure the environment, rsync code and data, run training, rsync results back.
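A sketch of one such session (the `runpod-pod` SSH alias, paths, and conda environment name are placeholders; the project's actual scripts are linked below):

```bash
# After provisioning a pod in the RunPod UI and configuring SSH:
rsync -avz --exclude '.git' ./ runpod-pod:~/project/   # push code + data
ssh runpod-pod 'cd ~/project && conda activate evo2 && \
  python foundation_models/examples/03_train_classifier.py'
rsync -avz runpod-pod:~/project/output/ ./output/      # pull results back
```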
### Setup References
- First-time setup: `docs/getting_started/RUNPODS_SETUP.md`
- SSH manager scripts: `runpods.example/scripts/`
- Pod environment setup: `runpods.example/docs/POD_ENVIRONMENT_SETUP.md`
- Data transfer: `runpods.example/docs/RSYNC_QUICK_REFERENCE.md`
- GitHub SSH on pod: `runpods.example/docs/GITHUB_SSH_SETUP.md`
### Pricing
Lowest raw $/GPU-hour of the three options:
| GPU | On-Demand ($/hr) | Community Cloud ($/hr) |
|---|---|---|
| A40 48GB | ~$0.69 | ~$0.39 |
| A100 80GB | ~$1.64 | ~$1.04 |
| H100 80GB | ~$3.29 | ~$2.49 |
See runpod.io/pricing for current rates.
### Strengths
- Cheapest per-GPU-hour
- Full SSH access and control
- Persistent pods (keep running between sessions)
- Community cloud pricing for additional savings
- Already set up in this project (scripts, docs, environment files)
### Limitations
- Manual SSH config per pod instance
- GitHub SSH keys must be configured on each new pod
- Environment setup required per pod (conda create, pip install)
- Rsync for code/data transfer each session
- Idle pods cost money (must remember to stop/delete)
- No programmatic dispatch from local code
## Comparison Summary
| Feature | Modal | SkyPilot | RunPods (Manual) |
|---|---|---|---|
| Setup friction | Minimal (pip + auth) | Medium (credentials) | High (SSH, env, keys) |
| Code pattern | Python function call | YAML batch job | SSH + rsync |
| Same code local/remote | Yes (decorator only) | Mostly (YAML wraps script) | Separate workflow |
| Data transfer | Volumes (cloud) | rsync / cloud storage | rsync |
| Debugging | Logs only | SSH into VM | Full SSH |
| Idle cost | None (per-second) | None (teardown) | Yes (must stop pod) |
| Raw GPU cost | Higher (~$2-4/hr) | Varies (cheapest cloud) | Lowest (~$0.7-1.6/hr) |
| Spot/preemptible | No | Yes | No |
| Vendor lock-in | Modal only | Multi-cloud | RunPod only |
## Recommendation for This Project
### Near-Term (Current)
Continue using RunPods manual workflow for the initial Evo2 exon classifier experiments. The infrastructure is already set up, and the first priority is proving the training pipeline works end-to-end locally.
### Medium-Term (When Training Becomes Frequent)
Adopt Modal for the programmatic dispatch pattern. This maps directly to the small/medium/large tier strategy:
| Tier | Hardware | Method | When |
|---|---|---|---|
| Small | Local (MPS/CPU/CUDA) | Direct local call | Developing training logic |
| Medium | Modal T4/A10G | `train.remote(...)` | Validated pipeline, real data subset |
| Large | Modal A100/H100 | `train.remote(...)` | Production training, full genome |
### Integration with AdaptiveTrainer

The planned `AdaptiveTrainer` class (see `foundation_models/docs/training/deepspeed_training.md`) could be extended to support a `backend` parameter:
```python
trainer = AdaptiveTrainer(
    model=classifier,
    backend="auto",  # "local" | "modal" | "skypilot" | "auto"
)
# "auto" selects: local if small data + MPS/CUDA available,
# modal if large data or an explicit GPU request
trainer.fit(train_data, val_data)
```
This keeps the training code identical regardless of where it runs.
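A hypothetical sketch of the `"auto"` selection rule (the threshold, function name, and backend strings are illustrative; `AdaptiveTrainer` does not exist yet):

```python
import torch

def select_backend(requested: str, n_train_examples: int,
                   small_data_threshold: int = 50_000) -> str:
    """Illustrative dispatch rule for a future AdaptiveTrainer(backend="auto")."""
    if requested != "auto":
        return requested  # honor an explicit "local" | "modal" | "skypilot"
    has_local_accel = torch.cuda.is_available() or torch.backends.mps.is_available()
    if has_local_accel and n_train_examples < small_data_threshold:
        return "local"
    return "modal"  # large data or no local accelerator: dispatch remotely
```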
### Long-Term (If Multi-Cloud Cost Optimization Matters)
Add SkyPilot as an option for spot-instance training jobs that can tolerate preemption and longer setup times, especially for large-scale sweeps across the cheapest available provider.
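For instance, a spot-tolerant task could set `use_spot` in the resources stanza (a sketch; it assumes the training script checkpoints to durable storage, and the `--resume-from-checkpoint` flag is illustrative):

```yaml
# sweep.yaml
resources:
  accelerators: A100:1
  use_spot: true   # preemptible instance; job must tolerate restarts
run: |
  python foundation_models/examples/03_train_classifier.py \
    --config configs/evo2_7b_classifier.yaml \
    --resume-from-checkpoint /checkpoints/latest
```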