Skip to content

Compute Resources

Guides for managing compute resources across local development and remote GPU training.


Documents

Document Description
remote_training_options.md Modal vs SkyPilot vs RunPods — programmatic remote training comparison
Location Description
getting_started/RUNPODS_SETUP.md RunPods first-time setup guide
runpods.example/docs/ RunPod-specific reference (SSH, rsync, pod env setup)
foundation_models/docs/training/deepspeed_training.md DeepSpeed configs for A40/A100 training

Development Principle

Local-first, remote-when-proven. Always develop and validate the full training workflow locally on small/synthetic data before offloading to remote GPU. Remote compute should only be used when the pipeline is known to work end-to-end. This minimizes wasted GPU cost and debugging-over-SSH friction.