Configuration System Design¶
Purpose: Type-safe, environment-aware configuration management
Component: splice_engine/config/
Status: ✅ Implemented, documented
Overview¶
The configuration system provides a single source of truth for all project settings, combining:
- YAML files for human-readable defaults
- Environment variables for deployment flexibility
- Dataclasses for type safety
- Automatic path resolution for portability
Core Principles¶
1. Configuration, Not Code¶
Settings should be configurable, not hardcoded:
# BAD - Hardcoded
GENOME_BUILD = "GRCh38"
DATA_ROOT = "/Users/me/data"
# GOOD - Configured
from agentic_spliceai.splice_engine.config import config
build = config.build
data_root = config.data_root
2. Environment Hierarchy¶
Configuration cascades through three sources, and each later source overrides the ones before it:
1. Defaults (settings.yaml, lowest priority)
↓
2. Environment variables (override defaults)
↓
3. Runtime overrides (programmatic, highest priority)
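One way to picture the cascade is as a layered lookup in which the most specific layer is consulted first. The sketch below reads the cascade so that runtime overrides win over environment variables, which in turn win over the YAML defaults. It is an illustration only, not the project's loader; the three dictionaries and their values are made up for the example.

```python
from collections import ChainMap

# Illustrative values only; the real defaults live in settings.yaml.
defaults = {"build": "GRCh38", "release": "112", "data_root": "data"}
env_overrides = {"build": "GRCh37"}  # e.g. derived from SS_BUILD
runtime_overrides = {"data_root": "tests/fixtures/data"}  # set programmatically

# ChainMap searches its mappings left to right, so the first mapping
# listed has the highest priority and the last one acts as the fallback.
settings = ChainMap(runtime_overrides, env_overrides, defaults)

# Flatten to a plain dict once the layers have been combined.
resolved = dict(settings)
```

A layered lookup keeps each source intact, which makes it easy to report where a given value came from when debugging a deployment.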
3. Type Safety¶
Use Python type hints for safety and IDE support:
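As a minimal sketch of what a typed settings object could look like, the dataclass below carries the fields mentioned in this document. The class name `EngineConfig`, the default values, and the set of accepted builds are assumptions made for illustration; the actual class in splice_engine/config/ may differ.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union

# Assumed set of valid builds, taken from the builds named in this document.
SUPPORTED_BUILDS = ("GRCh37", "GRCh38")


@dataclass
class EngineConfig:  # hypothetical name
    species: str = "homo_sapiens"
    build: str = "GRCh38"
    release: str = "112"
    data_root: Union[str, Path] = Path("data")

    def __post_init__(self) -> None:
        # Accept plain strings (e.g. from environment variables) and
        # normalize them to Path so callers always get the same type.
        self.data_root = Path(self.data_root)
        # Fail fast: reject unknown builds at construction time.
        if self.build not in SUPPORTED_BUILDS:
            raise ValueError(
                f"Unsupported build {self.build!r}; expected one of {SUPPORTED_BUILDS}"
            )


cfg = EngineConfig(data_root="/tmp/genomes")
```

With typed fields, an IDE can autocomplete `cfg.data_root` and a type checker can flag code that treats it as a string.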
Implementation¶
See Resource Management for complete implementation details.
Quick Reference¶
Load configuration:
from agentic_spliceai.splice_engine.config import config
# Access settings
print(config.build) # "GRCh38"
print(config.data_root) # Path("/path/to/data")
Environment overrides:
Access in code:
Configuration File Structure¶
settings.yaml¶
Location: src/agentic_spliceai/splice_engine/config/settings.yaml
# Global defaults
species: homo_sapiens
default_build: GRCh38
default_release: "112"
data_root: data

# Derived datasets
derived_datasets:
  splice_sites: "splice_sites_enhanced.tsv"
  gene_features: "gene_features.tsv"

# Base models
base_models:
  spliceai:
    training_build: "GRCh37"
    annotation_source: "ensembl"
  openspliceai:
    training_build: "GRCh38"
    annotation_source: "mane"

# Build specifications
builds:
  GRCh38:
    annotation_source: ensembl
    gtf: "Homo_sapiens.GRCh38.{release}.gtf"
    fasta: "Homo_sapiens.GRCh38.dna.primary_assembly.fa"
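The `{release}` placeholder in the `gtf` entry suggests that filenames are completed with the release number at load time. The sketch below shows one way this could work, assuming `str.format` does the substitution and using a plain dict that mirrors part of the YAML above. The helper name `build_filenames` is hypothetical.

```python
from typing import Any, Dict, Optional

# A dict mirroring part of settings.yaml, as a YAML parser would return it.
settings: Dict[str, Any] = {
    "default_build": "GRCh38",
    "default_release": "112",
    "builds": {
        "GRCh38": {
            "annotation_source": "ensembl",
            "gtf": "Homo_sapiens.GRCh38.{release}.gtf",
            "fasta": "Homo_sapiens.GRCh38.dna.primary_assembly.fa",
        }
    },
}


def build_filenames(
    settings: Dict[str, Any],
    build: Optional[str] = None,
    release: Optional[str] = None,
) -> Dict[str, str]:
    """Return the GTF and FASTA filenames for a build (hypothetical helper)."""
    build = build or settings["default_build"]
    release = release or settings["default_release"]
    if build not in settings["builds"]:
        raise KeyError(f"No build specification for {build!r}")
    spec = settings["builds"][build]
    # Templates without a placeholder pass through format() unchanged.
    return {
        "gtf": spec["gtf"].format(release=release),
        "fasta": spec["fasta"].format(release=release),
    }


names = build_filenames(settings)
```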
Environment Variables¶
Naming Convention¶
Use prefix SS_ (SpliceAI System):
SS_BUILD # Genome build
SS_RELEASE # Annotation release
SS_DATA_ROOT # Data root directory
SS_SPECIES # Species (default: homo_sapiens)
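To illustrate how these variables might be turned into settings, the sketch below collects the four names listed above from an environment mapping. The function `read_env_overrides` and the field names on the right-hand side of the table are assumptions for the example, not the project's API.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable name -> setting name (setting names are assumed).
ENV_TO_FIELD = {
    "SS_BUILD": "build",
    "SS_RELEASE": "release",
    "SS_DATA_ROOT": "data_root",
    "SS_SPECIES": "species",
}


def read_env_overrides(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return only the settings that are explicitly set in the environment."""
    env = os.environ if environ is None else environ
    return {field: env[name] for name, field in ENV_TO_FIELD.items() if name in env}


# Passing a mapping instead of relying on os.environ keeps the example testable.
overrides = read_env_overrides({"SS_BUILD": "GRCh37", "HOME": "/home/user"})
```

Unset variables are simply absent from the result, so the defaults from settings.yaml remain in effect for them.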
Deployment Examples¶
Local development:
Remote server:
Docker:
CI/CD:
Multi-Environment Support¶
Local Development¶
# Use local data symlink
export SS_DATA_ROOT=./data
# Or point to meta-spliceai
export SS_DATA_ROOT=../meta-spliceai/data
Testing¶
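# Override config for tests
from pathlib import Path

# Assumes load_config is exported from the same module as config
from agentic_spliceai.splice_engine.config import load_config

def test_with_custom_config():
    config = load_config()
    config.data_root = Path("tests/fixtures/data")
    # Test with isolated data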
Production¶
Best Practices¶
DO ✅¶
Use environment variables for deployment:
Access via config object:
Use type hints:
DON'T ❌¶
Don't hardcode paths:
Don't scatter configuration:
Don't ignore type hints:
Testing¶
Unit Tests¶
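# Assumes load_config is exported from the same module as config
from agentic_spliceai.splice_engine.config import load_config

def test_config_loading():
    """Should load config with defaults."""
    cfg = load_config()
    assert cfg.species == "homo_sapiens"
    assert cfg.build in ["GRCh37", "GRCh38"]

def test_env_override(monkeypatch):
    """Should override with environment variables."""
    # pytest's monkeypatch fixture restores the environment after the test,
    # so SS_BUILD does not leak into other tests
    monkeypatch.setenv("SS_BUILD", "GRCh37")
    cfg = load_config()
    assert cfg.build == "GRCh37"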
Integration Points¶
Resource Management¶
The configuration system provides the paths that resource management builds on:
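A small sketch of how a portable data root could be derived: relative values such as `data` are anchored to a project root, while absolute values are used unchanged. The helper `resolve_data_root` and the example directories are hypothetical and only illustrate the idea of automatic path resolution.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def resolve_data_root(data_root: PathLike, project_root: PathLike) -> Path:
    """Anchor a relative data root to the project root (hypothetical helper)."""
    path = Path(data_root).expanduser()
    if not path.is_absolute():
        # Relative settings stay portable: they follow the checkout location.
        path = Path(project_root) / path
    return path


# Example directories are placeholders, not real project locations.
local = resolve_data_root("data", "/srv/project")
shared = resolve_data_root("/mnt/genomes", "/srv/project")
```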
Base Layer¶
Base models use config for build-specific settings:
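For instance, a base model may need to know which genome build it was trained on before loading annotations. The sketch below looks this up from a dict mirroring the `base_models` section of settings.yaml. The function `training_build` is a hypothetical helper, not part of the documented API.

```python
from typing import Dict

# Mirrors the base_models section of settings.yaml.
base_models: Dict[str, Dict[str, str]] = {
    "spliceai": {"training_build": "GRCh37", "annotation_source": "ensembl"},
    "openspliceai": {"training_build": "GRCh38", "annotation_source": "mane"},
}


def training_build(model_name: str) -> str:
    """Return the genome build a base model was trained on."""
    if model_name not in base_models:
        known = ", ".join(sorted(base_models))
        raise KeyError(f"Unknown base model {model_name!r}; known models: {known}")
    return base_models[model_name]["training_build"]


build = training_build("spliceai")
```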
Meta Layer¶
Meta layer uses config for training parameters:
Summary¶
Key Benefits¶
✅ Single Source: All settings in one place
✅ Type Safe: Dataclass with type hints
✅ Environment Aware: Overrides for deployment
✅ Portable: Same code, different environments
✅ Validated: Checks settings on load
Design Principles Applied¶
- ✅ Configuration Over Code
- ✅ Explicit Over Implicit
- ✅ Type Safety
- ✅ Fail Fast (validation on load)
Related: Resource Management, Output Management
Implementation: src/agentic_spliceai/splice_engine/config/
Last Updated: February 15, 2026