Package Organization Guide
Date: 2026-01-30
Audience: Developers working on experimental/research features
Current Project Structure
agentic-spliceai/
├── src/                          # Core production code (pip installable)
│   └── agentic_spliceai/
│       ├── __init__.py
│       └── splice_engine/
│           ├── base_layer/       # SpliceAI/OpenSpliceAI
│           ├── meta_layer/       # Multimodal DL
│           ├── prediction/       # Core prediction logic
│           └── resources/        # Resource management
│
├── examples/                     # Driver scripts (fast iteration)
│   ├── base_layer/
│   ├── data_preparation/
│   └── ...
│
├── notebooks/                    # Educational Jupyter notebooks
│
├── scripts/                      # Utilities, validation, tools
│
├── tests/                        # Unit & integration tests
│
├── dev/                          # Private development docs
│   ├── sessions/
│   ├── planning/
│   └── research/
│
├── docs/                         # Public documentation
│
└── (experimental packages)?      # ← WHERE TO PUT EXPERIMENTAL CODE?
Where to Put Experimental Features?
Decision Framework
For experimental features such as the Evo2 foundation model, weigh these factors:
| Factor | Put in src/ | Put in parallel package | Put in examples/ |
|---|---|---|---|
| Stability | Stable, tested | Experimental, evolving | Demo/prototype |
| Integration | Core system | Optional add-on | Quick test |
| Dependencies | Standard deps | Heavy deps (GPU models) | Minimal deps |
| Audience | All users | Research users | Developers |
| Installation | Always installed | Optional install | Not installed |
| Maintenance | Long-term | Research cycle | Temporary |
Recommendation for Evo2
Option 1: Parallel Package (RECOMMENDED)
Structure:
agentic-spliceai/
├── src/                          # Core production code
│   └── agentic_spliceai/
│
├── foundation_models/            # ← Parallel experimental package
│   ├── __init__.py
│   ├── README.md
│   ├── pyproject.toml            # Separate dependencies
│   ├── environment-evo2.yml      # GPU-specific environment
│   └── evo2/
│       ├── __init__.py
│       ├── model.py              # Evo2 integration
│       ├── inference.py          # Inference logic
│       └── adapters/
│           └── spliceai_adapter.py  # Bridge to splice_engine
│
├── examples/
│   └── foundation_models/        # Evo2 examples
│       └── 01_evo2_prediction.py
│
└── notebooks/
    └── foundation_models/        # Evo2 tutorials
        └── 01_evo2_basics.ipynb
Benefits:
- Optional installation: Users can install core without heavy deps
- Separate dependencies: Evo2 needs specific GPU libs, kept in a separate environment-evo2.yml
- Clear boundaries: Experimental code doesn't pollute production
- Easy testing: Can run core tests without Evo2 dependencies
- Flexible deployment: Deploy to pod without installing on local machine
- Independent evolution: Update Evo2 without touching core
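The "easy testing" benefit relies on Evo2 tests being skipped cleanly when the optional package is absent. A minimal sketch using only the standard library, assuming the experimental package is importable as `evo2` as in this guide; the helper `has_package` and the test class are illustrative names, not part of the project:

```python
import importlib.util
import unittest


def has_package(name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# Hypothetical test class: skipped entirely when `evo2` is not installed,
# so the core test suite still passes on machines without GPU dependencies.
@unittest.skipUnless(has_package("evo2"), "foundation_models package not installed")
class TestEvo2Integration(unittest.TestCase):
    def test_package_imports(self):
        import evo2  # noqa: F401  (only runs when the package is present)
```

The same check can gate optional code paths at runtime without paying the cost of importing heavy GPU libraries.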
Installation:
# Core only (local development)
pip install -e .
# Core + Evo2 (pod with A40)
pip install -e .
pip install -e ./foundation_models
# Or with conda
mamba env create -f foundation_models/environment-evo2.yml
Imports (from examples):
import sys
from pathlib import Path

# Add both packages to path (script assumed to live in examples/foundation_models/)
project_root = Path(__file__).resolve().parents[2]
sys.path.insert(0, str(project_root / 'src'))
sys.path.insert(0, str(project_root / 'foundation_models'))
# Import core
from agentic_spliceai.splice_engine.base_layer import ...
# Import experimental
from evo2.model import Evo2SplicePredictor
from evo2.adapters.spliceai_adapter import adapt_to_spliceai_format
Option 2: Subdirectory in src/ (If stable)
Structure:
src/
└── agentic_spliceai/
    ├── splice_engine/
    ├── experimental/             # ← Experimental features
    │   ├── __init__.py
    │   ├── foundation_models/
    │   │   ├── __init__.py
    │   │   └── evo2/
    │   │       └── ...
    │   └── README.md
    └── ...
Use when:
- Features are relatively stable
- Dependencies are manageable (not too heavy)
- You want unified installation
- Features will eventually become core
Drawbacks:
- All users install experimental dependencies
- Harder to separate GPU-specific code
- More coupling with core system
Option 3: Separate Git Submodule (If very independent)
Structure:
agentic-spliceai/
├── src/
├── submodules/
│   └── agentic-evo2/             # ← Separate git repo as submodule
│       ├── .git
│       ├── README.md
│       ├── pyproject.toml
│       └── evo2/
└── ...
Use when:
- Feature is very independent (could be its own project)
- Multiple projects might use it
- Different development team/cycle
- Want separate version control
Specific Recommendation for Evo2
Recommended Structure
Use Option 1 (Parallel Package) because:
1. Heavy GPU dependencies (need A40, separate environment)
2. Pod-specific testing (won't run on local Mac)
3. Research-oriented (may change rapidly)
4. Optional feature (not all users need it)
Implementation:
agentic-spliceai/
├── src/agentic_spliceai/         # Core (always installed)
│
├── foundation_models/            # Experimental (optional)
│   ├── __init__.py
│   ├── README.md                 # Installation & usage
│   ├── pyproject.toml
│   │     [project]
│   │     name = "agentic-spliceai-foundation-models"
│   │     dependencies = [
│   │         "agentic-spliceai",     # Depends on core
│   │         "evo-model>=2.0",       # Evo2 specific
│   │         "transformers>=4.30",
│   │         "torch>=2.0",           # No "+cu118" here: local version labels are invalid with ">="; the CUDA 11.8 build is selected in environment-evo2.yml
│   │     ]
│   │
│   ├── environment-evo2.yml      # Pod environment
│   │     name: agentic-spliceai-evo2
│   │     channels: [nvidia, pytorch, conda-forge]
│   │     dependencies:
│   │       - python=3.11
│   │       - pytorch::pytorch>=2.0
│   │       - pytorch::pytorch-cuda=11.8
│   │       - evo-model
│   │
│   ├── evo2/
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── model.py              # Evo2ModelWrapper
│   │   ├── inference.py          # Inference pipeline
│   │   └── adapters/
│   │       └── spliceai_adapter.py
│   │
│   └── tests/
│       └── test_evo2_integration.py
│
├── examples/foundation_models/
│   ├── README.md
│   ├── 01_evo2_setup.md          # Setup on pod
│   └── 02_evo2_prediction.py     # Usage example
│
└── notebooks/foundation_models/
    └── 01_evo2_splice_prediction.ipynb
Integration Pattern
Core ↔ Experimental Bridge
In src/agentic_spliceai/splice_engine/base_layer/models/:
# registry.py - Model registry with dynamic loading
import importlib

# Model name -> dotted path of its loader function
_MODEL_REGISTRY = {
    'spliceai': 'agentic_spliceai.splice_engine.base_layer.prediction.core.load_spliceai',
    'openspliceai': 'agentic_spliceai.splice_engine.base_layer.prediction.core.load_openspliceai',
}


def register_foundation_model(name: str, loader_path: str):
    """Register experimental foundation model (optional)."""
    _MODEL_REGISTRY[name] = loader_path


def load_model(model_name: str, **kwargs):
    """Load model by name (core or experimental)."""
    if model_name not in _MODEL_REGISTRY:
        raise ValueError(f"Unknown model: {model_name}")
    # Import the loader's module only when the model is actually requested
    module_path, func_name = _MODEL_REGISTRY[model_name].rsplit('.', 1)
    module = importlib.import_module(module_path)
    loader = getattr(module, func_name)
    return loader(**kwargs)
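The registry stores dotted paths rather than callables, so an experimental module is imported only when its model is requested. The pattern can be tried in isolation with the self-contained sketch below, where the standard-library function `json.loads` stands in for a real model loader (an assumption made purely for demonstration; `register` and `resolve` are illustrative names):

```python
import importlib
from typing import Any, Callable, Dict

# Maps a model name to "package.module.function"; nothing is imported yet.
_REGISTRY: Dict[str, str] = {}


def register(name: str, loader_path: str) -> None:
    """Record where a loader lives without importing its module."""
    _REGISTRY[name] = loader_path


def resolve(name: str) -> Callable[..., Any]:
    """Import the loader's module on demand and return the callable."""
    if name not in _REGISTRY:
        known = ", ".join(sorted(_REGISTRY)) or "none"
        raise ValueError(f"Unknown model: {name} (registered: {known})")
    module_path, func_name = _REGISTRY[name].rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, func_name)


# Stand-in loader: json.loads plays the role of a model loader here.
register("demo", "json.loads")
loader = resolve("demo")
print(loader('{"model": "demo"}'))  # prints {'model': 'demo'}
```

Because only strings are stored, registering a model whose package is missing does not fail until that model is actually loaded.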
In foundation_models/evo2/__init__.py:
# Auto-register when the evo2 package is imported
from agentic_spliceai.splice_engine.base_layer.models.registry import register_foundation_model

register_foundation_model(
    'evo2',
    'evo2.model.load_evo2_splice_model'
)
Usage:
# Works with or without foundation_models installed
# Core models (always available)
runner = BaseModelRunner()
result = runner.run_single_model(model_name='openspliceai', ...)

# Experimental models (if foundation_models installed)
try:
    import evo2  # Triggers registration
    result = runner.run_single_model(model_name='evo2', ...)
except ImportError:
    print("Evo2 not available - install foundation_models package")
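When a script should keep working on machines without the experimental package, the try/except pattern can be folded into a small helper that falls back to a core model. A sketch only; `pick_model` is a hypothetical name, not part of the project:

```python
import importlib


def pick_model(preferred: str, package: str, fallback: str) -> str:
    """Return `preferred` if its optional package imports, else `fallback`."""
    try:
        # Importing the package is what triggers its self-registration.
        importlib.import_module(package)
    except ImportError:
        return fallback
    return preferred


model_name = pick_model("evo2", package="evo2", fallback="openspliceai")
print(f"Using model: {model_name}")
```

The returned name can then be passed as `model_name` to the runner, so the calling code does not need its own import guard.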
Installation Workflows
Local Development (Core Only)
# On a local machine (no GPU required)
cd agentic-spliceai
pip install -e . # Core
Pod Development (Core + Evo2)
# On pod with A40
cd agentic-spliceai
# Create Evo2 environment
mamba env create -f foundation_models/environment-evo2.yml
conda activate agentic-spliceai-evo2
# Install both packages
pip install -e . # Core
pip install -e ./foundation_models # Evo2
# Test
python examples/foundation_models/02_evo2_prediction.py
CI/CD Testing
# .github/workflows/test.yml
jobs:
  test-core:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install -e .
      - run: pytest tests/

  test-evo2:
    runs-on: [self-hosted, gpu-a40]  # Pod runner
    steps:
      - uses: actions/checkout@v3
      - run: conda env create -f foundation_models/environment-evo2.yml
      # Each step starts a fresh shell, so run inside the new env explicitly
      - run: conda run -n agentic-spliceai-evo2 pip install -e . -e ./foundation_models
      - run: conda run -n agentic-spliceai-evo2 pytest foundation_models/tests/
Summary & Decision
For Evo2 Foundation Model: Use Parallel Package
Rationale:
1. Deployment flexibility: Run core locally, Evo2 on pod
2. Dependency isolation: Heavy GPU libs separate
3. Development speed: Rapid iteration without core changes
4. Optional feature: Users can choose to install
5. Clear boundaries: Experimental vs production
Files to Create:
# Create parallel package
mkdir -p foundation_models/evo2/adapters
touch foundation_models/{__init__.py,README.md,pyproject.toml}
touch foundation_models/evo2/{__init__.py,model.py,inference.py}
touch foundation_models/evo2/adapters/spliceai_adapter.py
touch foundation_models/environment-evo2.yml
Import Pattern (in examples):
# examples/foundation_models/02_evo2_prediction.py
import sys
from pathlib import Path

# Add both packages (this script lives two levels below the project root)
project_root = Path(__file__).resolve().parents[2]
sys.path.insert(0, str(project_root / 'src'))
sys.path.insert(0, str(project_root / 'foundation_models'))

# Import core
from agentic_spliceai.splice_engine.base_layer import ...
# Import experimental
from evo2.model import Evo2SplicePredictor
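Counting parent directories by hand is easy to get wrong when scripts move. An alternative sketch walks upward until it finds a marker file. It assumes the repository root contains a `pyproject.toml` (implied by `pip install -e .` but not shown in this guide), and `find_project_root` is a hypothetical helper:

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None  # No marker found anywhere up to the filesystem root


# In an example script this would typically be called as:
#   project_root = find_project_root(Path(__file__))
```

Note that the search stops at the first match, so a script placed inside foundation_models/ would resolve to that sub-project (which has its own pyproject.toml) rather than the repository root.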
Implementation Status (March 2026)
Option 1 (Parallel Package) was adopted. The foundation_models/ sub-project is fully operational:
foundation_models/
├── pyproject.toml                # Separate dependencies
├── README.md                     # Setup + hardware requirements
├── foundation_models/
│   ├── evo2/                     # Evo2-based exon classifier
│   │   ├── config.py             # Evo2Config (device auto-detect)
│   │   ├── model.py              # HuggingFace wrapper
│   │   ├── embedder.py           # Chunked extraction + HDF5 cache
│   │   └── classifier.py         # ExonClassifier (linear/MLP/CNN/LSTM)
│   ├── classifiers/              # Splice classifiers
│   │   └── splice_classifier.py  # Direct shard predictor
│   └── utils/                    # Quantization, chunking
├── configs/
│   ├── gpu_config.yaml           # Infrastructure defaults (GPU, volume, deps)
│   └── skypilot/                 # SkyPilot cloud deployment (RunPod)
└── docs/                         # Sub-project documentation
Key achievements:
- 4 classifier architectures (linear, MLP, CNN, LSTM)
- Device-aware quantization routing (INT8 for MPS/CPU, bitsandbytes for CUDA)
- SkyPilot + RunPod cloud workflows (A40/A100 GPU)
- GPU task runner with generic SkyPilot config builder + launcher
- Ops scripts for cluster provisioning, data staging, and pipeline execution
- Direct shard splice predictor for foundation model fine-tuning
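The device-aware quantization routing listed above can be pictured as a small dispatch on the device string. This is only an illustrative sketch of the stated rule (INT8 for MPS/CPU, bitsandbytes for CUDA); the function name and return values are hypothetical and do not reflect the actual code under foundation_models/utils/:

```python
def choose_quantization_backend(device: str) -> str:
    """Map a device string to a quantization route, following the rule above."""
    kind = device.split(":", 1)[0].lower()  # "cuda:0" -> "cuda"
    if kind == "cuda":
        return "bitsandbytes"
    if kind in ("mps", "cpu"):
        return "int8"
    raise ValueError(f"Unsupported device: {device!r}")


for dev in ("cuda:0", "mps", "cpu"):
    print(dev, "->", choose_quantization_backend(dev))
```

Keeping the decision in one function means the rest of the pipeline can stay device-agnostic.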
See: foundation_models/README.md for current setup and hardware requirements
This guide follows the pattern established by genai-lab and other research-oriented projects in your workspace.