Package Organization Guide

Date: 2026-01-30
Audience: Developers working on experimental/research features

📂 Current Project Structure

agentic-spliceai/
├── src/                          # Core production code (pip installable)
│   └── agentic_spliceai/
│       ├── __init__.py
│       └── splice_engine/
│           ├── base_layer/       # SpliceAI/OpenSpliceAI
│           ├── meta_layer/       # Multimodal DL
│           ├── prediction/       # Core prediction logic
│           └── resources/        # Resource management
│
├── examples/                     # Driver scripts (fast iteration)
│   ├── base_layer/
│   ├── data_preparation/
│   └── ...
│
├── notebooks/                    # Educational Jupyter notebooks
│
├── scripts/                      # Utilities, validation, tools
│
├── tests/                        # Unit & integration tests
│
├── dev/                          # Private development docs
│   ├── sessions/
│   ├── planning/
│   └── research/
│
├── docs/                         # Public documentation
│
└── (experimental packages)?      # ← WHERE TO PUT EXPERIMENTAL CODE?

🧪 Where to Put Experimental Features?

Decision Framework

For experimental features such as the Evo2 foundation model, weigh these factors:

| Factor | Put in src/ | Put in parallel package | Put in examples/ |
|--------------|------------------|-------------------------|------------------|
| Stability | Stable, tested | Experimental, evolving | Demo/prototype |
| Integration | Core system | Optional add-on | Quick test |
| Dependencies | Standard deps | Heavy deps (GPU models) | Minimal deps |
| Audience | All users | Research users | Developers |
| Installation | Always installed | Optional install | Not installed |
| Maintenance | Long-term | Research cycle | Temporary |
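
One way to apply the table is to answer each factor and see which column collects the most matches. The sketch below does that; the simple vote counting and the names `FRAMEWORK`, `LOCATIONS`, and `tally` are illustrative assumptions, not part of the project.

```python
from collections import Counter

# Column order of the decision table (illustrative names).
LOCATIONS = ("src/", "parallel package", "examples/")

# One row per factor, transcribed from the decision table.
FRAMEWORK = {
    "Stability": ("Stable, tested", "Experimental, evolving", "Demo/prototype"),
    "Integration": ("Core system", "Optional add-on", "Quick test"),
    "Dependencies": ("Standard deps", "Heavy deps (GPU models)", "Minimal deps"),
    "Audience": ("All users", "Research users", "Developers"),
    "Installation": ("Always installed", "Optional install", "Not installed"),
    "Maintenance": ("Long-term", "Research cycle", "Temporary"),
}


def tally(answers: dict) -> Counter:
    """Count how many answered factors point at each location."""
    votes = Counter()
    for factor, answer in answers.items():
        # .index raises ValueError if the answer is not one of the row's cells.
        column = FRAMEWORK[factor].index(answer)
        votes[LOCATIONS[column]] += 1
    return votes


# Answers for Evo2, based on the reasons this guide gives for its choice.
evo2_answers = {
    "Stability": "Experimental, evolving",
    "Dependencies": "Heavy deps (GPU models)",
    "Audience": "Research users",
    "Installation": "Optional install",
}
votes = tally(evo2_answers)
print(votes.most_common())  # [('parallel package', 4)]
```

A plain majority is a crude rule. In practice one heavy factor, such as GPU-only dependencies, may be enough to decide on its own.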

📋 Recommendation for Evo2

Option 1: Parallel Package (Recommended)

Structure:

agentic-spliceai/
├── src/                          # Core production code
│   └── agentic_spliceai/
│
├── foundation_models/            # ← Parallel experimental package
│   ├── __init__.py
│   ├── README.md
│   ├── pyproject.toml           # Separate dependencies
│   ├── environment-evo2.yml     # GPU-specific environment
│   └── evo2/
│       ├── __init__.py
│       ├── model.py             # Evo2 integration
│       ├── inference.py         # Inference logic
│       └── adapters/
│           └── spliceai_adapter.py  # Bridge to splice_engine
│
├── examples/
│   └── foundation_models/       # Evo2 examples
│       └── 01_evo2_prediction.py
│
└── notebooks/
    └── foundation_models/       # Evo2 tutorials
        └── 01_evo2_basics.ipynb

Benefits:
- ✅ Optional installation: Users can install core without heavy deps
- ✅ Separate dependencies: Evo2 needs specific GPU libs, separate environment-evo2.yml
- ✅ Clear boundaries: Experimental code doesn't pollute production
- ✅ Easy testing: Can run core tests without Evo2 dependencies
- ✅ Flexible deployment: Deploy to pod without installing on local machine
- ✅ Independent evolution: Update Evo2 without touching core

Installation:

# Core only (local development)
pip install -e .

# Core + Evo2 (pod with A40)
pip install -e .
pip install -e ./foundation_models

# Or with conda
mamba env create -f foundation_models/environment-evo2.yml

Imports (from examples):

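# Add both packages to path
import sys
from pathlib import Path

# examples/foundation_models/<script>.py -> repository root
project_root = Path(__file__).resolve().parents[2]
sys.path.insert(0, str(project_root / 'src'))
sys.path.insert(0, str(project_root / 'foundation_models'))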

# Import core
from agentic_spliceai.splice_engine.base_layer import ...

# Import experimental
from evo2.model import Evo2SplicePredictor
from evo2.adapters import adapt_to_spliceai_format


Option 2: Subdirectory in src/ (If stable)

Structure:

src/
└── agentic_spliceai/
    ├── splice_engine/
    ├── experimental/             # ← Experimental features
    │   ├── __init__.py
    │   ├── foundation_models/
    │   │   ├── __init__.py
    │   │   ├── evo2/
    │   │   └── ...
    │   └── README.md
    └── ...

Use when:
- Features are relatively stable
- Dependencies are manageable (not too heavy)
- You want unified installation
- Features will eventually become core

Drawbacks:
- ❌ All users install experimental dependencies
- ❌ Harder to separate GPU-specific code
- ❌ More coupling with core system


Option 3: Separate Git Submodule (If very independent)

Structure:

agentic-spliceai/
├── src/
├── submodules/
│   └── agentic-evo2/            # ← Separate git repo as submodule
│       ├── .git
│       ├── README.md
│       ├── pyproject.toml
│       └── evo2/
└── ...

Use when:
- Feature is very independent (could be its own project)
- Multiple projects might use it
- Different development team/cycle
- Want separate version control


💡 Specific Recommendation for Evo2

Use Option 1 (Parallel Package) because:
1. Heavy GPU dependencies (need A40, separate environment)
2. Pod-specific testing (won't run on local Mac)
3. Research-oriented (may change rapidly)
4. Optional feature (not all users need it)

Implementation:

agentic-spliceai/
├── src/agentic_spliceai/         # Core (always installed)
│
├── foundation_models/            # Experimental (optional)
│   ├── __init__.py
│   ├── README.md                 # Installation & usage
│   ├── pyproject.toml
│   │   [project]
│   │   name = "agentic-spliceai-foundation-models"
│   │   dependencies = [
│   │       "agentic-spliceai",   # Depends on core
│   │       "evo-model>=2.0",     # Evo2 specific
│   │       "transformers>=4.30",
│   │       "torch>=2.0"          # No "+cu118" suffix: local version labels are not valid with >=; the CUDA build comes from the environment below
│   │   ]
│   │
│   ├── environment-evo2.yml      # Pod environment
│   │   name: agentic-spliceai-evo2
│   │   channels: [nvidia, pytorch, conda-forge]
│   │   dependencies:
│   │     - python=3.11
│   │     - pytorch::pytorch>=2.0
│   │     - pytorch::pytorch-cuda=11.8
│   │     - evo-model
│   │
│   ├── evo2/
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── model.py             # Evo2ModelWrapper
│   │   ├── inference.py         # Inference pipeline
│   │   └── adapters/
│   │       └── spliceai_adapter.py
│   │
│   └── tests/
│       └── test_evo2_integration.py
│
├── examples/foundation_models/
│   ├── README.md
│   ├── 01_evo2_setup.md         # Setup on pod
│   └── 02_evo2_prediction.py    # Usage example
│
└── notebooks/foundation_models/
    └── 01_evo2_splice_prediction.ipynb

🔧 Integration Pattern

Core → Experimental Bridge

In src/agentic_spliceai/splice_engine/base_layer/models/:

# registry.py - Model registry with dynamic loading

import importlib

_MODEL_REGISTRY = {
    'spliceai': 'agentic_spliceai.splice_engine.base_layer.prediction.core.load_spliceai',
    'openspliceai': 'agentic_spliceai.splice_engine.base_layer.prediction.core.load_openspliceai',
}

def register_foundation_model(name: str, loader_path: str):
    """Register experimental foundation model (optional)."""
    _MODEL_REGISTRY[name] = loader_path

def load_model(model_name: str, **kwargs):
    """Load model by name (core or experimental)."""
    if model_name not in _MODEL_REGISTRY:
        raise ValueError(f"Unknown model: {model_name}")

    module_path, func_name = _MODEL_REGISTRY[model_name].rsplit('.', 1)
    module = importlib.import_module(module_path)
    loader = getattr(module, func_name)
    return loader(**kwargs)

In foundation_models/evo2/__init__.py:

# Auto-register when the evo2 package is imported
from agentic_spliceai.splice_engine.base_layer.models.registry import register_foundation_model

register_foundation_model(
    'evo2',
    'evo2.model.load_evo2_splice_model'
)
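
The register-then-resolve flow above can be exercised without any project code. This is a minimal sketch in which standard-library functions stand in for the real loaders; the model names `core_model` and `experimental_model` and the helper `resolve_loader` are assumptions made for illustration.

```python
import importlib
from typing import Any, Callable, Dict

# Stand-in registry: names map to "module.function" dotted paths.
# Standard-library functions play the role of model loaders here.
_MODEL_REGISTRY: Dict[str, str] = {
    "core_model": "json.loads",
}


def register_foundation_model(name: str, loader_path: str) -> None:
    """Record a loader by dotted path; nothing is imported at this point."""
    _MODEL_REGISTRY[name] = loader_path


def resolve_loader(model_name: str) -> Callable[..., Any]:
    """Import the owning module lazily and return the loader callable."""
    if model_name not in _MODEL_REGISTRY:
        known = ", ".join(sorted(_MODEL_REGISTRY))
        raise ValueError(f"Unknown model: {model_name!r} (registered: {known})")
    module_path, func_name = _MODEL_REGISTRY[model_name].rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, func_name)


# An "experimental" entry added at runtime, as an optional package would do.
register_foundation_model("experimental_model", "math.sqrt")

core_loader = resolve_loader("core_model")
experimental_loader = resolve_loader("experimental_model")
print(core_loader('{"a": 1}'))   # {'a': 1}
print(experimental_loader(9.0))  # 3.0
```

Because only strings are stored, the core package never imports the experimental one. The heavy module is loaded the first time its model is requested.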

Usage:

# Works with or without foundation_models installed

# Core models (always available)
runner = BaseModelRunner()
result = runner.run_single_model(model_name='openspliceai', ...)

# Experimental models (if foundation_models installed)
try:
    import evo2  # Triggers registration
    result = runner.run_single_model(model_name='evo2', ...)
except ImportError:
    print("Evo2 not available - install foundation_models package")
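
If importing the optional package is expensive, its presence can be checked without importing it. The sketch below uses `importlib.util.find_spec` for that; the helper names `is_package_available` and `available_models` are hypothetical.

```python
import importlib.util
from typing import Dict, List


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def available_models(core: List[str], optional: Dict[str, str]) -> List[str]:
    """Core model names plus optional ones whose backing package is present.

    `optional` maps a model name to the top-level package that provides it.
    """
    extras = [model for model, package in optional.items()
              if is_package_available(package)]
    return core + extras


# The result depends on what is installed in the current environment.
models = available_models(["spliceai", "openspliceai"], {"evo2": "evo2"})
print(models)
```

This only shows that the package can be found. It does not trigger registration, so `import evo2` is still needed before the model is requested.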


📦 Installation Workflows

Local Development (Core Only)

cd agentic-spliceai
pip install -e .                  # Core only
python examples/base_layer/01_phase1_prediction.py

Pod Development (Core + Evo2)

# On pod with A40
cd agentic-spliceai

# Create Evo2 environment
mamba env create -f foundation_models/environment-evo2.yml
conda activate agentic-spliceai-evo2

# Install both packages
pip install -e .                  # Core
pip install -e ./foundation_models  # Evo2

# Test
python examples/foundation_models/02_evo2_prediction.py
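
After installing, a quick check can confirm which distributions are actually present in the active environment. This is a sketch under assumptions: the name `agentic-spliceai-foundation-models` comes from the pyproject.toml shown earlier, while `agentic-spliceai` is assumed to be the core distribution name because it appears in the dependency list.

```python
from importlib import metadata
from typing import Dict, List, Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def report(distributions: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in distributions}


# Distribution names are assumptions taken from this guide's pyproject sketch.
status = report(["agentic-spliceai", "agentic-spliceai-foundation-models", "torch"])
for name, version in status.items():
    print(f"{name}: {version or 'not installed'}")
```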

CI/CD Testing

# .github/workflows/test.yml
jobs:
  test-core:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install -e .
      - run: pytest tests/

  test-evo2:
    runs-on: [self-hosted, gpu-a40]  # Pod runner
    steps:
      - uses: actions/checkout@v3
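      - run: conda env create -f foundation_models/environment-evo2.yml
      # Each step runs in a fresh shell, so run inside the environment explicitly
      - run: conda run -n agentic-spliceai-evo2 pip install -e . -e ./foundation_models
      - run: conda run -n agentic-spliceai-evo2 pytest foundation_models/tests/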
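
For the two jobs to share one test tree, Evo2 tests need to skip cleanly when the package is absent. Below is a minimal sketch using only `unittest`, whose test classes pytest also collects. The decorator `requires_package` and the test classes are hypothetical; in a pytest-only suite, `pytest.importorskip("evo2")` serves the same purpose.

```python
import importlib.util
import unittest


def requires_package(name: str):
    """Decorator: skip a test unless the top-level package `name` can be found."""
    found = importlib.util.find_spec(name) is not None
    return unittest.skipUnless(found, f"{name} is not installed")


class TestCore(unittest.TestCase):
    def test_core_always_runs(self):
        self.assertTrue(True)


class TestEvo2Integration(unittest.TestCase):
    @requires_package("evo2")
    def test_evo2_package_is_present(self):
        # Reached only when `evo2` is installed; real checks would go here.
        self.assertIsNotNone(importlib.util.find_spec("evo2"))


def run_case(case: type) -> unittest.TestResult:
    """Run one TestCase class and return its result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result


core = run_case(TestCore)
evo2_result = run_case(TestEvo2Integration)
print("core tests run:", core.testsRun)
print("evo2 tests skipped:", len(evo2_result.skipped))
```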

📝 Summary & Decision

For Evo2 Foundation Model: Use Parallel Package ✅

Rationale:
1. Deployment flexibility: Run core locally, Evo2 on pod
2. Dependency isolation: Heavy GPU libs separate
3. Development speed: Rapid iteration without core changes
4. Optional feature: Users can choose to install
5. Clear boundaries: Experimental vs production

Files to Create:

# Create parallel package
mkdir -p foundation_models/evo2/adapters
touch foundation_models/{__init__.py,README.md,pyproject.toml}
touch foundation_models/evo2/{__init__.py,model.py,inference.py}
touch foundation_models/evo2/adapters/spliceai_adapter.py
touch foundation_models/environment-evo2.yml
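
The brace expansion in the commands above depends on a bash-like shell. A portable alternative is a short Python sketch that creates the same files; the function name `scaffold` is hypothetical, and the demo writes into a temporary directory rather than the real repository.

```python
import tempfile
from pathlib import Path
from typing import List

# Same files as the shell commands, relative to foundation_models/.
FILES = [
    "__init__.py",
    "README.md",
    "pyproject.toml",
    "environment-evo2.yml",
    "evo2/__init__.py",
    "evo2/model.py",
    "evo2/inference.py",
    "evo2/adapters/spliceai_adapter.py",
]


def scaffold(root: Path) -> List[Path]:
    """Create any missing files under `root`; return the paths that were created."""
    created = []
    for relative in FILES:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite existing work
            path.touch()
            created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "foundation_models"
    for path in scaffold(root):
        print(path.relative_to(root).as_posix())
```

In the repository itself the call would be `scaffold(Path("foundation_models"))`, run from the project root.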

Import Pattern (in examples):

# examples/foundation_models/02_evo2_prediction.py

import sys
from pathlib import Path

# Add both packages (this script lives two directories below the repository root)
project_root = Path(__file__).resolve().parents[2]
sys.path.insert(0, str(project_root / 'src'))
sys.path.insert(0, str(project_root / 'foundation_models'))

# Import core
from agentic_spliceai.splice_engine.base_layer import ...

# Import experimental
from evo2.model import Evo2SplicePredictor
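
Counting parent directories by hand breaks when a script is moved to a different depth. An alternative sketch searches upward for a directory that contains both package folders; the helper `find_project_root` and its marker names are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def find_project_root(start: Path,
                      markers: Iterable[str] = ("src", "foundation_models")) -> Path:
    """Walk upward from `start` to the first directory containing every marker directory."""
    start = start.resolve()
    marker_list = list(markers)
    for candidate in [start, *start.parents]:
        if all((candidate / marker).is_dir() for marker in marker_list):
            return candidate
    raise FileNotFoundError(f"No directory above {start} contains {marker_list}")


# Demo on a throwaway layout that mirrors this guide's structure.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "agentic-spliceai"
    (repo / "src").mkdir(parents=True)
    (repo / "foundation_models").mkdir()
    script = repo / "examples" / "foundation_models" / "02_evo2_prediction.py"
    script.parent.mkdir(parents=True)
    script.touch()
    print(find_project_root(script).name)  # agentic-spliceai
```

In a real example script the call would be `find_project_root(Path(__file__))`.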


✅ Implementation Status (March 2026)

Option 1 (Parallel Package) was adopted. The foundation_models/ sub-project is fully operational:

foundation_models/
├── pyproject.toml                    # Separate dependencies
├── README.md                         # Setup + hardware requirements
├── foundation_models/
│   ├── evo2/                         # Evo2-based exon classifier
│   │   ├── config.py                 # Evo2Config (device auto-detect)
│   │   ├── model.py                  # HuggingFace wrapper
│   │   ├── embedder.py               # Chunked extraction + HDF5 cache
│   │   └── classifier.py             # ExonClassifier (linear/MLP/CNN/LSTM)
│   ├── classifiers/                  # Splice classifiers
│   │   └── splice_classifier.py      # Direct shard predictor
│   └── utils/                        # Quantization, chunking
├── configs/
│   ├── gpu_config.yaml               # Infrastructure defaults (GPU, volume, deps)
│   └── skypilot/                     # SkyPilot cloud deployment (RunPod)
└── docs/                             # Sub-project documentation

Key achievements:
- 4 classifier architectures (linear, MLP, CNN, LSTM)
- Device-aware quantization routing (INT8 for MPS/CPU, bitsandbytes for CUDA)
- SkyPilot + RunPod cloud workflows (A40/A100 GPU)
- GPU task runner with generic SkyPilot config builder + launcher
- Ops scripts for cluster provisioning, data staging, and pipeline execution
- Direct shard splice predictor for foundation model fine-tuning
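
The device-aware routing mentioned above reduces to a small lookup from device type to quantization backend. The sketch below illustrates the idea only. The function name `choose_quantization` and the returned labels are hypothetical, and the actual implementation in foundation_models/ may be structured differently.

```python
def choose_quantization(device: str) -> str:
    """Pick a quantization route from a device string such as 'cuda:0', 'mps', or 'cpu'."""
    kind = device.split(":", 1)[0].strip().lower()  # drop any device index
    if kind == "cuda":
        return "bitsandbytes"  # CUDA devices route to bitsandbytes
    if kind in {"mps", "cpu"}:
        return "int8"  # MPS and CPU route to INT8
    raise ValueError(f"Unsupported device: {device!r}")


for device in ("cuda:0", "mps", "cpu"):
    print(device, "->", choose_quantization(device))
```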

See: foundation_models/README.md for current setup and hardware requirements


This guide follows the pattern established by genai-lab and other research-oriented projects in your workspace.