
Agentic-SpliceAI — Development Roadmap

North Star: Enable novel isoform discovery for drug target identification by building a multi-layer pipeline that goes beyond canonical splice annotations to uncover disease-specific, tissue-specific, and variant-induced RNA isoforms with therapeutic potential.


Phase Overview

| Phase | Description | Status |
|-------|-------------|--------|
| 1 | Base Layer | Done |
| 2 | Data Preparation | Done |
| 2.5 | Bioinformatics Lab UI | Done |
| 3 | Workflow Orchestration | Done |
| 4 | Feature Engineering & Multimodal Evidence | Done |
| 5 | Foundation Models | Experimental |
| 6 | Meta Layer Training | Active Research |
| 7 | Agentic Validation Layer | Planned |
| 8 | Variant Analysis | Phase 2 Done, Phase 3 Next |
| 9 | Isoform Discovery | Ultimate Goal |
| 10+ | Drug Target Validation & Deployment | Future |

The roadmap above is the phase-level view of project progress. For a complementary functionality-level view — which user-facing applications exist, what maturity tier they're in, and which examples/ scripts drive them — see the Application Ledger.

The two views answer different questions: the roadmap answers "what comes next"; the ledger answers "what currently runs and how mature is it". Applications are curated bundles of example milestones; products (none exist yet) are applications that have graduated to deployable commitments.

See dev/system_design/ for the underlying R&D methodology (portable across projects).


Phase Details

Phase 1: Base Layer — COMPLETE

  • Port SpliceAI and OpenSpliceAI prediction engines
  • Set up genomic resources (GTF, FASTA, annotations)
  • Build BaseModelRunner with data preparation
  • Deliverable: Canonical splice site predictions (MANE baseline)

Phase 2: Data Preparation — COMPLETE

  • Data preparation module with CLI (agentic-spliceai-prepare)
  • MANE annotation support for OpenSpliceAI consistency
  • Deliverable: Production-ready data pipeline

Phase 2.5: Bioinformatics Lab UI — COMPLETE

  • Gene Browser (browse, search, filter ~19K genes)
  • Metrics Dashboard (evaluation results, model comparison)
  • Genome View (on-demand prediction, 3-track Plotly visualization)
  • LRU prediction cache + peak-preserving downsampling
  • Deliverable: FastAPI + Jinja2 + Plotly.js web service at server/bio/ (port 8005)
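
The caching and downsampling ideas in the Genome View bullets can be sketched as follows. This is a minimal illustration using only the standard library, not the project's actual code: `predict_gene`, `downsample_preserving_peaks`, the cache size, and the toy track are all hypothetical. The point is that each bucket keeps its maximum, so a narrow splice-site peak survives downsampling instead of being averaged away.

```python
from functools import lru_cache


def downsample_preserving_peaks(scores, max_points):
    """Reduce `scores` to at most `max_points` values by keeping each bucket's maximum."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    n = len(scores)
    if n <= max_points:
        return list(scores)
    # Ceiling division, so the number of buckets never exceeds max_points.
    bucket = -(-n // max_points)
    return [max(scores[i:i + bucket]) for i in range(0, n, bucket)]


@lru_cache(maxsize=32)
def predict_gene(gene_id):
    """Hypothetical stand-in for an expensive per-gene prediction call."""
    # A real service would run the base model here. This toy track is flat
    # except for one sharp peak, which makes the effect easy to see.
    track = [0.01] * 1000
    track[437] = 0.98
    return tuple(track)  # tuples are immutable, so cached results cannot be mutated


plot_points = downsample_preserving_peaks(predict_gene("TOY_GENE"), 100)
```

Mean-based downsampling of the same track would flatten the peak at index 437 to roughly 0.1; the max-per-bucket version keeps it at full height in bucket 43.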

Phase 3: Workflow Orchestration — COMPLETE

  • Chunking and checkpointing for genome-scale processing (PredictionWorkflow)
  • Artifact management (ArtifactManager with atomic writes)
  • Mode-aware output paths, evaluation metrics
  • Deliverable: Production base layer with full workflows
  • Verified: chr22 -- 423 genes, 17.6M positions, 5 chunks, 12.4 min
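
As a rough sketch of how chunk checkpointing and atomic artifact writes can fit together (the real `PredictionWorkflow` and `ArtifactManager` interfaces are not shown in this roadmap, so `atomic_write_json`, `run_chunks`, and the checkpoint layout below are assumptions for illustration only):

```python
import json
import os
import tempfile


def atomic_write_json(path, payload):
    """Write JSON so that readers never observe a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic on POSIX when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def run_chunks(chunk_ids, process_chunk, checkpoint_path):
    """Process chunks in order, skipping any already recorded in the checkpoint."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            done = json.load(handle)["done"]
    for chunk_id in chunk_ids:
        if chunk_id in done:
            continue
        process_chunk(chunk_id)
        done.append(chunk_id)
        # Record progress only after the chunk has finished successfully.
        atomic_write_json(checkpoint_path, {"done": done})
    return done
```

If a run is interrupted partway through, calling `run_chunks` again with the same checkpoint path resumes at the first unfinished chunk; the checkpoint file itself is either the old version or the new one, never a truncated mix.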

Phase 4: Feature Engineering & Multimodal Evidence — COMPLETE

  • Modality protocol with auto-registration (FeaturePipeline)
  • 9 modalities with 100 feature columns: base_scores (43), annotation (3), sequence (3), genomic (4), conservation (9), epigenetic (12), junction (12), rbp_eclip (8), chrom_access (6)
  • Genome-scale FeatureWorkflow with --augment for incremental modality addition
  • YAML-driven config system with 4 profiles
  • Position alignment verification (features/verification.py)
  • Deliverable: 9-modality feature pipeline -- 100 feature columns
  • Verified: Full-genome feature generation across 17 chromosomes
  • See: examples/features/docs/ for per-modality tutorials
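
One common way to implement a modality protocol with auto-registration is sketched below. It is an assumption about the general pattern, not a copy of the project's `FeaturePipeline`: the `Modality` protocol, `RegisteredModality` base class, `MODALITY_REGISTRY`, and both toy modalities are hypothetical names invented for this example.

```python
from typing import Dict, List, Protocol


class Modality(Protocol):
    """Structural interface that each modality is assumed to satisfy (hypothetical)."""

    name: str

    def columns(self) -> List[str]: ...

    def compute(self, position: int) -> Dict[str, float]: ...


MODALITY_REGISTRY: Dict[str, type] = {}


class RegisteredModality:
    """Base class whose subclasses register themselves when they are defined."""

    name = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if not cls.name:
            raise TypeError(f"{cls.__name__} must define a non-empty name")
        MODALITY_REGISTRY[cls.name] = cls


class ToyAnnotation(RegisteredModality):
    name = "toy_annotation"

    def columns(self):
        return ["is_annotated"]

    def compute(self, position):
        return {"is_annotated": float(position % 2 == 0)}


class ToyGenomic(RegisteredModality):
    name = "toy_genomic"

    def columns(self):
        return ["relative_position"]

    def compute(self, position):
        return {"relative_position": position / 100.0}


def build_row(position, enabled):
    """Merge the feature columns of every enabled modality for one position."""
    row = {}
    for modality_name in enabled:
        row.update(MODALITY_REGISTRY[modality_name]().compute(position))
    return row
```

With this pattern, adding a modality means defining one new subclass; a config file only needs to list modality names, and `build_row(10, ["toy_annotation", "toy_genomic"])` assembles the combined feature row.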

Phase 5: Foundation Models — EXPERIMENTAL

  • Evo2-based exon classifier, HDF5 embedding cache
  • Device-aware quantization routing, 4 classifier architectures
  • SkyPilot + RunPod cloud workflows
  • Deliverable: Independent sub-project at foundation_models/

Phase 6: Meta Layer Training — ACTIVE RESEARCH

Hierarchical multi-task prediction framework — shared 9-modality feature infrastructure with specialized model heads for progressively harder tasks:

| Variant | Purpose | Status |
|---------|---------|--------|
| M1-S v2 | Canonical Classification | Done (logit-space blend, PR-AUC 0.9954, FPs -15.5%) |
| Eval-Ensembl-Alt | Ensembl alternative sites evaluation | Done (M2-S PR-AUC 0.965) |
| Eval-GENCODE-Alt | GENCODE alternative sites evaluation | Done (M2-S PR-AUC 0.907) |
| M2-S | Ensembl-trained model | Done (59% recall on alternative sites) |
| M3 | Novel Site Discovery (junction as held-out target) | Planned |
| M4 | Perturbation-Induced (variant/disease/treatment effects) | Phase 1A+1B Done |

Logit-space blend (v2): Replaced the probability-space residual blend with a product-of-experts formulation: softmax((alpha * meta_logits + (1-alpha) * log(base_probs)) / T). Per-class learned temperature subsumes post-hoc calibration. blend_alpha now receives gradients during training (was stuck at 0.5 in v1).

OOD generalization fixed: v1 meta model hurt on alternative sites (PR-AUC 0.704 < base 0.749). v2 logit-space blend enables graceful degradation — when the meta-CNN is uncertain, the base model signal dominates. v2 now exceeds the base model on alternative sites (0.775 > 0.749).
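
The logit-space blend formula above can be written out in a few lines of plain Python. This is a minimal sketch rather than the training code: the function name, the `eps` clamp that keeps `log()` finite, and the three-class example values are assumptions.

```python
import math


def logit_space_blend(meta_logits, base_probs, alpha, temperature, eps=1e-8):
    """softmax((alpha * meta_logits + (1 - alpha) * log(base_probs)) / T), with per-class T."""
    blended = [
        (alpha * m + (1.0 - alpha) * math.log(max(p, eps))) / t
        for m, p, t in zip(meta_logits, base_probs, temperature)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(blended)
    exps = [math.exp(b - peak) for b in blended]
    total = sum(exps)
    return [e / total for e in exps]


base = [0.7, 0.2, 0.1]  # toy base-model probabilities for three classes
unit_t = [1.0, 1.0, 1.0]

# Uninformative meta logits: the output ranking follows the base model.
uncertain = logit_space_blend([0.0, 0.0, 0.0], base, alpha=0.5, temperature=unit_t)

# alpha = 0 ignores the meta model and reproduces the base probabilities.
base_only = logit_space_blend([3.0, -1.0, 2.0], base, alpha=0.0, temperature=unit_t)
```

When the meta logits are flat, the blended distribution is proportional to `base_probs ** (1 - alpha)`, so the base model's ranking is preserved. That is the graceful-degradation behavior described above.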

Key Insight: Junction support is the #2 feature by SHAP (31.3%), reducing false negatives by 60-70%.

See:

  • docs/meta_layer/methods/00_model_variants_m1_m4.md for the full M1-M4 framework
  • examples/meta_layer/results/ for M1-S v2, M2, and ablation results
  • examples/meta_layer/docs/ood_generalization.md for OOD analysis

Phase 7: Agentic Validation Layer — PLANNED

  • Literature Agent (PubMed, arXiv, splice databases)
  • Expression Agent (GTEx, TCGA, ENCODE)
  • Clinical Agent (ClinVar, COSMIC, disease associations)
  • Conservation Agent (cross-species PhyloP)
  • Nexus Research Agent orchestration
  • Self-improvement feedback loop (validation results refine meta layer)
  • Deliverable: AI-validated predictions with biological context

Phase 8: Variant Analysis — PHASE 2 DONE

Use-case-driven R&D for M4 variant effect prediction.

| Sub-phase | Description | Status |
|-----------|-------------|--------|
| 1A | VariantRunner: ref/alt delta computation | Done |
| 1B | SpliceEventDetector: consequence classification | Done |
| 2 | ClinVar + MutSpliceDB benchmarking, radius sweep | Done |
| 3 | Clinical pathogenicity head (stack variant-level features) | Next |
| 4 | Saturation mutagenesis & SpliceVarDB validation | Planned |
| 5 | Agentic variant interpretation | Planned |

Validated: 13 disease-gene variants (10 genes, both strands), 4 SpliceAI paper cases with RNA-seq-confirmed cryptic site positions (MYBPC3 and FAM229B match within 2 bp of RNA-seq ground truth).

Variant delta recovery: v2 logit-space blend preserves 45-95% of base model signal (v1: 20-71%). Cryptic donor gains amplified beyond base model.

Phase 2 benchmark findings (2026-04-15):

  • On splice-filtered ClinVar (N=2,059; 77% pathogenic prevalence), base model, M1-S v2, and M2-S v2 all reach PR-AUC ≈ 0.92 / ROC-AUC ≈ 0.75 — a statistical tie. Unfiltered ClinVar (N=11,310) floors at PR-AUC ≈ 0.72 because ~89% of pathogenic variants there are non-splicing mechanisms.
  • On MutSpliceDB (N=434), M2-S v2 wins consequence concordance by +23 pts (68% vs 45%) — the clinically meaningful metric for "what type of splice defect".
  • Key architectural insight: M2-S v2 is a strict feature-set superset of base but doesn't outperform base on ClinVar Δ-based ranking. Reason: multimodal features (conservation, junction, RBP, chromatin, epigenetic) are locus-level and identical between ref and alt at SNV positions — so they cancel in the Δ computation and carry no variant-specific signal. Meta-layer value is in locus classification (alt-site recall, consequence type), not pathogenicity ranking.
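
The cancellation argument in the last bullet can be made concrete with a deliberately simplified linear scorer. The feature names, weights, and values below are hypothetical, and the real meta-layer is nonlinear, so this only illustrates why features that are identical between ref and alt contribute no term of their own to the Δ.

```python
def delta_score(weights, ref_features, alt_features):
    """Score difference (alt - ref) under a linear scorer, broken down by feature."""
    contributions = {
        name: weights[name] * (alt_features[name] - ref_features[name])
        for name in weights
    }
    return sum(contributions.values()), contributions


# Hypothetical feature names and weights, for illustration only.
weights = {"splice_score": 2.0, "conservation": 1.5, "junction_support": 3.0}
ref = {"splice_score": 0.90, "conservation": 0.80, "junction_support": 0.60}

# An SNV changes the sequence-derived score; the locus-level values stay the same.
alt = dict(ref, splice_score=0.35)

delta, parts = delta_score(weights, ref, alt)
```

Here `parts["conservation"]` and `parts["junction_support"]` are exactly zero, so `delta` is driven entirely by the sequence-derived score. Adding more locus-level columns would not change the ranking of variants by Δ under this simplification.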

Phase 8 — Sub-phase 3: Clinical Pathogenicity Head

To move PR-AUC meaningfully above the base-model ceiling on ClinVar, the next phase stacks variant-level features with the splice-Δ score in a downstream classifier — mirroring CADD, REVEL, ClinPred, and SpliceAI's own pipeline convention.

Architecture: a lightweight classifier (logistic regression or gradient-boosted trees) trained on features that genuinely differentiate pathogenic from benign at the variant level, not just at the locus level:

| Feature | Source | Expected contribution |
|---------|--------|-----------------------|
| log(gnomAD_AF + ε) | gnomAD v4 | Single strongest non-splice feature; typically +0.05 to +0.15 PR-AUC on ClinVar |
| Gene constraint (LOEUF, pLI) | gnomAD v4 constraint | Refines prior by gene essentiality |
| Variant-differential motif disruption | ESEfinder / RESCUE-ESE + variant-aware scoring | Captures ESE/ISE/branchpoint disruption that splice-Δ alone misses |
| Protein-level deleteriousness | AlphaMissense / ESM-variant | For the ~40% of splice-proximal variants that also affect coding sequence |
| Splice-Δ score (from M1-S v2 or M2-S v2) | This pipeline | The current splice-specific signal |
| Splice consequence type (from M2-S v2) | This pipeline | Classification-level evidence |

This clinical head is not a replacement for the meta-layer: it sits downstream and composes the splice-delta signal with orthogonal variant-level information. M1-S v2 (or base model directly) can feed the Δ input; M2-S v2 feeds the consequence-type feature. The head is trained on ClinVar Pathogenic/Benign with a held-out test split, and evaluated on SpliceVarDB and HGMD (if licensed) for out-of-distribution validation.
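
To make the stacking idea concrete, here is a small standard-library sketch of a logistic-regression head over two of the listed features. Everything in it is an assumption for illustration: the `eps` value, the helper names, the hyperparameters, and above all the hand-made toy variants, which are not real ClinVar or gnomAD records. A real head would use the full feature table and a held-out split as described above.

```python
import math


def build_features(gnomad_af, splice_delta, eps=1e-6):
    """Variant-level features: [log(AF + eps), splice delta]; eps is an assumed value."""
    return [math.log(gnomad_af + eps), splice_delta]


def standardize(rows):
    """Z-score each column; returns the scaled rows and per-column (mean, std)."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    return [[(v - m) / s for v, (m, s) in zip(r, stats)] for r in rows], stats


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        errs = [
            sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for x, y in zip(rows, labels)
        ]
        weights = [
            w - lr * sum(e * x[j] for e, x in zip(errs, rows)) / n
            for j, w in enumerate(weights)
        ]
        bias -= lr * sum(errs) / n
    return weights, bias


# Hand-made toy variants, NOT real records: (gnomAD AF, splice delta, label)
toy = [
    (0.0, 0.85, 1), (1e-5, 0.60, 1), (0.0, 0.45, 1), (2e-6, 0.75, 1),
    (0.02, 0.05, 0), (0.10, 0.30, 0), (0.005, 0.02, 0), (0.30, 0.10, 0),
]
raw = [build_features(af, delta) for af, delta, _ in toy]
labels = [label for _, _, label in toy]
scaled, stats = standardize(raw)
weights, bias = fit_logistic(scaled, labels)


def pathogenicity(gnomad_af, splice_delta):
    """Score a new variant with the fitted toy head."""
    feats = build_features(gnomad_af, splice_delta)
    x = [(v - m) / s for v, (m, s) in zip(feats, stats)]
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
```

On this toy data the fitted weight on `log(AF + eps)` is negative and the weight on the splice Δ is positive, so a rare variant with a large Δ scores higher than a common variant with a small Δ. The head combines the two signals rather than replacing either.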

See:

  • examples/variant_analysis/ for scripts and results
  • examples/variant_analysis/results/m4_benchmark_sweep.md for the Phase 2 benchmark report
  • docs/applications/variant_analysis/ for Phase 3+ application plan

Phase 9: Isoform Discovery — ULTIMATE GOAL

  • Novel splice site detector (high-delta-score sites beyond MANE)
  • Isoform reconstruction (virtual transcripts from predicted splice sites)
  • RNA-seq junction validation across GTEx tissues
  • Confidence scoring with multi-source evidence
  • Drug target pipeline: isoform -> druggability assessment -> lead candidates

Phase 10+: Drug Target Validation & Deployment — FUTURE

  • Druggability assessment for novel isoform targets
  • Biomarker development (liquid biopsy, companion diagnostics)
  • Production platform deployment
  • Cloud-native scaling and API services

Success Metrics

Discovery Metrics (Phase 9)

  • 100+ novel isoforms discovered with high confidence
  • 70% RNA-seq junction validation rate across GTEx tissues
  • 50% literature confirmation for top candidates

Clinical Metrics (Phases 8-9)

  • 30% of VUS variants reclassified through splice impact analysis
  • 90% diagnostic accuracy for splice-affecting variants

Foundation Model Metrics (Phase 5)

  • 0.9 AUROC for exon boundary classification
  • 10K+ base pairs per second inference on A40 GPU

Last Updated: April 2026