Agentic-SpliceAI — Development Roadmap
North Star: Enable novel isoform discovery for drug target identification by building a multi-layer pipeline that goes beyond canonical splice annotations to uncover disease-specific, tissue-specific, and variant-induced RNA isoforms with therapeutic potential.
Phase Overview
| Phase | Description | Status |
|---|---|---|
| 1 | Base Layer | Done |
| 2 | Data Preparation | Done |
| 2.5 | Bioinformatics Lab UI | Done |
| 3 | Workflow Orchestration | Done |
| 4 | Feature Engineering & Multimodal Evidence | Done |
| 5 | Foundation Models | Experimental |
| 6 | Meta Layer Training | Active Research |
| 7 | Agentic Validation Layer | Planned |
| 8 | Variant Analysis | Sub-phase 2 Done, Sub-phase 3 Next |
| 9 | Isoform Discovery | Ultimate Goal |
| 10+ | Drug Target Validation & Deployment | Future |
Related views
The roadmap above is the phase-level view of project progress. For a
complementary functionality-level view — which user-facing
applications exist, what maturity tier they're in, and which
examples/ scripts drive them — see the
Application Ledger.
The two views answer different questions: the roadmap answers "what comes next"; the ledger answers "what currently runs and how mature is it". Applications are curated bundles of example milestones; products (none yet) are applications that have graduated to deployable commitments.
See dev/system_design/ for the underlying R&D
methodology (portable across projects).
Phase Details
Phase 1: Base Layer — COMPLETE
- Port SpliceAI and OpenSpliceAI prediction engines
- Set up genomic resources (GTF, FASTA, annotations)
- Build BaseModelRunner with data preparation
- Deliverable: Canonical splice site predictions (MANE baseline)
Phase 2: Data Preparation — COMPLETE
- Data preparation module with CLI (agentic-spliceai-prepare)
- MANE annotation support for OpenSpliceAI consistency
- Deliverable: Production-ready data pipeline
Phase 2.5: Bioinformatics Lab UI — COMPLETE
- Gene Browser (browse, search, filter ~19K genes)
- Metrics Dashboard (evaluation results, model comparison)
- Genome View (on-demand prediction, 3-track Plotly visualization)
- LRU prediction cache + peak-preserving downsampling
- Deliverable: FastAPI + Jinja2 + Plotly.js web service at server/bio/ (port 8005)
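The caching and downsampling items above can be illustrated with a minimal sketch. It is not the project's implementation: `toy_predict`, `cached_track`, and the cache size are hypothetical stand-ins, and the downsampler simply keeps the highest-scoring position in each bin so that narrow splice-site peaks survive when a long track is reduced for plotting.

```python
from functools import lru_cache


def downsample_preserving_peaks(scores, max_points):
    """Reduce a score track to at most `max_points` (index, score) pairs.

    The track is split into contiguous bins and the maximum of each bin is
    kept at its true position, so isolated peaks are never averaged away.
    """
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    n = len(scores)
    if n <= max_points:
        return list(enumerate(scores))
    kept = []
    for i in range(max_points):
        start = i * n // max_points
        end = (i + 1) * n // max_points
        peak = max(range(start, end), key=scores.__getitem__)
        kept.append((peak, scores[peak]))
    return kept


def toy_predict(gene_id):
    # Hypothetical stand-in for an on-demand model call; returns a fake track.
    return [((i * 37 + len(gene_id)) % 100) / 100 for i in range(1000)]


@lru_cache(maxsize=128)  # assumed cache size, for illustration only
def cached_track(gene_id, max_points):
    """Predict once per (gene, resolution) and reuse the result afterwards."""
    return tuple(downsample_preserving_peaks(toy_predict(gene_id), max_points))
```

Calling `cached_track` twice with the same arguments runs the toy predictor only once; the second call is served from the LRU cache.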
Phase 3: Workflow Orchestration — COMPLETE
- Chunking and checkpointing for genome-scale processing (PredictionWorkflow)
- Artifact management (ArtifactManager with atomic writes)
- Mode-aware output paths, evaluation metrics
- Deliverable: Production base layer with full workflows
- Verified: chr22 -- 423 genes, 17.6M positions, 5 chunks, 12.4 min
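As a rough illustration of chunked processing with resumable checkpoints and atomic writes, the sketch below uses only the standard library. The function names, the JSON checkpoint layout, and the chunk size are assumptions made for this example; they are not the actual PredictionWorkflow or ArtifactManager interfaces.

```python
import json
import os
import tempfile


def make_chunks(n_positions, chunk_size):
    """Split [0, n_positions) into half-open (start, end) chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, n_positions))
            for start in range(0, n_positions, chunk_size)]


def atomic_write_json(path, payload):
    """Write to a temp file in the target directory, then rename over `path`.

    Readers see either the old file or the complete new one, never a
    partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX (same filesystem)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def run_with_checkpoints(n_positions, chunk_size, checkpoint_path, process_chunk):
    """Process every chunk once, skipping chunks recorded in the checkpoint."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            done = json.load(handle)["done"]
    for index, (start, end) in enumerate(make_chunks(n_positions, chunk_size)):
        if index in done:
            continue  # finished in an earlier run
        process_chunk(start, end)
        done.append(index)
        atomic_write_json(checkpoint_path, {"done": done})
    return done
```

If a run is interrupted, calling `run_with_checkpoints` again with the same checkpoint path resumes at the first unfinished chunk.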
Phase 4: Feature Engineering & Multimodal Evidence — COMPLETE
- Modality protocol with auto-registration (FeaturePipeline)
- 9 modalities with 100 feature columns:
  - base_scores (43), annotation (3), sequence (3), genomic (4)
  - conservation (9), epigenetic (12), junction (12), rbp_eclip (8), chrom_access (6)
- Genome-scale FeatureWorkflow with --augment for incremental modality addition
- YAML-driven config system with 4 profiles
- Position alignment verification (features/verification.py)
- Deliverable: 9-modality feature pipeline -- 100 feature columns
- Verified: Full-genome feature generation across 17 chromosomes
- See: examples/features/docs/ for per-modality tutorials
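One common way to get auto-registration is to have a base class record each subclass as it is defined. The sketch below shows that pattern under stated assumptions: the `Modality` class, its `registry`, and the placeholder column names are hypothetical and do not reflect the real FeaturePipeline API. Only the modality names and column counts come from the list above.

```python
# Column counts per modality, taken from the list above.
MODALITY_COLUMN_COUNTS = {
    "base_scores": 43, "annotation": 3, "sequence": 3, "genomic": 4,
    "conservation": 9, "epigenetic": 12, "junction": 12,
    "rbp_eclip": 8, "chrom_access": 6,
}


class Modality:
    """Hypothetical base class: every subclass registers itself on definition."""

    registry = {}
    name = None
    n_columns = 0

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if not cls.name:
            raise TypeError(f"{cls.__name__} must define a non-empty `name`")
        if cls.name in Modality.registry:
            raise ValueError(f"modality {cls.name!r} is already registered")
        Modality.registry[cls.name] = cls

    def columns(self):
        # Placeholder column names; the real names are not given in the roadmap.
        return [f"{self.name}_{i}" for i in range(self.n_columns)]


# Define one subclass per modality; creating the class triggers registration.
for _name, _count in MODALITY_COLUMN_COUNTS.items():
    type(f"{_name}_modality", (Modality,), {"name": _name, "n_columns": _count})


def feature_columns(enabled):
    """Concatenate the columns of the enabled modalities, in the given order."""
    columns = []
    for name in enabled:
        columns.extend(Modality.registry[name]().columns())
    return columns
```

With all nine modalities enabled, `feature_columns` returns 100 names, matching the stated total (43 + 3 + 3 + 4 + 9 + 12 + 12 + 8 + 6). Enabling a subset mimics adding modalities incrementally.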
Phase 5: Foundation Models — EXPERIMENTAL
- Evo2-based exon classifier, HDF5 embedding cache
- Device-aware quantization routing, 4 classifier architectures
- SkyPilot + RunPod cloud workflows
- Deliverable: Independent sub-project at foundation_models/
Phase 6: Meta Layer Training — ACTIVE RESEARCH
Hierarchical multi-task prediction framework — shared 9-modality feature infrastructure with specialized model heads for progressively harder tasks:
| Variant | Purpose | Status |
|---|---|---|
| M1-S v2 | Canonical Classification | Done — logit-space blend, PR-AUC 0.9954, FPs -15.5% |
| Eval-Ensembl-Alt | Ensembl alternative sites evaluation | Done — M2-S PR-AUC 0.965 |
| Eval-GENCODE-Alt | GENCODE alternative sites evaluation | Done — M2-S PR-AUC 0.907 |
| M2-S | Ensembl-trained model | Done — 59% recall on alternative sites |
| M3 | Novel Site Discovery (junction as held-out target) | Planned |
| M4 | Perturbation-Induced (variant/disease/treatment effects) | Phase 1A+1B Done |
Logit-space blend (v2): Replaced the probability-space residual blend with
a product-of-experts formulation: softmax((alpha * meta_logits + (1-alpha) * log(base_probs)) / T).
Per-class learned temperature subsumes post-hoc calibration. blend_alpha now
receives gradients during training (was stuck at 0.5 in v1).
OOD generalization fixed: the v1 meta model underperformed the base model on alternative sites (PR-AUC 0.704 vs. 0.749). The v2 logit-space blend degrades gracefully: when the meta-CNN is uncertain, the base model signal dominates. v2 now exceeds the base model on alternative sites (PR-AUC 0.775 vs. 0.749).
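The blend formula can be written out for a single position as a small pure-Python sketch. Treating `T` as one temperature per class applied elementwise, and clamping probabilities with `eps` before the log, are assumptions of this example; in training, `alpha` and `T` would be learned parameters rather than fixed inputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_space_blend(meta_logits, base_probs, alpha, temperature, eps=1e-8):
    """softmax((alpha * meta_logits + (1 - alpha) * log(base_probs)) / T).

    `temperature` holds one value per class (assumed elementwise division);
    `eps` guards against log(0) and is not part of the stated formula.
    """
    blended = [
        (alpha * m + (1.0 - alpha) * math.log(max(p, eps))) / t
        for m, p, t in zip(meta_logits, base_probs, temperature)
    ]
    return softmax(blended)
```

Two properties follow directly from the formula. With `alpha = 0` and `T = 1` the output equals the base probabilities, and with flat meta logits (an uncertain meta model) the class ranking of the base model is preserved, which is the graceful-degradation behaviour described above.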
Key Insight: Junction support is the #2 feature by SHAP (31.3%), reducing false negatives by 60-70%.
See:
- docs/meta_layer/methods/00_model_variants_m1_m4.md for the full M1-M4 framework
- examples/meta_layer/results/ for M1-S v2, M2, and ablation results
- examples/meta_layer/docs/ood_generalization.md for OOD analysis
Phase 7: Agentic Validation Layer — PLANNED
- Literature Agent (PubMed, arXiv, splice databases)
- Expression Agent (GTEx, TCGA, ENCODE)
- Clinical Agent (ClinVar, COSMIC, disease associations)
- Conservation Agent (cross-species PhyloP)
- Nexus Research Agent orchestration
- Self-improvement feedback loop (validation results refine meta layer)
- Deliverable: AI-validated predictions with biological context
Phase 8: Variant Analysis — SUB-PHASE 2 DONE
Use-case-driven R&D for M4 variant effect prediction.
| Sub-phase | Description | Status |
|---|---|---|
| 1A | VariantRunner — ref/alt delta computation | Done |
| 1B | SpliceEventDetector — consequence classification | Done |
| 2 | ClinVar + MutSpliceDB benchmarking, radius sweep | Done |
| 3 | Clinical pathogenicity head (stack variant-level features) | Next |
| 4 | Saturation mutagenesis & SpliceVarDB validation | Planned |
| 5 | Agentic variant interpretation | Planned |
Validated: 13 disease-gene variants (10 genes, both strands), 4 SpliceAI paper cases with RNA-seq confirmed cryptic site positions (MYBPC3 and FAM229B match within 2bp of RNA-seq ground truth).
Variant delta recovery: v2 logit-space blend preserves 45-95% of base model signal (v1: 20-71%). Cryptic donor gains amplified beyond base model.
Phase 2 benchmark findings (2026-04-15):
- On splice-filtered ClinVar (N=2,059; 77% pathogenic prevalence), the base model, M1-S v2, and M2-S v2 all reach PR-AUC ≈ 0.92 / ROC-AUC ≈ 0.75, a statistical tie. On unfiltered ClinVar (N=11,310), PR-AUC drops to ≈ 0.72 because ~89% of the pathogenic variants there act through non-splicing mechanisms.
- On MutSpliceDB (N=434), M2-S v2 wins consequence concordance by +23 pts (68% vs 45%) — the clinically meaningful metric for "what type of splice defect".
- Key architectural insight: M2-S v2 uses a strict superset of the base model's features, yet it does not outperform the base model on ClinVar Δ-based ranking. The reason is that the multimodal features (conservation, junction, RBP, chromatin, epigenetic) are locus-level and identical between ref and alt at SNV positions, so they cancel in the Δ computation and carry no variant-specific signal. The meta-layer's value lies in locus classification (alt-site recall, consequence type), not pathogenicity ranking.
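The cancellation described in the last finding can be seen in a toy additive scorer. This is a deliberately simplified sketch, not the meta-layer: `toy_score` and its weights are hypothetical. For an additive scorer the cancellation is exact. In a nonlinear model the locus-level features can rescale Δ, but they are still identical for ref and alt and so cannot distinguish one variant from another at the same locus.

```python
def toy_score(seq_score, locus_features, weights):
    """Additive toy scorer: a sequence-derived score plus weighted locus features."""
    return seq_score + sum(w * x for w, x in zip(weights, locus_features))


def splice_delta(ref_seq_score, alt_seq_score, locus_features, weights):
    """Delta = score(alt) - score(ref), with the same locus features on both sides."""
    ref = toy_score(ref_seq_score, locus_features, weights)
    alt = toy_score(alt_seq_score, locus_features, weights)
    return alt - ref
```

Whatever values are passed as `locus_features`, `splice_delta` depends only on the two sequence-derived scores, because the locus term is added to both sides and subtracts out.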
Phase 8 — Sub-phase 3: Clinical Pathogenicity Head
To move PR-AUC meaningfully above the base-model ceiling on ClinVar, the next phase stacks variant-level features with the splice-Δ score in a downstream classifier — mirroring CADD, REVEL, ClinPred, and SpliceAI's own pipeline convention.
Architecture: a lightweight classifier (logistic regression or gradient-boosted trees) on top of features that differentiate pathogenic from benign at the variant level, not just the locus level:
| Feature | Source | Expected contribution |
|---|---|---|
| log(gnomAD_AF + ε) | gnomAD v4 | Single strongest non-splice feature; typically +0.05 to +0.15 PR-AUC on ClinVar |
| Gene constraint (LOEUF, pLI) | gnomAD v4 constraint | Refines prior by gene essentiality |
| Variant-differential motif disruption | ESEfinder / RESCUE-ESE + variant-aware scoring | Captures ESE/ISE/branchpoint disruption that splice-Δ alone misses |
| Protein-level deleteriousness | AlphaMissense / ESM-variant | For the ~40% of splice-proximal variants that also affect coding sequence |
| Splice-Δ score (from M1-S v2 or M2-S v2) | This pipeline | The current splice-specific signal |
| Splice consequence type (from M2-S v2) | This pipeline | Classification-level evidence |
This clinical head is not a replacement for the meta-layer: it sits downstream and combines the splice-delta signal with orthogonal variant-level information. M1-S v2 (or the base model directly) can feed the Δ input; M2-S v2 feeds the consequence-type feature. The head is trained on ClinVar Pathogenic/Benign with a held-out test split and evaluated on SpliceVarDB and HGMD (if licensed) for out-of-distribution validation.
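To make the stacking idea concrete, the sketch below assembles a feature vector from a few of the inputs in the table and scores it with a logistic function. Everything here is illustrative: the `VariantEvidence` fields, the consequence labels, the value of `eps`, and the weights are assumptions, and no trained parameters are implied. In practice the weights would be fit on the ClinVar training split.

```python
import math
from dataclasses import dataclass

# Hypothetical consequence labels; the real label set is not given in the roadmap.
CONSEQUENCE_TYPES = ("none", "exon_skipping", "intron_retention", "cryptic_site")


@dataclass
class VariantEvidence:
    gnomad_af: float      # population allele frequency (0.0 if unobserved)
    loeuf: float          # gene constraint score
    splice_delta: float   # splice-delta score from the upstream model
    consequence: str      # predicted consequence type


def to_features(variant, eps=1e-6):
    """Build [log(AF + eps), LOEUF, splice delta, one-hot consequence]."""
    one_hot = [1.0 if variant.consequence == c else 0.0 for c in CONSEQUENCE_TYPES]
    return [math.log(variant.gnomad_af + eps), variant.loeuf,
            variant.splice_delta] + one_hot


def pathogenic_probability(features, weights, bias):
    """Logistic-regression score using a numerically stable sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

The `eps` term keeps the log finite for variants absent from gnomAD. A gradient-boosted model could consume the same feature vector in place of the logistic scorer.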
See:
- examples/variant_analysis/ for scripts and results
- examples/variant_analysis/results/m4_benchmark_sweep.md for the Phase 2 benchmark report
- docs/applications/variant_analysis/ for Phase 3+ application plan
Phase 9: Isoform Discovery — ULTIMATE GOAL
- Novel splice site detector (high-delta-score sites beyond MANE)
- Isoform reconstruction (virtual transcripts from predicted splice sites)
- RNA-seq junction validation across GTEx tissues
- Confidence scoring with multi-source evidence
- Drug target pipeline: isoform -> druggability assessment -> lead candidates
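The first item above, finding high-scoring sites beyond the annotation, could start from a filter like the one sketched here. The site key layout `(chrom, position, strand, kind)`, the function name, and the threshold are hypothetical; the planned detector may work quite differently.

```python
def novel_site_candidates(predicted, annotated, threshold):
    """Return unannotated sites scoring at or above `threshold`, best first.

    `predicted` maps a site key (chrom, position, strand, kind) to a score;
    `annotated` is the set of site keys already present in the reference
    annotation (for example, MANE).
    """
    candidates = [
        site for site, score in predicted.items()
        if score >= threshold and site not in annotated
    ]
    return sorted(candidates, key=lambda site: predicted[site], reverse=True)
```

Candidates returned by such a filter would still need the junction validation and confidence scoring listed above before being treated as novel sites.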
Phase 10+: Drug Target Validation & Deployment — FUTURE
- Druggability assessment for novel isoform targets
- Biomarker development (liquid biopsy, companion diagnostics)
- Production platform deployment
- Cloud-native scaling and API services
Success Metrics
Discovery Metrics (Phase 9)
- 100+ novel isoforms discovered with high confidence
- 70% RNA-seq junction validation rate across GTEx tissues
- 50% literature confirmation for top candidates
Clinical Metrics (Phases 8-9)
- 30% of VUS variants reclassified through splice impact analysis
- 90% diagnostic accuracy for splice-affecting variants
Foundation Model Metrics (Phase 5)
- 0.9 AUROC for exon boundary classification
- 10K+ base pairs per second inference on A40 GPU
Last Updated: April 2026