This directory contains documentation for the various methodological approaches being developed for alternative splice site prediction.
Document Index
| Document |
Description |
Status |
| ROADMAP.md |
High-level methodology development roadmap |
Active |
| PAIRED_DELTA_PREDICTION.md |
Siamese/paired delta prediction |
Tested (r=0.38) |
| VALIDATED_DELTA_PREDICTION.md |
Single-pass with ground truth targets |
BEST (r=0.41) |
MULTI_STEP_FRAMEWORK.md (planned) |
Decomposed classification approach |
In Progress |
Quick Reference
Method Summary
PAIRED DELTA PREDICTION VALIDATED DELTA PREDICTION
───────────────────────── ─────────────────────────
ref_seq ──→ encoder ──┐ alt_seq ──→ encoder ──┐
├─→ diff ref_base ──→ embed ──┼─→ delta
alt_seq ──→ encoder ──┘ alt_base ──→ embed ──┘
Target: base_delta Target: validated_delta
Status: r=0.38 Status: r=0.41 (BEST!)
MULTI-STEP FRAMEWORK
─────────────────────
Step 1: Is splice-altering? → Binary (AUC=0.61, needs >0.7)
Step 2: What type? → Multi-class (NOT IMPLEMENTED)
Step 3: Where? → Localization (NOT IMPLEMENTED)
Step 4: How strong? → Regression (NOT IMPLEMENTED)
Key Differences
| Aspect |
Paired Delta |
Validated Delta |
Multi-Step |
| Input |
ref + alt |
alt + var_info |
alt + var_info |
| Target |
base_delta |
SpliceVarDB-validated |
classification |
| Forward passes |
2 |
1 |
1-4 |
| Interpretability |
Low |
Medium |
High |
| Current result |
r=0.38 |
r=0.41 |
AUC=0.61 |
Current Status
- Paired Delta: Tested, moderate correlation (r=0.38)
- Validated Delta: Tested, best correlation (r=0.41)
- Multi-Step Step 1: Tested, needs improvement (F1=0.53)
Priority
- HIGH: Scale Validated Delta with more data and HyenaDNA
- MEDIUM: Improve Multi-Step Step 1 (F1 > 0.7)
- MEDIUM: Test Multi-Step Steps 2-4
- LOW: Compare ensemble approaches
Naming Convention
We use descriptive names instead of cryptic labels:
| Old Name |
New Name |
Description |
| Approach A |
Paired Delta Prediction |
Uses both ref and alt sequences |
| Approach B |
Validated Delta Prediction |
Uses SpliceVarDB-validated targets |
| Phase 1 |
Canonical Classification |
Training on canonical sites |
| Phase 2 |
Delta Prediction |
Predicting score changes |
../experiments/ - Detailed experiment logs
LABELING_STRATEGY.md - Label derivation strategies (planned)
../ARCHITECTURE.md - Model architectures