Meta-Layer Naming Convention
Definitive guide to model and evaluation protocol naming in the Agentic-SpliceAI meta-layer. All documents and code should follow these conventions.
Principle: Models vs Evaluation Protocols
Models are what you train — defined by architecture + training labels. Evaluation protocols are how you test — defined by the test set and the question being asked.
Any model can be evaluated with any protocol. A model's name should not encode the evaluation setting.
Models
Models follow the pattern: M{task}-{level}
- Task number (1-4): the prediction task, from easiest to hardest
- Level: S (sequence-level CNN) or P (position-level, e.g. XGBoost)
| Model | Architecture | Training labels | Purpose |
|---|---|---|---|
| M1-S | Seq-level dilated CNN | MANE (~370K sites) | Canonical splice classification |
| M1-P | XGBoost (position) | MANE | Position-level baseline |
| M2-S | Seq-level dilated CNN | Ensembl (~2.8M sites) | Alternative splice site detection |
| M3-S | Seq-level dilated CNN | Ensembl, junction=target | Novel site discovery |
| M4-S | Seq-level dilated CNN | Variant pairs (planned) | Perturbation-induced splice changes |
Key distinctions
- M1-S vs M2-S: Same architecture, different training labels. M1-S sees only MANE canonical transcripts; M2-S sees the full Ensembl annotation including alternative splice sites.
- M2-S is NOT "M1-S retrained on Ensembl" — it's a distinct model designed for a different task (alternative site detection vs canonical classification).
- M3-S differs from M2-S in that junction features become the target (held out) rather than input, forcing the model to predict novel sites without RNA-seq evidence.
Version suffixes (optional)
When architecture changes are significant, append a version:
- M1-S v1 — probability-space blend (retired)
- M1-S v2 — logit-space blend with learned temperature (current)
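The naming pattern and optional version suffix above can be checked mechanically. The following is a minimal sketch, not part of the project's code: the function name `parse_model_name`, the `ModelName` type, and the assumption that a version is written as a space followed by `v` and an integer (as in "M1-S v2") are all illustrative.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: M{task}-{level}, optionally followed by " v{n}".
# Task is 1-4 and level is S or P, as described in this guide.
MODEL_NAME = re.compile(r"M(?P<task>[1-4])-(?P<level>[SP])(?: v(?P<version>\d+))?")


class ModelName(NamedTuple):
    task: int
    level: str
    version: Optional[int]


def parse_model_name(name: str) -> ModelName:
    """Split a model name into task, level, and optional version."""
    match = MODEL_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid model name: {name!r}")
    version = match.group("version")
    return ModelName(
        task=int(match.group("task")),
        level=match.group("level"),
        version=int(version) if version is not None else None,
    )
```

A legacy code such as "M2c" fails this check with a `ValueError`, which makes stale names easy to spot in scripts and configs.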
Evaluation Protocols
Evaluation protocols follow the pattern: Eval-{test_set}
| Protocol | Test set | Question answered |
|---|---|---|
| Eval-MANE | MANE splice sites on test chroms | How well does the model classify canonical sites? |
| Eval-Ensembl-Alt | Ensembl sites not in MANE (set difference) | Can the model detect alternative sites beyond MANE? |
| Eval-GENCODE-Alt | GENCODE sites not in MANE (set difference) | Broader alternative site evaluation (curated) |
| Eval-ClinVar | ClinVar splice variants | Can delta scores distinguish pathogenic from benign? |
| Eval-SpliceVarDB | SpliceVarDB validated variants | Cross-validation against experimental evidence |
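The set-difference test sets used by Eval-Ensembl-Alt and Eval-GENCODE-Alt can be sketched with plain Python sets. This is an illustration only: the `SpliceSite` key (chromosome, position, strand, site type) and the toy coordinates are assumptions, since this guide does not specify how sites are keyed.

```python
from typing import Iterable, NamedTuple, Set


class SpliceSite(NamedTuple):
    # Hypothetical site key; the real pipeline may identify sites differently.
    chrom: str
    position: int
    strand: str
    site_type: str  # "donor" or "acceptor"


def alternative_sites(
    annotation_sites: Iterable[SpliceSite], mane_sites: Iterable[SpliceSite]
) -> Set[SpliceSite]:
    """Return sites present in the broader annotation but absent from MANE."""
    return set(annotation_sites) - set(mane_sites)


# Toy data for illustration only.
mane = {
    SpliceSite("chr1", 1000, "+", "donor"),
    SpliceSite("chr1", 2000, "+", "acceptor"),
}
ensembl = mane | {SpliceSite("chr1", 1500, "+", "donor")}

alt = alternative_sites(ensembl, mane)
```

By construction the resulting set shares no site with MANE, so a score on it measures alternative-site detection rather than canonical classification.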
Combining models and protocols
Results are described as: {Model} on {Protocol}
Examples:
- "M1-S on Eval-MANE" → canonical classification (PR-AUC 0.9996)
- "M1-S on Eval-Ensembl-Alt" → testing M1-S OOD generalization
- "M2-S on Eval-Ensembl-Alt" → testing M2-S on its target task (PR-AUC 0.965)
- "M2-S on Eval-MANE" → does M2-S maintain canonical performance?
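One way to keep result labels consistent is to build them from the known model and protocol names rather than typing them by hand. The sketch below assumes the names listed in the tables of this guide; the helper `result_label` is hypothetical.

```python
# Names taken from the model and protocol tables in this guide.
MODELS = ("M1-S", "M1-P", "M2-S", "M3-S", "M4-S")
PROTOCOLS = (
    "Eval-MANE",
    "Eval-Ensembl-Alt",
    "Eval-GENCODE-Alt",
    "Eval-ClinVar",
    "Eval-SpliceVarDB",
)


def result_label(model: str, protocol: str) -> str:
    """Format a result as '{Model} on {Protocol}', rejecting unknown names."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol!r}")
    return f"{model} on {protocol}"


# Every model can be paired with every protocol.
all_labels = [result_label(m, p) for m in MODELS for p in PROTOCOLS]
```

Because any model can be evaluated with any protocol, the full grid contains one label per pair, and legacy codes such as "M2a" are rejected instead of silently appearing in reports.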
Legacy Naming (Deprecated)
The following names appeared in earlier documents and should be translated to the current convention:
| Old name | New name | Notes |
|---|---|---|
| M2a | Eval-Ensembl-Alt (protocol) | Was ambiguously used as both eval and model |
| M2b | Eval-GENCODE-Alt (protocol) | Same ambiguity |
| M2c | M2-S (model) | The Ensembl-trained model, not an eval variant |
| M2d | M2-S with junction weighting | Training variant, not a separate model code |
| M2e | Tissue-conditioned M2-S | Future extension |
| M1-S/MANE | M1-S | Redundant — M1-S is always MANE-trained |
| M1-S/Ensembl | M2-S | This IS the M2 model |
| M2c model | M2-S | Clearest name |
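When migrating older documents or scripts, the translation table above can be expressed as a lookup. This is a sketch under assumptions: the dictionary mirrors the table rows, while the `kind` labels ("protocol", "model", "variant") and the function `translate_legacy` are illustrative names, not existing project code.

```python
from typing import Optional, Tuple

# Old name -> (kind, current name), mirroring the legacy table in this guide.
LEGACY_TO_CURRENT = {
    "M2a": ("protocol", "Eval-Ensembl-Alt"),
    "M2b": ("protocol", "Eval-GENCODE-Alt"),
    "M2c": ("model", "M2-S"),
    "M2c model": ("model", "M2-S"),
    "M2d": ("variant", "M2-S with junction weighting"),
    "M2e": ("variant", "Tissue-conditioned M2-S"),
    "M1-S/MANE": ("model", "M1-S"),
    "M1-S/Ensembl": ("model", "M2-S"),
}


def translate_legacy(name: str) -> Optional[Tuple[str, str]]:
    """Return (kind, current name) for a deprecated name, or None if not legacy."""
    return LEGACY_TO_CURRENT.get(name.strip())
```

For example, `translate_legacy("M2a")` yields `("protocol", "Eval-Ensembl-Alt")`, making explicit that the old code referred to an evaluation protocol rather than a model.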
File and Directory Naming
Model outputs
output/meta_layer/
m1s/ ← M1-S checkpoint (current: v2 logit blend)
m1s_v1_prob_blend/ ← M1-S v1 (preserved, retired)
m2s/ ← M2-S checkpoint (was: m2c/)
Gene caches (annotation-indexed, model-agnostic)
gene_cache_mane/ ← MANE train/val/test
gene_cache_ensembl/ ← Ensembl train/val/test
gene_cache_gencode/ ← GENCODE test
Evaluation results
output/meta_layer/
m1s_eval/ ← M1-S on Eval-MANE
m2s_eval_ensembl_alt/ ← M2-S on Eval-Ensembl-Alt
m2s_eval_gencode_alt/ ← M2-S on Eval-GENCODE-Alt
m1s_eval_ensembl_alt/ ← M1-S on Eval-Ensembl-Alt (OOD test)
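The evaluation directory names listed above appear to follow a regular pattern, which could be derived from the model and protocol names. The sketch below is inferred from those four examples only: the helpers `model_slug` and `eval_dir` are hypothetical, and treating Eval-MANE as the suffix-free case for every model is an assumption based on the single `m1s_eval/` entry.

```python
from pathlib import PurePosixPath

OUTPUT_ROOT = PurePosixPath("output/meta_layer")


def model_slug(model: str) -> str:
    """Turn a model name such as 'M1-S' into a directory slug such as 'm1s'."""
    return model.replace("-", "").lower()


def eval_dir(model: str, protocol: str) -> PurePosixPath:
    """Build the results directory for a model evaluated with a protocol."""
    prefix = "Eval-"
    if not protocol.startswith(prefix):
        raise ValueError(f"protocol must start with {prefix!r}: {protocol!r}")
    test_set = protocol[len(prefix):]
    base = f"{model_slug(model)}_eval"
    # Assumption: Eval-MANE results carry no test-set suffix (as in m1s_eval/).
    if test_set == "MANE":
        return OUTPUT_ROOT / base
    return OUTPUT_ROOT / f"{base}_{test_set.replace('-', '_').lower()}"
```

Under these assumptions, `eval_dir("M2-S", "Eval-GENCODE-Alt")` gives `output/meta_layer/m2s_eval_gencode_alt`, matching the listing.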
Summary
Models: M1-S, M1-P, M2-S, M3-S, M4-S (what you train)
Protocols: Eval-MANE, Eval-Ensembl-Alt, Eval-GENCODE-Alt, Eval-ClinVar, Eval-SpliceVarDB (how you test)
Results: "{Model} on {Protocol}" (unambiguous)
This separation ensures that:
1. Model names encode the training task, not the evaluation setting
2. Evaluation protocols are reusable across models
3. New models or protocols can be added without renaming existing ones
4. Results are always attributable to a specific model + protocol pair