Meta-Layer Naming Convention

Definitive guide to model and evaluation protocol naming in the Agentic-SpliceAI meta-layer. All documents and code should follow these conventions.


Principle: Models vs Evaluation Protocols

Models are what you train — defined by architecture + training labels. Evaluation protocols are how you test — defined by the test set and the question being asked.

Any model can be evaluated with any protocol. A model's name should not encode the evaluation setting.


Models

Models follow the pattern: M{task}-{level}

  • Task number (1-4): the prediction task, from easiest to hardest
  • Level: S (sequence-level CNN) or P (position-level, e.g. XGBoost)

Model | Architecture          | Training labels                      | Purpose
----- | --------------------- | ------------------------------------ | -----------------------------------
M1-S  | Seq-level dilated CNN | MANE (~370K sites)                   | Canonical splice classification
M1-P  | XGBoost (position)    | MANE                                 | Position-level baseline
M2-S  | Seq-level dilated CNN | Ensembl (~2.8M sites)                | Alternative splice site detection
M3-S  | Seq-level dilated CNN | Ensembl, junction features as target | Novel site discovery
M4-S  | Seq-level dilated CNN | Variant pairs (planned)              | Perturbation-induced splice changes

Key distinctions

  • M1-S vs M2-S: Same architecture, different training labels. M1-S sees only MANE canonical transcripts; M2-S sees the full Ensembl annotation including alternative splice sites.
  • M2-S is NOT "M1-S retrained on Ensembl" — it's a distinct model designed for a different task (alternative site detection vs canonical classification).
  • M3-S differs from M2-S in that junction features become the target (held out) rather than input, forcing the model to predict novel sites without RNA-seq evidence.

Version suffixes (optional)

When architecture changes are significant, append a version:

  • M1-S v1: probability-space blend (retired)
  • M1-S v2: logit-space blend with learned temperature (current)
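Because model names follow a fixed pattern, they can be checked mechanically. The sketch below is a hypothetical helper, not part of the Agentic-SpliceAI codebase: `parse_model_name` and `ModelName` are illustrative names, and the regex assumes the exact spelling used in this guide (`M{task}-{level}` with an optional ` v{n}` suffix). It validates syntax only; it does not check whether a given model actually exists.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: "M{task}-{level}" plus an optional " v{version}" suffix.
# Task numbers 1-4 and levels S/P follow the convention described above.
MODEL_NAME_RE = re.compile(r"^M(?P<task>[1-4])-(?P<level>[SP])(?: v(?P<version>\d+))?$")


@dataclass(frozen=True)
class ModelName:
    task: int                      # 1 (easiest) .. 4 (hardest)
    level: str                     # "S" = sequence-level CNN, "P" = position-level
    version: Optional[int] = None  # None when no version suffix is given

    def __str__(self) -> str:
        base = f"M{self.task}-{self.level}"
        return base if self.version is None else f"{base} v{self.version}"


def parse_model_name(name: str) -> ModelName:
    """Parse a name such as "M1-S" or "M1-S v2"; raise ValueError otherwise."""
    match = MODEL_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid model name: {name!r}")
    version = match.group("version")
    return ModelName(
        task=int(match.group("task")),
        level=match.group("level"),
        version=int(version) if version is not None else None,
    )
```

With this check, deprecated spellings such as `M2c` or `M1-S/MANE` are rejected, while `parse_model_name("M1-S v2")` yields task 1, level `S`, version 2.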


Evaluation Protocols

Evaluation protocols follow the pattern: Eval-{test_set}

Protocol         | Test set                                   | Question answered
---------------- | ------------------------------------------ | ----------------------------------------------------
Eval-MANE        | MANE splice sites on test chromosomes      | How well does the model classify canonical sites?
Eval-Ensembl-Alt | Ensembl sites not in MANE (set difference) | Can the model detect alternative sites beyond MANE?
Eval-GENCODE-Alt | GENCODE sites not in MANE (set difference) | Broader alternative-site evaluation (curated)
Eval-ClinVar     | ClinVar splice variants                    | Can delta scores distinguish pathogenic from benign?
Eval-SpliceVarDB | SpliceVarDB validated variants             | Validation against experimental evidence
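The "-Alt" protocols are defined by a set difference: sites annotated in the broader source that MANE does not contain. A minimal sketch, assuming each splice site can be keyed by chromosome, position, strand, and site type, and that both annotations use the same coordinate convention. `SpliceSite` and `alternative_sites` are hypothetical names, and the optional chromosome filter is an assumption mirroring the held-out test chromosomes used by Eval-MANE.

```python
from typing import NamedTuple, Optional, Set


class SpliceSite(NamedTuple):
    """Hypothetical key for one annotated splice site."""
    chrom: str
    position: int   # genomic coordinate; assumed consistent across annotations
    strand: str     # "+" or "-"
    site_type: str  # "donor" or "acceptor"


def alternative_sites(
    annotated: Set[SpliceSite],
    mane: Set[SpliceSite],
    test_chroms: Optional[Set[str]] = None,
) -> Set[SpliceSite]:
    """Sites in the broader annotation (e.g. Ensembl) that MANE lacks.

    If test_chroms is given, keep only sites on those chromosomes.
    """
    alt = annotated - mane  # set difference: e.g. Ensembl minus MANE
    if test_chroms is not None:
        alt = {site for site in alt if site.chrom in test_chroms}
    return alt
```

Keying on the site rather than the transcript means a MANE site that is reused by an alternative transcript is not counted as an alternative site; only genuinely new positions enter the protocol.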

Combining models and protocols

Results are described as: {Model} on {Protocol}

Examples:

  • "M1-S on Eval-MANE" → canonical classification (PR-AUC 0.9996)
  • "M1-S on Eval-Ensembl-Alt" → tests out-of-distribution (OOD) generalization of M1-S
  • "M2-S on Eval-Ensembl-Alt" → tests M2-S on its target task (PR-AUC 0.965)
  • "M2-S on Eval-MANE" → does M2-S maintain canonical performance?
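A result label can be built and parsed so that every reported number stays tied to one model and one protocol. This is a sketch with hypothetical helper names (`result_label`, `parse_result_label`); the protocol registry is transcribed from the protocol table above, and the model regex assumes the `M{task}-{level}` pattern with an optional version suffix.

```python
import re
from typing import Tuple

# Protocol names transcribed from the protocol table above.
KNOWN_PROTOCOLS = {
    "Eval-MANE", "Eval-Ensembl-Alt", "Eval-GENCODE-Alt",
    "Eval-ClinVar", "Eval-SpliceVarDB",
}
# Model-name pattern from the Models section: "M{task}-{level}", optional " v{n}".
MODEL_RE = re.compile(r"M[1-4]-[SP](?: v\d+)?")


def result_label(model: str, protocol: str) -> str:
    """Build an unambiguous "{Model} on {Protocol}" label, rejecting unknown names."""
    if MODEL_RE.fullmatch(model) is None:
        raise ValueError(f"unknown model name: {model!r}")
    if protocol not in KNOWN_PROTOCOLS:
        raise ValueError(f"unknown evaluation protocol: {protocol!r}")
    return f"{model} on {protocol}"


def parse_result_label(label: str) -> Tuple[str, str]:
    """Split "{Model} on {Protocol}" back into its (model, protocol) pair."""
    model, sep, protocol = label.partition(" on ")
    if not sep:
        raise ValueError(f"expected '{{Model}} on {{Protocol}}', got {label!r}")
    result_label(model, protocol)  # validates both halves
    return model, protocol
```

Rejecting legacy codes at label-construction time keeps ambiguous names such as `M2a` from reappearing in new results.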


Legacy Naming (Deprecated)

The following names appeared in earlier documents and should be translated to the current convention:

Old name     | New name                     | Notes
------------ | ---------------------------- | ----------------------------------------------
M2a          | Eval-Ensembl-Alt (protocol)  | Was used ambiguously as both eval and model
M2b          | Eval-GENCODE-Alt (protocol)  | Same ambiguity
M2c          | M2-S (model)                 | The Ensembl-trained model, not an eval variant
M2d          | M2-S with junction weighting | Training variant, not a separate model code
M2e          | Tissue-conditioned M2-S      | Future extension
M1-S/MANE    | M1-S                         | Redundant: M1-S is always MANE-trained
M1-S/Ensembl | M2-S                         | This IS the M2 model
M2c model    | M2-S                         | Clearest name
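The translation table above can be applied mechanically when updating older documents. A minimal sketch using Python's standard `re` module; `LEGACY_NAMES`, `translate_legacy`, and `rewrite_legacy` are hypothetical names, and mapping M2d and M2e to descriptive phrases (rather than codes) follows the table's own "New name" column.

```python
import re

# Deprecated -> current names, transcribed from the legacy table above.
# M2d and M2e map to descriptions because they are variants of M2-S, not model codes.
LEGACY_NAMES = {
    "M2a": "Eval-Ensembl-Alt",
    "M2b": "Eval-GENCODE-Alt",
    "M2c model": "M2-S",
    "M2c": "M2-S",
    "M2d": "M2-S with junction weighting",
    "M2e": "Tissue-conditioned M2-S",
    "M1-S/MANE": "M1-S",
    "M1-S/Ensembl": "M2-S",
}

# Longest names first so "M2c model" wins over "M2c"; \b stops "M2a" matching inside "M2ab".
_LEGACY_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(old) for old in sorted(LEGACY_NAMES, key=len, reverse=True))
    + r")\b"
)


def translate_legacy(name: str) -> str:
    """Return the current name for a deprecated one; other names pass through unchanged."""
    return LEGACY_NAMES.get(name, name)


def rewrite_legacy(text: str) -> str:
    """Replace every deprecated name in free text with its current equivalent."""
    return _LEGACY_RE.sub(lambda m: LEGACY_NAMES[m.group(0)], text)
```

For example, `rewrite_legacy("M1-S/MANE vs M1-S/Ensembl")` returns `"M1-S vs M2-S"`, and current names such as `M1-S` pass through `translate_legacy` unchanged.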

File and Directory Naming

Model outputs

output/meta_layer/
  m1s/                  ← M1-S checkpoint (current: v2 logit blend)
  m1s_v1_prob_blend/    ← M1-S v1 (preserved, retired)
  m2s/                  ← M2-S checkpoint (was: m2c/)

Gene caches (annotation-indexed, model-agnostic)

gene_cache_mane/        ← MANE train/val/test
gene_cache_ensembl/     ← Ensembl train/val/test
gene_cache_gencode/     ← GENCODE test

Evaluation results

output/meta_layer/
  m1s_eval/                     ← M1-S on Eval-MANE
  m2s_eval_ensembl_alt/         ← M2-S on Eval-Ensembl-Alt
  m2s_eval_gencode_alt/         ← M2-S on Eval-GENCODE-Alt
  m1s_eval_ensembl_alt/         ← M1-S on Eval-Ensembl-Alt (OOD test)
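The evaluation directory names above can be derived from the model and protocol names. This is a hypothetical sketch, not the project's actual path logic: it assumes, inferred only from `m1s_eval/`, that Eval-MANE is the default protocol whose suffix is dropped, while other protocols append a lowercase snake_case slug. Versioned checkpoint directories such as `m1s_v1_prob_blend/` carry a free-form descriptive tag and are not covered.

```python
# Hypothetical helpers reproducing the evaluation directory names listed above.
# Assumption (inferred from m1s_eval/): Eval-MANE is the default protocol,
# so its suffix is omitted from the directory name.
DEFAULT_PROTOCOL = "Eval-MANE"


def model_slug(model: str) -> str:
    """'M1-S' -> 'm1s', 'M2-S' -> 'm2s'."""
    return model.replace("-", "").lower()


def protocol_slug(protocol: str) -> str:
    """'Eval-Ensembl-Alt' -> 'eval_ensembl_alt'."""
    return protocol.replace("-", "_").lower()


def eval_dir(model: str, protocol: str) -> str:
    """Directory name for '{model} on {protocol}' results under output/meta_layer/."""
    if protocol == DEFAULT_PROTOCOL:
        return f"{model_slug(model)}_eval"
    return f"{model_slug(model)}_{protocol_slug(protocol)}"
```

Deriving paths from names, rather than typing them by hand, keeps the directory layout in step with the "{Model} on {Protocol}" labels.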

Summary

Models:      M1-S, M1-P, M2-S, M3-S, M4-S  (what you train)
Protocols:   Eval-MANE, Eval-Ensembl-Alt, Eval-GENCODE-Alt, Eval-ClinVar, Eval-SpliceVarDB  (how you test)
Results:     "{Model} on {Protocol}"  (unambiguous)

This separation ensures that:

  1. Model names encode the training task, not the evaluation setting.
  2. Evaluation protocols are reusable across models.
  3. New models or protocols can be added without renaming existing ones.
  4. Results are always attributable to a specific model + protocol pair.