Methods Documentation¶

This directory contains detailed documentation on the methodological approaches used in the EHR sequencing framework.

Disease Progression Modeling¶

Causal Survival Analysis (2-Part Tutorial)¶

Why this matters: Simple classification tasks like "does this patient have diabetes?" can often be solved with rule-based methods. The real power of sequence modeling emerges when predicting disease progression over time.

Part 1: Causal Progression Labels¶

Topics covered:
- The most dangerous data leakage pattern in temporal prediction
- Why patient-level labels + visit-level inputs = temporal leakage
- Three diagnostic tests to detect leakage
- Designing progression labels that respect causality
- Three approaches: fixed-horizon, discrete-time survival, continuous-time survival

Key insight: If a model can "predict" an outcome before any clinical evidence of it exists, you have leakage. A high AUC is not a virtue in itself; respecting causality is.
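A minimal sketch of what a leakage-free fixed-horizon label could look like, written for this overview and not taken from the tutorial or the repository. The function name, its arguments, and the day-based time unit are all assumptions. Each visit is labeled only from what happens after it, visits at or after the event are dropped, and visits whose horizon extends past the end of follow-up are marked unknown instead of negative.

```python
from typing import List, Optional


def fixed_horizon_labels(
    visit_days: List[float],
    event_day: Optional[float],
    last_followup_day: float,
    horizon_days: float,
) -> List[Optional[int]]:
    """Label each visit using only what happens after that visit.

    Returns one entry per visit:
      1    -> the event occurs within `horizon_days` after the visit
      0    -> the patient is observed event-free for the full horizon
      None -> do not train on this visit (the event has already happened,
              or follow-up ends before the horizon, so the outcome is unknown)
    """
    labels: List[Optional[int]] = []
    for day in visit_days:
        if event_day is not None and day >= event_day:
            # The event is already known at this visit: using it would be leakage.
            labels.append(None)
        elif event_day is not None and event_day <= day + horizon_days:
            labels.append(1)
        elif last_followup_day >= day + horizon_days:
            labels.append(0)
        else:
            # Censored before the horizon ends: neither positive nor negative.
            labels.append(None)
    return labels


# Hypothetical patient with an event on day 400.
print(fixed_horizon_labels([0, 100, 250, 400], 400, 400, horizon_days=180))
# [0, 0, 1, None]

# Hypothetical censored patient, followed until day 300 with no event.
print(fixed_horizon_labels([0, 100, 250], None, 300, horizon_days=180))
# [0, 0, None]
```

A patient-level label would instead mark every visit of the first patient as positive, including the visits on day 0 and day 100, which is the mismatch between patient-level labels and visit-level inputs described above.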

Part 2: Discrete-Time Survival Modeling¶

Topics covered:
- What discrete-time survival modeling actually means
- Understanding censoring (conceptually and operationally)
- Deriving the likelihood formula from first principles
- PyTorch implementation of the survival loss
- Why this is causal by construction

Key insight: Visits are the natural discretization for EHR data. Discrete-time survival modeling fits perfectly with visit-based sequences.
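For orientation, the standard discrete-time survival likelihood (as in Singer & Willett, 2003) can be written as follows. The notation is an assumption made here and may differ from the tutorial's: h_{it} is the predicted hazard for patient i at visit t, t_i is the event visit or the last observed visit, and delta_i is 1 for an observed event and 0 for censoring. Censored patients are assumed to be event-free through visit t_i.

```latex
% Hazard: probability of the event at visit t, given no event before t
h_{it} = P(T_i = t \mid T_i \ge t,\; x_{i,1:t})

% Likelihood contribution of patient i
L_i = h_{i t_i}^{\delta_i} \, (1 - h_{i t_i})^{1 - \delta_i}
      \prod_{s=1}^{t_i - 1} (1 - h_{is})

% Log-likelihood: a sum of per-visit binary cross-entropy terms
\log L_i = \delta_i \log h_{i t_i}
         + (1 - \delta_i) \log (1 - h_{i t_i})
         + \sum_{s=1}^{t_i - 1} \log (1 - h_{is})
```

Each hazard h_{it} is conditioned only on inputs up to visit t, and no term refers to visits after t_i.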

Implementation:
- Loss function: src/ehrsequencing/models/losses.py::DiscreteTimeSurvivalLoss
- Model: src/ehrsequencing/models/survival_lstm.py::DiscreteTimeSurvivalLSTM
- Training script: examples/train_survival_lstm.py
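The sketch below shows the arithmetic of a discrete-time survival loss for a single patient. It is not the repository's DiscreteTimeSurvivalLoss, whose code is not reproduced on this page; it uses plain Python instead of PyTorch so each term is visible, and the function name, arguments, and 0-based visit indexing are assumptions.

```python
import math
from typing import Sequence


def discrete_time_nll(
    hazards: Sequence[float],
    last_visit: int,
    event: bool,
    eps: float = 1e-7,
) -> float:
    """Negative log-likelihood of one patient's visit sequence.

    hazards[t] is the predicted probability that the event occurs at visit t,
    given that it has not occurred before visit t (0-based index).
    last_visit is the index of the event visit, or of the last observed
    visit if the patient is censored.
    """
    nll = 0.0
    for t in range(last_visit + 1):
        # Clamp to avoid log(0) for saturated predictions.
        h = min(max(hazards[t], eps), 1.0 - eps)
        if event and t == last_visit:
            nll -= math.log(h)        # the event happens at this visit
        else:
            nll -= math.log(1.0 - h)  # the patient survives this visit
    return nll


hazards = [0.1, 0.2, 0.3, 0.9]  # hypothetical model outputs for four visits

# Event observed at visit index 2: -log(0.9 * 0.8 * 0.3)
print(discrete_time_nll(hazards, last_visit=2, event=True))

# Censored after visit index 2: -log(0.9 * 0.8 * 0.7)
print(discrete_time_nll(hazards, last_visit=2, event=False))
```

In both calls the hazard at index 3 never enters the loss, because visits after the event or censoring time contribute no term. A batched PyTorch version would typically express the same sum with a mask over padded visits.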

Quick Reference¶

When to Use Each Approach¶

Task	Approach	Why
Static classification	Logistic regression, simple NN	No temporal dynamics needed
Fixed-horizon prediction	LSTM + BCE loss	Simple, interpretable, requires careful handling of censoring
Disease progression	LSTM + discrete-time survival	Natural for visits, handles censoring, causal
Irregular timing	Cox-style continuous-time	Flexible timing, standard in epidemiology
Multiple outcomes	Competing risks survival	Multiple event types, only one can occur first

Evaluation Metrics¶

Classification: AUC, precision, recall, calibration
Survival: Concordance index (C-index), calibration, survival curves
Temporal: Time-dependent AUC, Brier score
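As an illustration of the survival metric listed above, here is a minimal sketch of Harrell's concordance index in plain Python. It is a simplified O(n^2) version written for this overview, not the evaluation code used by the training script: pairs with tied times are skipped, and the function and argument names are assumptions.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risks: Sequence[float],
) -> float:
    """Fraction of comparable pairs ranked correctly by predicted risk.

    A pair (i, j) is comparable when patient i has an observed event and
    a strictly shorter time than patient j. The pair is concordant when
    the model assigns patient i the higher risk; ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: times in visits, event flags, and model risk scores.
times = [2, 4, 5, 7]
events = [True, False, True, False]
risks = [0.9, 0.3, 0.1, 0.2]
print(concordance_index(times, events, risks))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The C-index measures discrimination only, so it is usually reported together with calibration.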

Common Pitfalls¶

Temporal leakage: Using future information in predictions
Censoring as negative: Treating censored patients as "no event"
Ignoring visit frequency: Confounding surveillance with risk
Patient-level labels: Losing temporal resolution
No diagnostic tests: Not verifying causality
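The "censoring as negative" pitfall can be made concrete with a toy cohort. The sketch below is an illustration with invented data, not part of the framework. It assumes each patient is recorded as (last observed interval, event flag) and that censored patients count as at risk through their last interval. It compares the naive event rate with a life-table estimate of cumulative incidence.

```python
from typing import List, Tuple

Patient = Tuple[int, bool]  # (last observed interval, event observed?)


def naive_event_rate(patients: List[Patient]) -> float:
    """Treats every censored patient as a confirmed non-event."""
    return sum(1 for _, event in patients if event) / len(patients)


def life_table_incidence(patients: List[Patient]) -> float:
    """Cumulative incidence from per-interval hazards (events / at risk)."""
    max_interval = max(last for last, _ in patients)
    survival = 1.0
    for t in range(1, max_interval + 1):
        at_risk = sum(1 for last, _ in patients if last >= t)
        events = sum(1 for last, event in patients if last == t and event)
        if at_risk > 0:
            survival *= 1.0 - events / at_risk
    return 1.0 - survival


# Invented cohort of five patients; two events, three censored.
cohort = [(1, True), (1, False), (2, True), (2, False), (3, False)]

print(naive_event_rate(cohort))                # 0.4
print(round(life_table_incidence(cohort), 3))  # 0.467
```

The naive rate is lower because patients who left follow-up early are counted as if they had been observed event-free for the whole period.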

Getting Started¶

For Researchers¶

Read Part 1 to understand the leakage problem
Read Part 2 for implementation details
Review the training script: examples/train_survival_lstm.py
Adapt to your specific outcome and dataset

For Practitioners¶

Start with the training script: examples/train_survival_lstm.py
Modify the create_survival_labels() function for your outcome
Adjust model hyperparameters as needed
Evaluate with C-index and calibration plots

For Students¶

Work through the tutorials in order
Implement the loss function from scratch (good exercise!)
Compare discrete-time vs. fixed-horizon on the same data
Run the leakage diagnostic tests from Part 1 on your trained model

References¶

Survival Analysis¶

Singer, J. D., & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford University Press.
Tutz, G., & Schmid, M. (2016). Modeling Discrete Time-to-Event Data. Springer.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B, 34(2), 187-202.

EHR Sequence Modeling¶

Choi, E., et al. (2016). RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. NeurIPS.
Rajkomar, A., et al. (2018). Scalable and accurate deep learning with electronic health records. npj Digital Medicine.
Steinberg, E., et al. (2021). Language models are an effective representation learning technique for electronic health record data. Journal of Biomedical Informatics.

Temporal Causality¶

Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.
Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Chapman & Hall/CRC.

Contributing¶

Found an error or have a suggestion? Please open an issue or submit a pull request.

When adding new methods documentation:
1. Include mathematical derivations with intuition
2. Provide PyTorch implementation examples
3. Explain when to use vs. not use the method
4. Add references to key papers
5. Include common pitfalls and debugging tips
