
Tutorial 2a: Refining Comorbidity Burden Proxies

Supplementary to: Tutorial 2: Synthetic Data Design
Part of: Discrete-Time Survival Analysis for EHR Sequences
Audience: Researchers building survival models for EHR data


Table of Contents

  1. The Problem with Aggregate Code Counts
  2. Medical Code Taxonomy
  3. Refined Approaches
  4. Implementation Strategies
  5. Clinical Validity Considerations
  6. Recommended Approach

The Problem with Aggregate Code Counts

Current Approach

The original tutorial uses total code count per visit as a proxy for comorbidity burden:

# From TUTORIAL_02, lines 92, 135-145
risk_factors = {
    'comorbidity': avg_codes_per_visit,  # All codes treated equally
}

import numpy as np

def compute_comorbidity(patient_sequence):
    codes_per_visit = [visit.num_codes() for visit in patient_sequence.visits]
    avg_codes = np.mean(codes_per_visit)
    return avg_codes

The Limitation

Key Issue: Not all medical codes are equal indicators of disease burden.

Consider two patients with 10 codes per visit:

Patient A (High actual burden):

  • 5 ICD codes: CHF, COPD, CKD, Diabetes, Hypertension
  • 3 LOINC codes: BNP, Creatinine, HbA1c (monitoring)
  • 2 CPT codes: Echocardiogram, Chest X-ray (diagnostics)

Patient B (Lower actual burden):

  • 1 ICD code: Well-controlled Type 2 Diabetes
  • 2 LOINC codes: HbA1c, Lipid panel (routine monitoring)
  • 7 NDC codes: Metformin, Lisinopril, Atorvastatin, Aspirin, Vitamin D, Fish oil, Multivitamin

Observation: Patient B has the same code count (10) but much lower disease severity.
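As a quick sanity check of the problem, the sketch below tallies the two example patients with the aggregate-count proxy; the per-type counts are hard-coded for illustration and are not drawn from the tutorial's data model:

# Illustrative only: per-type code counts for the two example patients
patient_a_counts = {'ICD': 5, 'LOINC': 3, 'CPT': 2, 'NDC': 0}
patient_b_counts = {'ICD': 1, 'LOINC': 2, 'CPT': 0, 'NDC': 7}

raw_a = sum(patient_a_counts.values())  # 10
raw_b = sum(patient_b_counts.values())  # 10

# The aggregate-count proxy scores both patients identically,
# despite Patient A carrying far more diagnosed disease.
assert raw_a == raw_b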

Specific Concerns by Code Type

1. Drug Codes (NDC/RxNorm)

Problem: Medication count ≠ disease severity

  • Complex regimens for "small" diseases:
      - Migraine prophylaxis might involve 3-4 medications
      - Mild asthma might require 2-3 inhalers
  • Polypharmacy patterns:
      - Elderly patients on 10+ medications (preventive + treatment)
      - Drug-drug interaction management adds more drugs
  • Supplements and OTC medications:
      - Vitamins and supplements inflate counts
      - Not indicative of morbidity

2. Lab/Test Codes (LOINC)

Problem: More tests ≠ sicker patient

  • Monitoring intensity:
      - Well-controlled diabetes: frequent HbA1c checks
      - Stable warfarin patient: weekly INR monitoring
  • Diagnostic workup:
      - Healthy patient with vague symptoms: extensive testing
      - Established diagnosis: fewer confirmatory tests
  • Screening protocols:
      - Comprehensive metabolic panel = 1 order, 14 LOINC codes
      - Doesn't indicate disease burden

3. Procedure Codes (CPT/HCPCS)

Problem: Procedures vary wildly in invasiveness

  • Minor vs. major:
      - 5 outpatient minor procedures ≠ 1 open-heart surgery
      - Wound dressing changes (frequent) vs. transplant (rare)
  • Preventive vs. therapeutic:
      - Colonoscopy (screening) vs. emergency surgery
      - Both count as "1 procedure code"

4. Diagnosis Codes (ICD-9/10-CM)

Most informative: But still has nuances

  • Severity hierarchy:
      - "Essential hypertension" vs. "Hypertensive crisis"
      - "Type 2 diabetes" vs. "Diabetes with complications"
  • Redundancy:
      - Multiple codes for the same condition (primary + manifestations); see the sketch below
      - Coding practices vary by institution
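As a rough illustration of the redundancy concern, ICD-10-CM codes can be collapsed to their 3-character category before counting, so that several codes for the same condition count only once. This is a simplified sketch, not a validated grouper:

# Simplified sketch: collapse ICD-10-CM codes to their 3-character category
# so multiple codes for one condition (e.g., diabetes) count only once
icd_codes = ['E11.9', 'E11.22', 'E11.40', 'I10']  # three diabetes codes + hypertension

categories = {code.split('.')[0] for code in icd_codes}

print(len(icd_codes))   # 4 raw codes
print(len(categories))  # 2 distinct condition categories: {'E11', 'I10'}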

Medical Code Taxonomy

Code Type Hierarchy by Informative Value

For comorbidity burden estimation:

Most Informative → Least Informative

1. Diagnostic codes (ICD-9/10-CM)
   - Direct disease indicators
   - Severity stratification possible
   - Strong predictor of outcomes

2. Procedure codes (CPT/HCPCS)
   - Indicates interventions
   - Complexity varies (need weighting)
   - Major procedures = high burden

3. Lab/Test codes (LOINC)
   - Indirect indicators (monitoring/diagnosis)
   - High frequency ≠ high burden
   - Pattern matters more than count

4. Medication codes (NDC/RxNorm)
   - Treatment complexity
   - Confounded by polypharmacy
   - Class matters more than count

Code Type Characteristics

| Code Type  | Namespace  | Count Meaning           | Severity Proxy | Recommended Weight |
|------------|------------|-------------------------|----------------|--------------------|
| Diagnosis  | ICD-9/10   | Number of conditions    | Strong         | 1.0 (baseline)     |
| Procedure  | CPT/HCPCS  | Number of interventions | Moderate       | 0.7-0.9            |
| Lab/Test   | LOINC      | Monitoring intensity    | Weak           | 0.3-0.5            |
| Medication | NDC/RxNorm | Treatment complexity    | Weak           | 0.2-0.4            |
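Using these weights requires knowing each code's namespace. Many EHR extracts store the vocabulary explicitly, which is the preferred source; when only the bare code string is available, a pattern-based guess like the sketch below can serve as a fallback (the regular expressions are heuristic assumptions, not authoritative definitions of each code system):

import re

def infer_code_type(code: str) -> str:
    """Heuristically guess the code namespace from its surface form."""
    if re.fullmatch(r'[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?', code):
        return 'ICD'    # ICD-10-CM style, e.g. 'I50.9'
    if re.fullmatch(r'\d{1,5}-\d', code):
        return 'LOINC'  # LOINC style, e.g. '2160-0'
    if re.fullmatch(r'\d{5}', code):
        return 'CPT'    # CPT style, e.g. '93306'
    if re.fullmatch(r'\d{10,11}', code):
        return 'NDC'    # unhyphenated NDC style, e.g. an 11-digit code
    return 'UNKNOWN'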

Refined Approaches

Approach 1: Code-Type Weighted Counts

Principle: Weight codes by their informative value for disease burden.

Implementation

def compute_weighted_comorbidity(patient_sequence, code_weights):
    """
    Compute comorbidity burden using code-type-specific weights.

    Args:
        patient_sequence: Patient visit history
        code_weights: Dict mapping code types to weights

    Returns:
        Weighted average code count per visit
    """
    weighted_codes_per_visit = []

    for visit in patient_sequence.visits:
        # Count codes by type
        icd_count = len(visit.get_codes_by_type('ICD'))
        cpt_count = len(visit.get_codes_by_type('CPT'))
        loinc_count = len(visit.get_codes_by_type('LOINC'))
        ndc_count = len(visit.get_codes_by_type('NDC'))

        # Apply weights
        weighted_count = (
            code_weights['ICD'] * icd_count +
            code_weights['CPT'] * cpt_count +
            code_weights['LOINC'] * loinc_count +
            code_weights['NDC'] * ndc_count
        )

        weighted_codes_per_visit.append(weighted_count)

    return np.mean(weighted_codes_per_visit)

# Recommended weights
code_weights = {
    'ICD': 1.0,    # Diagnoses: full weight
    'CPT': 0.8,    # Procedures: high weight
    'LOINC': 0.4,  # Lab tests: moderate weight
    'NDC': 0.3     # Medications: low weight
}

# Example comparison
# Patient with 5 ICD + 3 LOINC + 7 NDC codes
unweighted = 5 + 3 + 7              # 15 codes
weighted = 1.0*5 + 0.4*3 + 0.3*7    # 8.3 effective codes
# More accurately reflects disease burden

Rationale

  • Prioritizes diagnostic information: ICD codes get full weight
  • Down-weights less informative codes: Labs and meds contribute less
  • Clinically interpretable: "Effective code count" approximates true burden
  • Preserves correlation: High-burden patients still have higher scores

Approach 2: Diagnosis-Only Focus

Principle: Use only diagnosis codes for comorbidity burden.

Implementation

def compute_diagnosis_burden(patient_sequence):
    """
    Compute comorbidity burden using only diagnosis codes.
    """
    icd_per_visit = []

    for visit in patient_sequence.visits:
        icd_codes = visit.get_codes_by_type('ICD')
        icd_per_visit.append(len(icd_codes))

    return np.mean(icd_per_visit)

# Normalization
avg_icd_per_visit = compute_diagnosis_burden(patient_sequence)
norm_comorbidity = avg_icd_per_visit / 10.0  # Typical max: 10 diagnoses

Advantages

  • Maximum specificity: Direct measure of diagnosed conditions
  • Avoids confounders: Not affected by medication polypharmacy
  • Clear interpretation: "Average number of diagnoses per visit"

Disadvantages

  • Ignores treatment intensity: Complex procedures not captured
  • Sensitive to coding practices: Under-coding biases downward
  • May miss severity: Well-managed conditions with few active diagnoses

Approach 3: Stratified Complexity Scores

Principle: Create separate scores for each code type, then combine.

Implementation

def compute_stratified_burden(patient_sequence):
    """
    Compute separate burden scores for each code type.
    """
    visits = patient_sequence.visits

    # Compute type-specific averages
    avg_icd = np.mean([len(v.get_codes_by_type('ICD')) for v in visits])
    avg_cpt = np.mean([len(v.get_codes_by_type('CPT')) for v in visits])
    avg_loinc = np.mean([len(v.get_codes_by_type('LOINC')) for v in visits])
    avg_ndc = np.mean([len(v.get_codes_by_type('NDC')) for v in visits])

    # Normalize each separately
    norm_icd = avg_icd / 10.0      # Typical max: 10 diagnoses
    norm_cpt = avg_cpt / 5.0       # Typical max: 5 procedures
    norm_loinc = avg_loinc / 15.0  # Typical max: 15 lab tests
    norm_ndc = avg_ndc / 10.0      # Typical max: 10 medications

    # Create composite score
    complexity_scores = {
        'diagnostic': norm_icd,
        'procedural': norm_cpt,
        'monitoring': norm_loinc,
        'therapeutic': norm_ndc
    }

    # Weighted combination
    burden_score = (
        0.50 * complexity_scores['diagnostic'] +
        0.25 * complexity_scores['procedural'] +
        0.15 * complexity_scores['monitoring'] +
        0.10 * complexity_scores['therapeutic']
    )

    return burden_score, complexity_scores

# Returns both composite and component scores
total_burden, components = compute_stratified_burden(patient_sequence)

Advantages

  • Multidimensional view: Captures different aspects of complexity
  • Interpretable components: Can analyze which dimension drives risk
  • Flexible weighting: Easy to adjust based on outcome of interest

Use Case

# Example: Different weights for different outcomes

# Mortality risk: emphasize diagnoses and procedures
mortality_weights = {'diagnostic': 0.5, 'procedural': 0.3, 
                     'monitoring': 0.1, 'therapeutic': 0.1}

# Readmission risk: emphasize therapeutic complexity
readmission_weights = {'diagnostic': 0.3, 'procedural': 0.2,
                       'monitoring': 0.2, 'therapeutic': 0.3}
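A small sketch of how these weight sets could be applied to the component scores returned by compute_stratified_burden above; the combine_components helper is illustrative, not part of the tutorial code:

def combine_components(components, weights):
    """Weighted sum of the per-dimension complexity scores."""
    return sum(weights[name] * components[name] for name in weights)

# Same component scores, outcome-specific composites
_, components = compute_stratified_burden(patient_sequence)
mortality_burden = combine_components(components, mortality_weights)
readmission_burden = combine_components(components, readmission_weights)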

Approach 4: Clinical Condition Counting (Gold Standard)

Principle: Map ICD codes to distinct clinical conditions using hierarchical groupers.

Implementation

def compute_condition_count(patient_sequence, grouper):
    """
    Count distinct clinical conditions using a hierarchical grouper.

    The grouper object must expose map_to_category(icd_code). Common choices:
        - CCS (Clinical Classifications Software): ~300 categories
        - AHRQ Elixhauser: 31 comorbidity categories
        - Charlson: 17 weighted comorbidity categories
    """
    all_icd_codes = []
    for visit in patient_sequence.visits:
        all_icd_codes.extend(visit.get_codes_by_type('ICD'))

    # Map to condition categories
    conditions = set()
    for icd_code in all_icd_codes:
        category = grouper.map_to_category(icd_code)
        conditions.add(category)

    # Count distinct conditions
    num_conditions = len(conditions)

    return num_conditions

# Example using Elixhauser comorbidity categories (abridged subset of the 31)
elixhauser_conditions = [
    'CHF', 'Arrhythmia', 'Valvular', 'PVD', 'HTN',
    'Paralysis', 'Neuro', 'Pulmonary', 'DM', 'Hypothyroid',
    'Renal', 'Liver', 'PUD', 'HIV', 'Lymphoma',
    'Mets', 'Tumor', 'Rheumatoid', 'Coagulopathy', 'Obesity',
    'WeightLoss', 'Fluid', 'Anemia', 'BloodLoss', 'Alcohol',
    'Drug', 'Psychoses', 'Depression'
]

# Patient's ICD codes:
icd_codes = ['I50.9', 'I48.91', 'E11.9', 'N18.3']
# Map to distinct condition categories:
conditions = ['CHF', 'Arrhythmia', 'DM', 'Renal']
# Condition count = 4; if several codes mapped to the same category,
# they would collapse into a single condition
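A minimal, dictionary-backed grouper that satisfies the interface assumed by compute_condition_count; the code-to-category mapping is a tiny illustrative fragment, not a complete Elixhauser or CCS table:

class DictGrouper:
    """Toy grouper: looks up an ICD code's category in a plain dict."""
    def __init__(self, mapping):
        self.mapping = mapping

    def map_to_category(self, icd_code):
        return self.mapping.get(icd_code, 'Other')

# Tiny illustrative fragment of a code-to-category table
toy_mapping = {
    'I50.9': 'CHF',
    'I48.91': 'Arrhythmia',
    'E11.9': 'DM',
    'N18.3': 'Renal',
}

num_conditions = compute_condition_count(patient_sequence, grouper=DictGrouper(toy_mapping))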

Advantages

  • Clinically meaningful: Aligns with medical understanding
  • Reduces redundancy: Multiple ICD codes for same condition = 1
  • Standard metrics: Can use validated comorbidity indices
  • Best reflects actual burden: Gold standard in epidemiology

Disadvantages

  • Requires mapping: Need grouper software/lookup tables
  • Implementation complexity: More code infrastructure
  • Less granular: Loses within-condition severity variation

Common Groupers

  1. CCS (Clinical Classifications Software) - ICD-9/10-CM
     - ~300 categories (e.g., "Heart failure", "Diabetes")
     - Good for general comorbidity counting
     - Free from AHRQ

  2. Elixhauser Comorbidity Index
     - 31 comorbidity categories
     - Validated for mortality/readmission prediction
     - Widely used in health services research

  3. Charlson Comorbidity Index
     - 17 weighted comorbidity categories
     - Weights reflect mortality impact
     - Original index: sum weights, not just count (see the sketch below)
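A hedged sketch of the Charlson-style weighted sum mentioned in item 3. The few weights shown follow commonly cited values from the original index, but a validated implementation should be used for real analyses:

# Illustrative fragment of Charlson-style weights (subset only)
charlson_weights = {
    'CHF': 1,       # congestive heart failure
    'DM': 1,        # diabetes without end-organ damage
    'Renal': 2,     # moderate/severe renal disease
    'Mets': 6,      # metastatic solid tumor
}

patient_conditions = {'CHF', 'DM', 'Renal'}
charlson_score = sum(charlson_weights.get(c, 0) for c in patient_conditions)  # 1 + 1 + 2 = 4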

Implementation Strategies

Strategy 1: Incremental Refinement

Phase 1: Baseline (Current)

# Start with aggregate count
comorbidity = avg_codes_per_visit

Phase 2: Type Weighting

# Add code-type weights
comorbidity = weighted_avg_codes_per_visit
code_weights = {'ICD': 1.0, 'CPT': 0.8, 'LOINC': 0.4, 'NDC': 0.3}

Phase 3: Diagnosis Focus

# Refine to diagnosis-heavy
comorbidity = 0.7 * avg_icd + 0.3 * weighted_other

Phase 4: Condition Grouping

# Ultimate: distinct conditions
comorbidity = num_distinct_conditions / 10.0  # Normalize

Strategy 2: Sensitivity Analysis

Compare approaches on same data:

# Compute each comorbidity metric for every patient
# (grouper: a CCS/Elixhauser-style mapping object, see Approach 4)
metrics = {
    'raw_count': [compute_comorbidity(seq) for seq in sequences],
    'weighted': [compute_weighted_comorbidity(seq, code_weights) for seq in sequences],
    'icd_only': [compute_diagnosis_burden(seq) for seq in sequences],
    'condition_count': [compute_condition_count(seq, grouper) for seq in sequences]
}

# Train models with each metric
performance = {}
for metric_name, comorbidity_values in metrics.items():
    outcomes = generate_outcomes(sequences, comorbidity=comorbidity_values)
    model = train_model(sequences, outcomes)
    performance[metric_name] = evaluate_model(model)

# Compare: which metric gives best risk stratification?
print(f"AUC by metric:\n{performance}")

Strategy 3: Ensemble Risk Score

Combine multiple metrics:

def compute_ensemble_comorbidity(patient_sequence):
    """
    Combine multiple comorbidity proxies.
    """
    # Compute variants (each normalized to a rough 0-1 scale)
    raw = compute_comorbidity(patient_sequence) / 20.0
    weighted = compute_weighted_comorbidity(patient_sequence, code_weights) / 15.0
    icd_only = compute_diagnosis_burden(patient_sequence) / 10.0

    # Ensemble with weights
    ensemble = (
        0.2 * raw +       # Keep some raw signal
        0.3 * weighted +  # Moderate weight
        0.5 * icd_only    # Emphasize diagnoses
    )

    return ensemble

Clinical Validity Considerations

1. Face Validity

Question: Does the metric align with clinical intuition?

Test cases:

# Patient A: 10 serious diagnoses (CHF, COPD, CKD, etc.)
# Patient B: 1 diagnosis + 20 vitamins/supplements

# Which has higher comorbidity?
# Answer: Patient A

# Check your metric:
assert comorbidity_A > comorbidity_B, "Metric fails face validity"

2. Predictive Validity

Question: Does the metric predict actual outcomes?

# On real labeled data (if available)
correlation = np.corrcoef(comorbidity_scores, actual_mortality)[0,1]

# Strong correlation (|r| > 0.3) = good predictive validity
# Weak correlation (|r| < 0.15) = reconsider metric

3. Construct Validity

Question: Does the metric correlate with established indices?

# Compare to Charlson or Elixhauser score
charlson_scores = compute_charlson_index(patients)
your_scores = compute_comorbidity(patients)

correlation = np.corrcoef(charlson_scores, your_scores)[0,1]

# High correlation (r > 0.6) = good construct validity

4. Discriminative Ability

Question: Does the metric separate high-risk from low-risk patients?

# Compare score distributions
high_risk_scores = comorbidity[actual_events == 1]
low_risk_scores = comorbidity[actual_events == 0]

mean_high = high_risk_scores.mean()
mean_low = low_risk_scores.mean()

# Effect size (Cohen's d), using a simple pooled standard deviation
pooled_std = np.sqrt((high_risk_scores.std()**2 + low_risk_scores.std()**2) / 2)
effect_size = (mean_high - mean_low) / pooled_std

# d > 0.5 = medium effect (adequate)
# d > 0.8 = large effect (excellent)

Recommended Approach

For Synthetic Data Generation

Recommendation: Use Approach 1 (Code-Type Weighted Counts) as starting point.

Rationale:

  1. Easy to implement: Minor modification to existing code
  2. Clinically sound: Prioritizes informative code types
  3. Preserves correlation: Still captures high-risk patients
  4. Good compromise: Between simplicity and accuracy

Updated Risk Factor Computation

def compute_risk_factors_refined(patient_sequence):
    """
    Compute risk factors with code-type-aware comorbidity.
    """
    # Code type weights
    code_weights = {
        'ICD': 1.0,    # Diagnoses
        'CPT': 0.8,    # Procedures
        'LOINC': 0.4,  # Lab tests
        'NDC': 0.3     # Medications
    }

    # 1. Weighted comorbidity burden
    weighted_codes = []
    for visit in patient_sequence.visits:
        visit_weighted = sum(
            code_weights.get(code.type, 0.5)  # default weight for unrecognized types
            for code in visit.codes
        )
        weighted_codes.append(visit_weighted)

    avg_weighted_codes = np.mean(weighted_codes)
    norm_comorbidity = avg_weighted_codes / 15.0  # Normalized

    # 2. Visit frequency (unchanged)
    visits = patient_sequence.visits
    num_visits = len(visits)
    time_span_years = (visits[-1].timestamp - visits[0].timestamp).days / 365.0
    frequency = num_visits / max(time_span_years, 0.1)
    norm_frequency = frequency / 5.0

    # 3. Diagnosis diversity (refined)
    # Use only ICD codes for diversity metric
    all_icd_codes = []
    for visit in patient_sequence.visits:
        all_icd_codes.extend([c for c in visit.codes if c.type == 'ICD'])

    if len(all_icd_codes) > 0:
        unique_icd = len(set(all_icd_codes))
        total_icd = len(all_icd_codes)
        icd_diversity = unique_icd / total_icd
    else:
        icd_diversity = 0.5  # Neutral if no diagnoses

    # Low diversity = repeated diagnoses = chronic = high risk
    risk_from_diversity = 1 - icd_diversity

    # 4. Combine into risk score
    risk_score = (
        0.4 * norm_comorbidity +      # Weighted code burden
        0.4 * norm_frequency +         # Visit frequency
        0.2 * risk_from_diversity      # Diagnosis diversity
    )

    risk_score = np.clip(risk_score, 0.1, 0.9)

    return risk_score

Comparison: Old vs. New

# Example patient: 
# - 5 ICD codes
# - 3 LOINC codes
# - 7 NDC codes
# Total = 15 codes

# OLD METHOD (unweighted)
comorbidity_old = 15 / 20.0                # 0.75

# NEW METHOD (weighted)
weighted = (1.0*5) + (0.4*3) + (0.3*7)     # 5 + 1.2 + 2.1 = 8.3
comorbidity_new = weighted / 15.0          # ≈ 0.55

# NEW is lower → more accurate reflection of moderate (not high) burden

Future Directions

1. Severity-Weighted ICD Codes

Idea: Not all diagnoses equal; weight by severity.

# Example using ICD hierarchy
severity_weights = {
    'I50.9': 1.0,   # Heart failure (severe)
    'I10': 0.5,     # Essential HTN (moderate)
    'Z23': 0.1      # Vaccination encounter (minimal)
}

weighted_icd_burden = sum(severity_weights.get(code, 0.7) for code in icd_codes)

2. Temporal Comorbidity Evolution

Idea: Increasing comorbidity over time = higher risk.

# Compare early vs. late visit comorbidity
# (per-visit ICD counts used as a simple per-window proxy)
visits = patient_sequence.visits
early_burden = np.mean([len(v.get_codes_by_type('ICD')) for v in visits[:3]])
late_burden = np.mean([len(v.get_codes_by_type('ICD')) for v in visits[-3:]])

comorbidity_trend = late_burden - early_burden
# Positive trend = worsening = higher risk

3. Medication Class Analysis

Idea: Weight medications by therapeutic class.

high_risk_meds = {
    'anticoagulants': 1.5,   # High risk
    'immunosuppressants': 1.3,
    'chemotherapy': 1.5,
    'insulin': 1.2,
    'opioids': 1.1,
    'vitamins': 0.1          # Low risk
}
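The dictionary above only defines the weights; a minimal sketch of applying them is shown below. The get_med_class lookup is hypothetical (in practice it would come from a drug vocabulary such as RxNorm/ATC), and the same lookup-and-sum pattern applies to the procedure invasiveness scores in the next subsection:

def compute_med_class_burden(ndc_codes, class_weights, get_med_class):
    """Sum class-specific weights over a patient's medications.

    get_med_class is a hypothetical NDC-to-therapeutic-class lookup;
    unrecognized classes fall back to a neutral weight of 0.5.
    """
    return sum(
        class_weights.get(get_med_class(code), 0.5)
        for code in ndc_codes
    )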

4. Procedure Invasiveness Scoring

Idea: Major procedures indicate higher burden.

# CPT-based invasiveness
invasiveness = {
    'major_surgery': 2.0,    # CABG, transplant
    'minor_surgery': 1.0,    # Excision, repair
    'diagnostic': 0.5,       # Endoscopy, imaging
    'minimal': 0.2           # Wound care, injections
}

Summary

Key Points

  1. Aggregate code counts are imperfect proxies for comorbidity burden
  2. Different code types have different informative value:
     ICD (diagnoses) > CPT (procedures) > LOINC (labs) > NDC (meds)
  3. Code-type weighting is a practical refinement (easy to implement)
  4. Condition counting is the gold standard (requires grouper software)
  5. Clinical validation is essential (test against known cases)

Practical Recommendations

For synthetic data generation:

  • Start with code-type weighted counts (Approach 1)
  • Validate correlation with event times (should remain r < -0.5)
  • Conduct sensitivity analysis (compare metrics)

For real-world risk modeling:

  • Use established comorbidity indices (Charlson/Elixhauser)
  • Map ICD codes to clinical conditions
  • Weight by severity and prognostic impact

Revised Risk Factor Equation

Original (TUTORIAL_02):

comorbidity = avg_codes_per_visit / 20.0

Refined (TUTORIAL_02a):

# Code-type weights
weights = {'ICD': 1.0, 'CPT': 0.8, 'LOINC': 0.4, 'NDC': 0.3}

# Weighted count per visit, averaged across visits
weighted_codes = [sum(weights.get(code.type, 0.5) for code in visit.codes)
                  for visit in patient_sequence.visits]
comorbidity = np.mean(weighted_codes) / 15.0


References

  1. Elixhauser Comorbidity Index
    Elixhauser et al. (1998). Medical Care, 36(1), 8-27.

  2. Charlson Comorbidity Index
    Charlson et al. (1987). Journal of Chronic Diseases, 40(5), 373-383.

  3. Clinical Classifications Software (CCS)
    AHRQ Healthcare Cost and Utilization Project (HCUP).
    https://www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp

  4. ICD Code Hierarchies
    ICD-10-CM Official Guidelines for Coding and Reporting (NCHS/CMS).


Related Documents:

  • Main Tutorial: Synthetic Data Design
  • Tutorial 1: Prediction Problem Definition
  • Tutorial 3: Loss Function Formulation