Installation Guide
Prerequisites
- Conda/Mamba: For environment management
- Python: 3.10 or higher (< 3.13)
- Git: For version control
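The Python constraint (3.10 or higher, below 3.13) can be checked before you create the environment. This is a minimal sketch. `python_version_supported` is a hypothetical helper and is not part of the package:

```python
import sys

# Supported range from the prerequisites: >= 3.10 and < 3.13
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def python_version_supported(version: tuple) -> bool:
    """Return True if the (major, minor) part of version is in the supported range."""
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "supported" if python_version_supported(current) else "NOT supported"
    print(f"Python {current[0]}.{current[1]} is {status}")
```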
Quick Start
1. Clone Repository
2. Create Environment
Choose the appropriate environment file for your system:
macOS (M1/M2/M3 Mac) - Recommended:
```bash
# Uses MPS (Metal Performance Shaders) for GPU acceleration
mamba env create -f environment-macos.yml
mamba activate ehrsequencing
```
Linux/Windows with NVIDIA GPU:
```bash
# Uses CUDA 12.1 for GPU acceleration
mamba env create -f environment-cuda.yml
mamba activate ehrsequencing
```
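CPU-only (any platform): use `environment-cpu.yml` with the same `mamba env create -f` and `mamba activate ehrsequencing` commands.
Default (macOS-compatible): use `environment.yml`.
Note: Replace `mamba` with `conda` if you prefer conda over mamba.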
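If you are unsure which environment file applies, the choice can be scripted. This is a sketch under stated assumptions. `pick_environment_file` and `detect_environment_file` are hypothetical helpers. Only Apple Silicon Macs (`arm64`) are mapped to the MPS file. The presence of an `nvidia-smi` binary is used as a rough heuristic for an NVIDIA GPU.

```python
import platform
import shutil


def pick_environment_file(system: str, machine: str, has_nvidia_gpu: bool) -> str:
    """Map a platform description to one of the environment files.

    Assumption: only Apple Silicon Macs (arm64) use the MPS file; Linux and
    Windows use the CUDA file when an NVIDIA GPU is present; all else is CPU-only.
    """
    if system == "Darwin" and machine == "arm64":
        return "environment-macos.yml"
    if system in ("Linux", "Windows") and has_nvidia_gpu:
        return "environment-cuda.yml"
    return "environment-cpu.yml"


def detect_environment_file() -> str:
    # Heuristic: a visible nvidia-smi binary is treated as "has an NVIDIA GPU"
    has_gpu = shutil.which("nvidia-smi") is not None
    return pick_environment_file(platform.system(), platform.machine(), has_gpu)


if __name__ == "__main__":
    print(f"mamba env create -f {detect_environment_file()}")
```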
3. Install Package with Poetry
```bash
# Install poetry if not already installed
pip install poetry
# Install package and dependencies
poetry install
```
Alternative: install with pip in editable mode (`pip install -e .`).
4. Verify Installation
```bash
# Check Python version
python --version  # Should be 3.10, 3.11, or 3.12
# Test import
python -c "import ehrsequencing; print(ehrsequencing.__version__)"
# Run tests
pytest tests/
```
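When the import test fails, it helps to know whether the package is missing or is installed but broken. This is a small standard-library sketch. `check_package` is a hypothetical helper. It assumes the distribution name matches the import name `ehrsequencing`.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def check_package(module_name: str, dist_name: Optional[str] = None) -> Tuple[bool, Optional[str]]:
    """Return (importable, installed_version) for a top-level package.

    installed_version is None when no distribution metadata is found.
    """
    importable = importlib.util.find_spec(module_name) is not None
    try:
        installed = version(dist_name or module_name)
    except PackageNotFoundError:
        installed = None
    return importable, installed


if __name__ == "__main__":
    # Assumption: the distribution name matches the import name.
    for name in ("ehrsequencing", "torch"):
        importable, installed = check_package(name)
        print(f"{name}: importable={importable}, version={installed}")
```

There are two informative failure cases. If the result is `importable=False` but a version is reported, the metadata is present but the module cannot be found, which often means the wrong interpreter is active. If the result is `importable=True` with `version=None`, the module is on the path without installed metadata.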
Environment Management
Activating the Environment
Activate the environment before installing or running anything: `mamba activate ehrsequencing`.
Updating Dependencies
```bash
# Update from environment file (use your platform-specific file)
mamba env update -f environment-macos.yml  # macOS
# or
mamba env update -f environment-cuda.yml   # Linux/Windows GPU
# or
mamba env update -f environment-cpu.yml    # CPU-only
# Update with poetry
poetry update
```
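Deactivating
Leave the environment with `mamba deactivate`.
Development Setup
Additional Development Tools
```bash
# Install pre-commit hooks (optional)
pip install pre-commit
pre-commit install
# Install Jupyter extensions (classic Notebook < 7 only)
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
```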
IDE Setup
VS Code:
1. Install the Python extension
2. Select the interpreter from the ehrsequencing environment
3. Enable linting (Ruff) and formatting (Black)
PyCharm:
1. Set the project interpreter to the ehrsequencing environment
2. Enable the Black formatter
3. Configure Ruff for linting
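Data Setup
Synthea (Synthetic Data)
```bash
# Download and install Synthea
# See: https://github.com/synthetichealth/synthea
# Generate synthetic data (run inside the Synthea checkout).
# CSV output requires exporter.csv.export = true in synthea.properties
./run_synthea -p 10000
# Copy to the project data directory (run from the project root)
mkdir -p data/synthea
cp /path/to/synthea/output/csv/*.csv data/synthea/
```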
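A quick sanity check confirms that the copy worked by counting data rows per CSV file. This is a sketch. `summarize_csv_dir` is a hypothetical helper. The file names you see, for example `patients.csv` and `encounters.csv`, depend on your Synthea version and configuration.

```python
import csv
from pathlib import Path


def summarize_csv_dir(directory) -> dict:
    """Return {file name: number of data rows} for every CSV in a directory."""
    counts = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row
            counts[path.name] = sum(1 for _ in reader)
    return counts


if __name__ == "__main__":
    summary = summarize_csv_dir("data/synthea")
    if not summary:
        print("No CSV files found in data/synthea")
    for name, rows in summary.items():
        print(f"{name}: {rows} rows")
```

The script uses `csv.reader` instead of counting lines so that quoted fields containing newlines are not miscounted.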
MIMIC-III/IV (Real Data)
- Apply for access: https://physionet.org/
- Complete CITI training
- Sign the data use agreement
- Download the data (after approval)
- Set up a PostgreSQL database (optional)
Pre-trained Models
```bash
# Download CEHR-BERT pre-trained embeddings
# See the pretrained embeddings guide for details
mkdir -p checkpoints/cehrbert
# Download the weights from Hugging Face or the model repository into checkpoints/cehrbert
```
Hardware-Specific Setup
M1 MacBook (Local Development)
```bash
# Verify MPS (Metal Performance Shaders) support
python -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"
# Use small model configs for development
# See: docs/implementation/resource-aware-models.md
```
RunPod / Cloud GPU
For detailed setup on RunPod or cloud GPU instances, see the RunPods Training Guide. It covers:
- A40, A100, and RTX 4090 configurations
- SSH setup, data transfer, and training workflows
Troubleshooting
Conda Environment Issues
```bash
# Remove and recreate environment
mamba env remove -n ehrsequencing
# Use environment.yml or your platform-specific file (e.g. environment-macos.yml)
mamba env create -f environment.yml
```
Poetry Installation Issues
Import Errors
PyTorch MPS Issues (M1 Mac)
```bash
# PyTorch does not switch devices on its own: if MPS is not available,
# your code must select the CPU device instead
# Ensure you have the latest PyTorch version
mamba install pytorch::pytorch -c pytorch
```
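An explicit device choice makes the CPU fallback happen in code, so you do not have to rely on defaults. This is a minimal sketch. `choose_device` and `detect_device` are hypothetical helpers. They prefer CUDA, then MPS, then CPU, and they return `"cpu"` when PyTorch is not installed.

```python
def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed; nothing to accelerate
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return choose_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(f"Using device: {detect_device()}")
```

Pass the result to `torch.device(...)` and move models and tensors with `.to(device)`. Some individual operators are not yet implemented on MPS. For those, setting the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` makes PyTorch run them on the CPU.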
Database Connection (MIMIC)
```bash
# Test PostgreSQL connection
psql -h localhost -U your_username -d mimic3
# Set environment variables
export MIMIC_USER=your_username
export MIMIC_PASSWORD=your_password
```
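Application code can read these variables instead of hard-coding credentials. This is a minimal sketch that assumes a local PostgreSQL server. `mimic_connection_uri` is hypothetical, as are the `MIMIC_HOST`, `MIMIC_PORT`, and `MIMIC_DB` variables. Their defaults match the `psql` test above: localhost, the standard port 5432, and the database `mimic3`.

```python
import os
from urllib.parse import quote


def mimic_connection_uri(env=None) -> str:
    """Build a postgresql:// connection URI from environment variables.

    MIMIC_USER and MIMIC_PASSWORD are required; MIMIC_HOST, MIMIC_PORT and
    MIMIC_DB are hypothetical optional overrides with local defaults.
    """
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or '/' stay unambiguous
    user = quote(env["MIMIC_USER"], safe="")
    password = quote(env["MIMIC_PASSWORD"], safe="")
    host = env.get("MIMIC_HOST", "localhost")
    port = env.get("MIMIC_PORT", "5432")
    database = env.get("MIMIC_DB", "mimic3")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

libpq-based clients such as `psql` accept this URI form. Avoid logging the URI, because it contains the password.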
Next Steps
- Read documentation: docs/README.md
- Explore notebooks: notebooks/README.md
- Run examples: examples/README.md
- Check implementation plan: docs/implementation/visit-grouped-sequences.md
- Review model configs: docs/implementation/resource-aware-models.md
Detailed Installation Guides
For more detailed guides, see:
- Pretrained Embeddings Guide - CEHR-BERT, Med-BERT
- RunPods Training Guide - Cloud GPU training
- Data Generation Guide - Synthea setup
Getting Help
- Check the documentation for detailed guides
- Open an issue on GitHub
Related Projects
- loinc-predictor - LOINC code prediction
- genai-lab - Generative AI for biomedical data