GenAI-Lab Setup Guide¶
This guide explains how to properly install and configure the genailab package for development and notebook usage.
Quick Start¶
1. Install the Package in Editable Mode¶
The genailab package uses Poetry for dependency management. Install it in editable mode so notebooks can import it directly without sys.path manipulation:
# Activate the genailab conda environment
mamba activate genailab
# Install the package in editable mode
cd /path/to/genai-lab
poetry install
# Or if you prefer pip
pip install -e .
2. Verify Installation¶
# In Python or Jupyter notebook
import genailab
from genailab import get_config, get_data_dir
print(f"GenAI-Lab version: {genailab.__version__}")
print(f"Project root: {get_config().project_root}")
print(f"Data directory: {get_data_dir()}")
You should see output like:
GenAI-Lab version: 0.1.0
Project root: /Users/yourname/work/genai-lab
Data directory: /Users/yourname/work/genai-lab/data
Configuration System¶
Using the Config Module¶
The genailab.config module provides centralized configuration for paths, datasets, and models:
from genailab import get_config, get_data_dir, get_checkpoint_dir, get_device
# Get configuration
config = get_config()
# Access paths
data_dir = get_data_dir()
checkpoint_dir = get_checkpoint_dir("diffusion/medical_imaging")
device = get_device() # Auto-detects 'cuda', 'mps', or 'cpu'
# Use in your code
data_path = data_dir / "chest_xray" / "images"
checkpoint_path = checkpoint_dir / "model_epoch_1000.pt"
Directory Structure¶
After installation, the config module automatically creates:
genai-lab/
├── data/ # Datasets
├── checkpoints/ # Model checkpoints
│ └── diffusion/
│ ├── medical_imaging/
│ └── gene_expression/
├── results/ # Experiment results
└── .cache/ # Cached data
Registering Datasets¶
from genailab import get_config
config = get_config()
# Register dataset paths
config.register_dataset("chest_xray", "/path/to/chest_xray_dataset")
config.register_dataset("gene_expression", "/path/to/gene_data.h5")
# Retrieve later
chest_xray_path = config.get_dataset_path("chest_xray")
Model Configuration Presets¶
from genailab import get_diffusion_config
# Get preset configurations
unet_config = get_diffusion_config("unet2d_medium")
# Returns: {'in_channels': 1, 'out_channels': 1, 'base_channels': 64, ...}
gene_config = get_diffusion_config("tabular_gene_expression")
# Returns: {'hidden_dim': 512, 'num_layers': 8, ...}
# Use in model initialization
from genailab.diffusion import UNet2D
model = UNet2D(**unet_config)
Available presets:
- unet2d_small: 32 base channels, 3 levels
- unet2d_medium: 64 base channels, 4 levels
- unet2d_large: 128 base channels, 4 levels
- tabular_gene_expression: MLP with attention for gene data
Notebook Best Practices¶
❌ Old Way (Don't Do This)¶
# BAD: Hardcoded path manipulation
import sys
from pathlib import Path
sys.path.insert(0, str(Path('../../../src').resolve()))
from genailab.diffusion import VPSDE
✅ New Way (Recommended)¶
# GOOD: Direct import after editable install
from genailab.diffusion import VPSDE
from genailab import get_config, get_checkpoint_dir, get_device
# Use configuration
config = get_config()
device = get_device()
checkpoint_dir = get_checkpoint_dir("my_experiment")
Example Notebook Setup Cell¶
import numpy as np
import matplotlib.pyplot as plt
import torch
# Import genailab modules
from genailab.diffusion import VPSDE, UNet2D, train_score_network
from genailab import get_config, get_checkpoint_dir, get_device
# Configuration
config = get_config()
device = get_device()
checkpoint_dir = get_checkpoint_dir("diffusion/medical_imaging")
print(f"Device: {device}")
print(f"Checkpoints: {checkpoint_dir}")
# Random seeds
np.random.seed(42)
torch.manual_seed(42)
Advanced Configuration¶
Custom Configuration¶
from genailab.config import Config
from pathlib import Path
# Create custom configuration
custom_config = Config(
project_root=Path("/custom/path"),
device="cuda"
)
# Register custom datasets
custom_config.register_dataset("my_dataset", "/path/to/data")
# Register custom model configs
custom_config.register_model_config("my_unet", {
"in_channels": 3,
"base_channels": 96,
"channel_multipliers": (1, 2, 4, 8, 16),
})
# Use globally
from genailab.config import set_config
set_config(custom_config)
Environment Variables¶
You can override paths using environment variables:
Then in Python:
import os
from genailab.config import Config
config = Config(
data_dir=Path(os.getenv("GENAILAB_DATA_DIR", config.data_dir)),
checkpoint_dir=Path(os.getenv("GENAILAB_CHECKPOINT_DIR", config.checkpoint_dir)),
)
Troubleshooting¶
Import Errors¶
Problem: ModuleNotFoundError: No module named 'genailab'
Solution: Install the package in editable mode:
Verify:
Wrong Project Root¶
Problem: Config detects wrong project root
Solution: Explicitly set project root:
from genailab.config import Config
from pathlib import Path
config = Config(project_root=Path("/correct/path/to/genai-lab"))
Device Detection Issues¶
Problem: Wrong device detected (e.g., using CPU when GPU available)
Solution: Manually set device:
Migration Guide¶
Updating Existing Notebooks¶
-
Remove sys.path manipulation:
-
Add configuration imports:
-
Update checkpoint paths:
-
Update device detection:
Summary¶
Key Benefits:
- ✅ No more
sys.pathhacks in notebooks - ✅ Centralized configuration for paths and settings
- ✅ Auto-detection of project root and device
- ✅ Preset model configurations
- ✅ Easy dataset registration and retrieval
- ✅ Consistent directory structure across experiments
Next Steps:
1. Install package: poetry install
2. Update notebooks to use config module
3. Register your datasets
4. Use preset model configurations