Agentic-SpliceAI Setup Guide
Quick setup instructions for development and deployment
Overview
Agentic-SpliceAI builds upon the Meta-SpliceAI framework, combining:
- Extensible Base Layer - Foundation models (SpliceAI, OpenSpliceAI, and extensible to others)
- Adaptive Meta-Learning - Foundation-Adaptor framework via multimodal deep learning
- Agentic Workflows - AI agents for validation and evidence synthesis
Environment Setup
1. Create Dedicated Conda Environment
cd /Users/pleiadian53/work/agentic-spliceai
# Create environment from yml
mamba env create -f environment.yml
# Activate environment
mamba activate agentic-spliceai
2. Install Package in Development Mode
# Install agentic-spliceai package
pip install -e .
# Verify installation
python -c "import agentic_spliceai; print('✓ Package installed')"
3. Test CLI Commands
Architecture
Three-Layer System
┌─────────────────────────────────────────────────────────────┐
│                      AGENTIC-SPLICEAI                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────┐   │
│  │ Nexus Research   │  │ Meta-SpliceAI    │  │ Agentic  │   │
│  │ Agent            │  │ Prediction       │  │ Workflow │   │
│  ├──────────────────┤  ├──────────────────┤  ├──────────┤   │
│  │ • Literature     │  │ • Base SpliceAI  │  │ • LLM    │   │
│  │   synthesis      │  │ • Meta-learning  │  │   agents │   │
│  │ • arXiv/PubMed   │  │ • Ensemble       │  │ • Tool   │   │
│  │ • LaTeX reports  │  │   models         │  │   use    │   │
│  │ • Web search     │  │ • Feature eng.   │  │ • Multi- │   │
│  │                  │  │                  │  │   agent  │   │
│  └──────────────────┘  └──────────────────┘  └──────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Dependency Breakdown
From Nexus (Research):
- LLM clients: openai, anthropic, mistralai, aisuite
- Research tools: tavily-python, wikipedia, beautifulsoup4
- PDF generation: weasyprint, pandoc, tectonic, pypandoc
- Web framework: fastapi, uvicorn, pydantic
From Meta-SpliceAI (Prediction):
- Deep learning: tensorflow, pytorch, keras
- SpliceAI: spliceai==1.3.1
- Genomics: biopython, pysam, pybedtools, pybigwig
- Genomic formats: bcbio-gff, gffutils, gtftools, pyfastx
- ML optimization: optuna, hyperopt, hpbandster
- Feature engineering: category-encoders, feature-engine
Shared (Both):
- Data: pandas, polars, numpy, duckdb
- Visualization: matplotlib, seaborn, plotly
- ML: scikit-learn, lightgbm, xgboost
- Utilities: tqdm, joblib, requests
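To see which of the packages listed above are present in the active environment, the standard library's `importlib.metadata` can be queried by distribution name. The following is a minimal sketch, not part of the project: the `DEPENDENCY_GROUPS` dictionary is an illustrative subset of the lists above, and it assumes pip-style distribution names (for example, PyTorch is published as `torch`, while `pytorch` is the conda package name).

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative subset of the dependency groups listed above (not the full environment.yml).
DEPENDENCY_GROUPS = {
    "research": ["openai", "anthropic", "tavily-python", "beautifulsoup4"],
    "prediction": ["tensorflow", "torch", "spliceai", "pysam", "pybedtools"],
    "shared": ["pandas", "polars", "numpy", "scikit-learn"],
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for group, names in DEPENDENCY_GROUPS.items():
        print(f"[{group}]")
        for name, ver in installed_versions(names).items():
            print(f"  {name}: {ver or 'MISSING'}")
```

A `None` entry only means the distribution is not installed under that exact name, so it is worth checking the spelling before reinstalling anything.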
Development Workflow
Phase 1: Port Nexus Research Capability
# Copy Nexus research agent code
cp -r /Users/pleiadian53/work/agentic-ai-lab/src/nexus/agents/research/* \
/Users/pleiadian53/work/agentic-spliceai/agentic_spliceai/research/
# Test research capability
python -c "from agentic_spliceai.research import pipeline; print('✓ Research imported')"
Phase 2: Integrate Meta-SpliceAI Prediction
# Copy meta-spliceai core modules
cp -r /Users/pleiadian53/work/meta-spliceai/meta_spliceai/splice_engine/* \
/Users/pleiadian53/work/agentic-spliceai/agentic_spliceai/prediction/
# Test prediction capability
python -c "from agentic_spliceai.prediction import base_model; print('✓ Prediction imported')"
Phase 3: Build Agentic Workflow
# Combine research + prediction with LLM agents
# Create workflow that:
# 1. Uses Nexus to research latest splice site methods
# 2. Uses Meta-SpliceAI to make predictions
# 3. Uses LLM to interpret results and generate insights
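The three steps above could be wired together roughly as follows. This is a sketch under stated assumptions, not the project's implementation: `run_workflow`, `WorkflowResult`, and the three callables are hypothetical names, and the stubs in the usage section stand in for the real Nexus, Meta-SpliceAI, and LLM components, whose actual interfaces are not described here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkflowResult:
    literature: List[str]  # findings from the research step
    predictions: dict      # output of the prediction step
    insights: str          # interpretation of both


def run_workflow(
    query: str,
    sequence: str,
    research: Callable[[str], List[str]],
    predict: Callable[[str], dict],
    interpret: Callable[[List[str], dict], str],
) -> WorkflowResult:
    """Run the three steps in order and bundle their outputs."""
    literature = research(query)                   # 1. research splice site methods
    predictions = predict(sequence)                # 2. make predictions
    insights = interpret(literature, predictions)  # 3. interpret the results
    return WorkflowResult(literature, predictions, insights)


if __name__ == "__main__":
    # Stub callables (hypothetical) replace the real components for illustration.
    result = run_workflow(
        query="splice site prediction",
        sequence="ACGTAGGT",
        research=lambda q: [f"stub finding for '{q}'"],
        predict=lambda s: {"length": len(s)},
        interpret=lambda lit, pred: f"{len(lit)} finding(s), {len(pred)} prediction field(s)",
    )
    print(result.insights)
```

Passing the steps in as callables keeps the orchestration testable with stubs before the research and prediction modules have been ported.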
Environment Comparison
| Feature | agentic-ai | agentic-spliceai |
|---|---|---|
| Purpose | General research | Splice site research |
| Nexus | ✓ Full | ✓ Ported |
| Deep Learning | ✗ | ✓ TensorFlow/PyTorch |
| Genomics | ✗ | ✓ Full bioinformatics |
| SpliceAI | ✗ | ✓ Base + Meta |
| Size | ~2GB | ~5GB (with models) |
Next Steps
- ✅ Create environment: `mamba env create -f environment.yml`
- ✅ Install package: `pip install -e .`
- ⏳ Port Nexus research agent
- ⏳ Integrate Meta-SpliceAI prediction
- ⏳ Build unified agentic workflow
- ⏳ Test end-to-end pipeline
- ⏳ Create example notebooks
- ⏳ Write documentation
- ⏳ First commit to GitHub
Troubleshooting
TensorFlow/PyTorch Conflicts
If you encounter GPU/CUDA issues, you can fall back to the CPU-only builds:
# CPU-only versions (lighter)
pip install tensorflow-cpu
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
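Before or after switching builds, it can help to check what each framework currently sees. The sketch below is an illustration only; `detect_gpus` is a hypothetical helper. It relies on `torch.cuda.is_available()` and `tf.config.list_physical_devices("GPU")`, and treats a framework that fails to import as not installed.

```python
def detect_gpus():
    """Report GPU visibility per framework; None means the framework could not be imported."""
    status = {"torch": None, "tensorflow": None}

    try:
        import torch
        status["torch"] = bool(torch.cuda.is_available())
    except (ImportError, OSError):
        pass  # not installed, or its native libraries failed to load

    try:
        import tensorflow as tf
        status["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except (ImportError, OSError):
        pass  # not installed, or its native libraries failed to load

    return status


if __name__ == "__main__":
    for framework, has_gpu in detect_gpus().items():
        if has_gpu is None:
            label = "not importable"
        else:
            label = "GPU visible" if has_gpu else "CPU only"
        print(f"{framework}: {label}")
```

With the CPU-only builds installed, both frameworks should report "CPU only".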
Genomics Tools
Some genomics tools require system dependencies:
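One way to confirm that such dependencies are present is to look for their executables on `PATH`. The sketch below uses `bedtools` as its example, since pybedtools wraps that binary. The `REQUIRED_TOOLS` list and the `missing_tools` helper are assumptions for illustration, not this project's actual requirement list.

```python
import shutil

# Assumed example: pybedtools calls the `bedtools` executable.
# Extend this list with whatever system tools your environment needs.
REQUIRED_TOOLS = ["bedtools"]


def missing_tools(tools):
    """Return the executables from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All listed system tools found on PATH")
```

This only detects executables. Libraries needed at build time, such as compression or networking headers, have to be checked through the system package manager.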
Memory Issues
Large models may require significant RAM:
- Minimum: 16GB RAM
- Recommended: 32GB RAM
- With GPU: 64GB+ RAM for large-scale training
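To compare a machine against these figures, the total physical memory can be read with `os.sysconf` on Linux and macOS. This is a minimal sketch: `ram_tier` and `total_ram_gb` are hypothetical helpers, the thresholds are the 16GB and 32GB figures above, and on platforms without `os.sysconf` (such as Windows) the lookup returns `None`.

```python
import os

# Thresholds taken from the figures above, in GB.
MINIMUM_GB = 16
RECOMMENDED_GB = 32


def ram_tier(total_gb):
    """Classify a RAM size against the minimum and recommended thresholds."""
    if total_gb < MINIMUM_GB:
        return "below minimum"
    if total_gb < RECOMMENDED_GB:
        return "minimum"
    return "recommended"


def total_ram_gb():
    """Total physical memory in GiB, or None where os.sysconf lacks these keys."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    total = total_ram_gb()
    if total is None:
        print("Could not determine total RAM on this platform")
    else:
        print(f"{total:.1f} GiB total: {ram_tier(total)}")
```

Some systems report slightly less than the nominal size (a 16GB machine may show about 15.5 GiB), so a result just under a threshold deserves a second look rather than a hard failure.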
Verify Installation
Run the setup verification script:
This checks:
- Required packages installed
- Import paths working
- Data directories accessible
- Environment variables set
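For readers who want a similar check without the project's script, the following sketch covers the same categories. It is not the project's verification script: `run_checks` is a hypothetical helper, the `data` directory and the `OPENAI_API_KEY` variable are assumed examples, and a single importability test stands in for both the package and import-path checks.

```python
import importlib.util
import os
from pathlib import Path


def _importable(name):
    """True if the module can be located without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def run_checks(modules, directories, env_vars):
    """Return the failing items per check type; empty lists mean everything passed."""
    return {
        "modules": [m for m in modules if not _importable(m)],
        "directories": [d for d in directories if not Path(d).is_dir()],
        "env_vars": [v for v in env_vars if not os.environ.get(v)],
    }


if __name__ == "__main__":
    failures = run_checks(
        modules=["agentic_spliceai"],
        directories=["data"],         # hypothetical data directory
        env_vars=["OPENAI_API_KEY"],  # assumed; depends on the LLM client in use
    )
    for kind, names in failures.items():
        print(f"{kind}: {'OK' if not names else 'FAILED ' + ', '.join(names)}")
```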
Resources
- Documentation:
  - `README.md` - Project overview and vision
  - `QUICKSTART.md` - Quick start guide
  - `docs/STRUCTURE.md` - Complete project structure
  - `docs/` - Comprehensive documentation
- Meta-SpliceAI: https://github.com/pleiadian53/meta-spliceai
- Agentic AI Lab: https://github.com/pleiadian53/agentic-ai-lab