# Agentic SpliceAI - Quick Start Guide
Get started with splice site analysis and research capabilities in 5 minutes!
## 📋 Prerequisites

- Python 3.10+
- API Keys:
    - OpenAI API Key (required) - get one at https://platform.openai.com/api-keys
    - Tavily API Key (optional, for Nexus web search) - get one at https://tavily.com
- Splice Site Dataset - TSV/CSV file with genomic coordinates (for splice analysis)
- LaTeX (optional, for Nexus PDF generation) - MacTeX, BasicTeX, or TeX Live
## 🚀 Installation

### Step 1: Set Up Environment

**Option A: Use the Existing agentic-ai Environment (Recommended)**

```bash
# Activate the existing environment
mamba activate agentic-ai

# Navigate to the agentic_spliceai directory
cd agentic_spliceai

# All dependencies are already installed!
```
**Option B: Create a Standalone Environment**

```bash
# Create a new conda environment from environment.yml
mamba env create -f environment.yml
mamba activate agentic-spliceai

# Install the package in editable mode (for development)
pip install -e .
```
**Option C: Active Development Mode**

For active development where you're modifying the code:

```bash
# Create the environment manually
mamba create -n agentic-spliceai python=3.11
mamba activate agentic-spliceai

# Install in editable mode with dependencies
pip install -e .

# Or install with optional dev dependencies
pip install -e ".[dev,bio]"
```
**Alternative: Using requirements.txt (backward compatibility)**

```bash
# Create the environment manually
mamba create -n agentic-spliceai python=3.11
mamba activate agentic-spliceai

# Install dependencies via pip
pip install -r requirements.txt
```
> **Note:** The `agentic-ai` environment already contains all required dependencies (OpenAI, FastAPI, DuckDB, matplotlib, pandas, seaborn, etc.), so you can use it directly for `agentic_spliceai`. The `environment.yml` approach is recommended for new installations. Use `pip install -e .` during development so that `agentic_spliceai` modules can be imported from anywhere.
### Step 2: Configure API Key

**Option A: Use the Existing Project .env (Recommended)**

The `.env` file at the project root (`agentic-ai-public/.env`) is picked up automatically; no action is needed if `OPENAI_API_KEY` is already set there.
**Option B: Create a Local .env**

```bash
# Copy the environment template
cp .env.example .env

# Edit .env and add your OpenAI API key:
# OPENAI_API_KEY=sk-your-actual-key-here
```
> **Note:** Python's `dotenv` automatically searches parent directories for `.env` files, so the project root `.env` will be found automatically.
### Step 3: Add Your Data

Place your splice site dataset (TSV/CSV) under `data/`, e.g. `data/splice_sites_enhanced.tsv`. See the Data Format section below for the required columns.
## 🎯 Usage Options

### Option 1: REST API (Recommended)
**Start the service** (the FastAPI app in `splice_service.py`; it listens on port 8004 by default).

**Access Swagger UI:**

Open http://localhost:8004/docs in your browser.
**Try an analysis:**

1. Click on `/analyze/template`
2. Click "Try it out"
3. Use this example request:

    ```json
    {
      "dataset_path": "data/splice_sites_enhanced.tsv",
      "analysis_type": "high_alternative_splicing",
      "model": "gpt-4o-mini"
    }
    ```

4. Click "Execute"
5. Copy the generated code from the response
6. Save it to a `.py` file and run it!
### Option 2: Python Library

```python
from agentic_spliceai import create_dataset
from agentic_spliceai.splice_analysis import generate_analysis_insight
from openai import OpenAI

# Load the dataset
dataset = create_dataset("data/splice_sites_enhanced.tsv")

# Generate an analysis
client = OpenAI()
result = generate_analysis_insight(
    dataset=dataset,
    analysis_type="high_alternative_splicing",
    client=client,
)

# Save the generated code
with open("analysis.py", "w") as f:
    f.write(result["chart_code"])

# Execute it
exec(result["chart_code"])
```
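Rather than calling `exec()` in the current namespace, you may prefer to run generated code in an isolated namespace so its variables don't leak into yours. A minimal sketch, using a stand-in string in place of real generated code:

```python
def run_generated(code: str) -> dict:
    """Execute generated code in a fresh namespace and return it for inspection."""
    namespace: dict = {}
    exec(code, namespace)  # always review the code before running it!
    return namespace


# Stand-in for result["chart_code"]
demo_code = "total_sites = 2 * 21  # illustrative value, not real data"
ns = run_generated(demo_code)
print(ns["total_sites"])  # → 42
```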
### Option 3: Command-Line Tool

```bash
# Run the quick start examples
python examples/quick_start.py

# Run the full analysis suite
python -m agentic_spliceai.examples.analyze_splice_sites \
    --data data/splice_sites_enhanced.tsv \
    --analysis all \
    --output-dir output/analyses
```
## 📊 Available Analyses

### 1. High Alternative Splicing

Identifies genes with the most splice sites (candidates for alternative splicing).

### 2. Genomic Distribution

Visualizes splice site distribution across chromosomes.

### 3. Exon Complexity

Analyzes transcript structure by exon count.

### 4. Strand Bias

Analyzes the strand distribution of splice sites.

### 5. Transcript Diversity

Identifies genes with the most transcript isoforms.
## 🔬 Custom Research Questions

Ask your own questions:

```python
from agentic_spliceai.splice_analysis import generate_exploratory_insight

# dataset and client as created in the Python Library example above
result = generate_exploratory_insight(
    dataset=dataset,
    research_question="How do splice sites distribute by gene biotype?",
    client=client,
)
```
**Example questions:**

- "What is the relationship between gene length and splice site density?"
- "Which chromosomes have the highest alternative splicing rates?"
- "How do donor and acceptor sites differ in their genomic distribution?"
## 📚 Nexus Research Agent (NEW)
Generate comprehensive research reports on splicing topics:
### CLI Usage

```bash
# Basic research report
nexus "Alternative Splicing Mechanisms in Cancer"

# With PDF generation
nexus "SpliceAI Deep Learning Architecture" --pdf

# Comprehensive report
nexus "Splice Site Recognition by U1 snRNP" \
    --model openai:gpt-4o \
    --length comprehensive \
    --pdf

# Quick literature review
nexus "Recent advances in splice site prediction" \
    --model openai:gpt-4o-mini \
    --length brief
```
### Python API

```python
from nexus.agents.research import ResearchAgent
from nexus.core.config import Config

# Initialize the research agent
config = Config()
agent = ResearchAgent(config)

# Generate a research report
result = agent.research(
    topic="Splice Site Recognition by U1 snRNP",
    length="standard",
    generate_pdf=True,
)

print(f"Report saved to: {result['output_path']}")
```
### Web Interface

### Use Cases

- **Literature Review**: Research the latest splicing mechanisms before analysis
- **Grant Proposals**: Generate comprehensive background sections
- **Method Validation**: Validate analysis approaches against current research
- **Stay Updated**: Keep up with the latest splice prediction methods
- **Self-Improvement**: Learn from research to enhance analysis methods
## 🌐 API Examples

### List Available Analyses
### Template Analysis

```bash
curl -X POST http://localhost:8004/analyze/template \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_path": "data/splice_sites_enhanced.tsv",
    "analysis_type": "high_alternative_splicing",
    "model": "gpt-4o-mini"
  }'
```
### Exploratory Analysis

```bash
curl -X POST http://localhost:8004/analyze/exploratory \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_path": "data/splice_sites_enhanced.tsv",
    "research_question": "What is the distribution of splice sites across chromosomes?",
    "model": "gpt-4o-mini"
  }'
```
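The same call can be made from Python with only the standard library. The sketch below constructs the request but does not send it; with the service running, pass `req` to `urllib.request.urlopen` to get the response:

```python
import json
import urllib.request

payload = {
    "dataset_path": "data/splice_sites_enhanced.tsv",
    "research_question": "What is the distribution of splice sites across chromosomes?",
    "model": "gpt-4o-mini",
}

# Build a POST request carrying the JSON payload
req = urllib.request.Request(
    "http://localhost:8004/analyze/exploratory",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the service up: result = json.load(urllib.request.urlopen(req))
```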
## 📁 Data Format

Your dataset should have these columns:

**Required:**

- `chrom` - Chromosome (chr1, chr2, ..., chrX, chrY)
- `position` - Genomic position
- `site_type` - `donor` or `acceptor`
- `strand` - `+` or `-`

**Optional:**

- `gene_name` - Gene symbol (TP53, BRCA1, etc.)
- `transcript_id` - Transcript identifier
- `exon_rank` - Exon number
**Example:**

```text
chrom   position  site_type  strand  gene_name  transcript_id  exon_rank
chr1    12345     donor      +       TP53       NM_000546.6    5
chr1    12678     acceptor   +       TP53       NM_000546.6    6
```
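A quick way to check a file against this schema before analysis. This is a sketch, not part of the library's API; pandas is already a dependency, and the column names and allowed values follow the table above:

```python
import io

import pandas as pd

REQUIRED = {"chrom", "position", "site_type", "strand"}


def validate_splice_sites(df: pd.DataFrame) -> list[str]:
    """Return a list of schema problems; an empty list means the frame looks valid."""
    problems = [f"missing column: {c}" for c in sorted(REQUIRED - set(df.columns))]
    if "site_type" in df.columns:
        bad = set(df["site_type"].unique()) - {"donor", "acceptor"}
        if bad:
            problems.append(f"unexpected site_type values: {sorted(bad)}")
    if "strand" in df.columns:
        bad = set(df["strand"].unique()) - {"+", "-"}
        if bad:
            problems.append(f"unexpected strand values: {sorted(bad)}")
    return problems


tsv = "chrom\tposition\tsite_type\tstrand\nchr1\t12345\tdonor\t+\n"
print(validate_splice_sites(pd.read_csv(io.StringIO(tsv), sep="\t")))  # → []
```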
## 🎓 Next Steps

- **Try the examples** - Run `python examples/quick_start.py`
- **Explore the API** - Open http://localhost:8004/docs
- **Read the docs** - See README.md for detailed information
- **Customize analyses** - Modify generated code to fit your needs
- **Add new templates** - Extend `splice_analysis.py` with your own analyses
## 🐛 Troubleshooting

### API Key Not Found

```bash
# Make sure the .env file exists and contains:
OPENAI_API_KEY=sk-your-actual-key-here

# Or export it:
export OPENAI_API_KEY=sk-your-actual-key-here
```
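A small check you can drop at the top of your own scripts to fail fast with a readable message (the helper name `require_api_key` is illustrative; the variable name matches the `.env` entry above):

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or raise a readable error."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or `export {name}=...`"
        )
    return key
```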
### Dataset Not Found

```bash
# Check that the path is relative to the project root
# Example: data/splice_sites_enhanced.tsv
# NOT:     /full/path/to/data/splice_sites_enhanced.tsv
```
### Port Already in Use

```bash
# Change the port in .env:
SPLICE_AGENT_PORT=8005
```

Or in `splice_service.py`:

```python
uvicorn.run(..., port=8005)
```
### Import Errors

```bash
# Make sure you're in the right environment
mamba activate agentic-spliceai

# Reinstall dependencies
pip install -r requirements.txt
```
## 💡 Tips

- **Start with templates** - Use predefined analyses before custom questions
- **Review generated code** - Always check the code before executing
- **Use appropriate models** - `gpt-4o-mini` for speed, `gpt-4o` for quality
- **Cache datasets** - The API caches loaded datasets for performance
- **Batch processing** - Generate multiple analyses at once for efficiency
## 📚 More Resources
- Full README - Complete documentation
- API Reference - Detailed API documentation
- Biology Background - Splice site biology primer
- Examples - More example scripts
Questions? Open an issue on GitHub or check the documentation!