MediGraphX structures medical data into ontology-guided knowledge graphs for evidence-based reasoning and clinical decision support.
1. **Clone & Install** (recommended setup)

   ```bash
   git clone https://github.com/julka01/MediGraphRAG.git
   cd MediGraphRAG                                   # Project directory
   curl -LsSf https://astral.sh/uv/install.sh | sh   # Install uv
   uv sync                                           # Create venv & install deps
   source .venv/bin/activate                         # Linux/Mac
   # OR: .venv\Scripts\activate                      # Windows
   ```

2. **Start Neo4j** (with Docker)

   ```bash
   docker compose up -d neo4j   # Access browser at http://localhost:7474
   ```

3. **Configure** (create `.env` with your API keys)

   ```bash
   NEO4J_URI=bolt://localhost:7687
   NEO4J_USER=neo4j
   NEO4J_PASSWORD=your-password
   OPENAI_API_KEY=sk-your-key   # Or other provider key
   ```

4. **Run & Test**

   ```bash
   python start_server.py                      # Start app on http://localhost:8004
   curl "http://localhost:8004/health/neo4j"   # Verify connection
   ```
- Requirements & Setup
- Getting Started
- Features
- Architecture
- API Reference
- Configuration
- Troubleshooting
- Contributing
- License
System Requirements: Python 3.9+ (3.11+ recommended), 8GB RAM minimum (16GB+ for large datasets), 10GB storage minimum, 4 CPU cores minimum. Optional: NVIDIA GPU for acceleration.
- Neo4j 5.0+ : Graph database
- Docker 20.10+ : Containerized deployment
- LLM Provider Key, at least one of: OPENAI_API_KEY | ANTHROPIC_API_KEY | GEMINI_API_KEY | OPENROUTER_API_KEY | OLLAMA_HOST
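Because any one of the provider variables above is sufficient, a startup check can simply scan them in preference order. A minimal sketch (the helper name `first_available_provider` is illustrative, not part of MediGraphRAG):

```python
import os

# Provider environment variables accepted by the project (from the list above)
PROVIDER_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY",
    "OPENROUTER_API_KEY",
    "OLLAMA_HOST",
]

def first_available_provider(env):
    """Return the first provider variable that is set and non-empty, or None."""
    for key in PROVIDER_KEYS:
        if env.get(key):
            return key
    return None

provider = first_available_provider(dict(os.environ))
print(provider or "No LLM provider configured")
```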
See Quick Start above for the recommended approach. For alternatives:
- Package Managers: For pip instead of uv, see pyproject.toml
- Neo4j Setup: Native installation available at neo4j.com
- Docker Alternatives: Standalone containers or Kubernetes via docker-compose.yml
Create `.env` with:

```bash
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
OPENAI_API_KEY=sk-your-key   # Or other provider key
```

Options: OPENAI_API_KEY | ANTHROPIC_API_KEY | GEMINI_API_KEY | OPENROUTER_API_KEY | OLLAMA_HOST
Security: Never commit .env; obtain keys from provider dashboards.
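For reference, `KEY=VALUE` files like the one above are simple to parse; a minimal sketch (MediGraphRAG itself may rely on a library such as python-dotenv, and this version does not handle `#` inside values):

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and inline comments."""
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comment tails (naive)
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """\
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
OPENAI_API_KEY=sk-your-key  # Or other provider key
"""
config = parse_env(sample)
print(config["NEO4J_URI"])
```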
Primary Option: Docker Compose

```bash
docker compose up -d   # Full stack via docker-compose.yml
```

Alternatives:
- Kubernetes: Use the included docker-compose.yml as a base; deploy via Helm or K8s manifests
- Standalone: `docker build -t medigraph . && docker run -p 8004:8004 --env-file .env medigraph`
- CI/CD: Build images in pipelines, deploy to cloud registries
Production Tips: Use strong passwords, enable monitoring (Sentry), configure TLS. See docker-compose.yml for full config.
Assuming setup is complete (see Quick Start), try these core functions. All examples use {{BASE_URL}} = http://localhost:8004.

Create a knowledge graph from a document:

```bash
curl -X POST "{{BASE_URL}}/create_ontology_guided_kg" \
  -F "file=@clinical_guidelines.pdf" \
  -F "provider=openai" \
  -F "model=gpt-4"
```

Ask an evidence-based question:

```bash
curl -X POST "{{BASE_URL}}/chat" \
  -H "Content-Type: application/json" \
  -d '{"question": "Treatment options for stage II prostate cancer?", "document_names": ["guidelines.pdf"], "provider_rag": "openai", "model_rag": "gpt-4"}'
```

Process a patient cohort in batches:

```bash
curl -X POST "{{BASE_URL}}/bulk_process_csv" \
  -F "csv_file=@patient_cohort.csv" \
  -F "batch_size=50"
```

Advanced: Use population-level queries or antimicrobial stewardship questions similarly.
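The same `/chat` call can be made from Python; a sketch that builds the JSON body shown in the curl example (the helper name `build_chat_payload` is illustrative, and the commented-out send requires a running server):

```python
import json

def build_chat_payload(question, document_names, provider="openai", model="gpt-4"):
    """Assemble the JSON body the /chat endpoint expects (fields from the curl example)."""
    return {
        "question": question,
        "document_names": list(document_names),
        "provider_rag": provider,
        "model_rag": model,
    }

payload = build_chat_payload(
    "Treatment options for stage II prostate cancer?", ["guidelines.pdf"]
)
body = json.dumps(payload)
print(body)

# To send (requires the server from Quick Start):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8004/chat", data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```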
- Ontology-Guided KGs: Structure medical data using biomedical ontologies (OWL, UMLS)
- Evidence-Based Reasoning: Retrieval-augmented generation with source attribution
- Multi-Modal Processing: Supports PDFs, CSVs, research documents
- Medical Q&A: Natural language queries with uncertainty quantification
- Scalable Architecture: Neo4j graph + vector embeddings for semantic search
- Batch Cohort Analysis: Process population-level datasets
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Medical │───▶│ LLM Parser │───▶│ Knowledge │
│ Documents │ │ + Ontology │ │ Graph Store │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Natural │───▶│ Vector Search │───▶│ Evidence App │
│ Language │ │ + Graph │ │ Citations │
│ Questions │ │ Traversal │ │ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
System Flow: Medical Documents → LLM + Ontology Parser → Knowledge Graph (Neo4j) → Vector Search + Graph Traversal → Evidence Citations.
Pipeline:
- Document ingestion & metadata
- Semantic chunking + ontology validation
- Graph construction & embeddings
- Query processing with confidence scoring
Key Specs:
- Chunking: 1000-4000 chars (semantic boundaries)
- Embedding Options: Sentence-BERT (384-dim), OpenAI embeddings, local models via Ollama
- Similarity: Cosine similarity with configurable threshold (0.08 default)
- Database: Neo4j 5.0+ with ChromaDB vector acceleration
- Latency: <5s for typical medical queries
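The similarity gate in the specs above can be illustrated in plain Python. This sketch uses toy 3-dimensional vectors (real embeddings are 384-dim Sentence-BERT or provider embeddings) and the 0.08 default threshold:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

SIMILARITY_THRESHOLD = 0.08  # default from Key Specs

query = [0.2, 0.7, 0.1]
chunks = {
    "chunk_a": [0.1, 0.8, 0.0],   # points the same way as the query
    "chunk_b": [-0.5, 0.0, 0.9],  # nearly orthogonal, slightly negative
}

# Keep only chunks whose similarity clears the threshold
relevant = {
    name: round(cosine_similarity(query, vec), 3)
    for name, vec in chunks.items()
    if cosine_similarity(query, vec) >= SIMILARITY_THRESHOLD
}
print(relevant)  # only chunk_a survives the 0.08 gate
```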
| Endpoint | Method | Purpose | Authentication |
|---|---|---|---|
| /create_ontology_guided_kg | POST | Knowledge graph creation | None |
| /chat | POST | Medical Q&A | None |
| /bulk_process_csv | POST | Batch processing | None |
| /health/neo4j | GET | Database status | None |
| /visualize_graph | GET | Graph visualization | None |
Create knowledge graphs from documents.

```bash
curl -X POST "http://localhost:8004/create_ontology_guided_kg" \
  -F "file=@medical_doc.pdf" \
  -F "provider=openai" \
  -F "model=gpt-4" \
  -F "max_chunks=20"
```

Parameters:
- file: Document file (PDF, TXT, etc.)
- provider: LLM provider (openai, anthropic, gemini)
- model: Specific model name
- max_chunks: Processing limit
Evidence-based question answering.

```bash
curl -X POST "http://localhost:8004/chat" \
  -H "Content-Type: application/json" \
  -d '{
        "question": "Treatment options for diabetes?",
        "document_names": ["guidelines.pdf"],
        "provider_rag": "openai",
        "model_rag": "gpt-4"
      }'
```

Response Format:

```json
{
  "recommendation_summary": "Evidence-based approach",
  "node_traversal_path": ["Patient → Symptoms → Treatment"],
  "reasoning_path": ["Clinical finding 1", "Evidence 2"],
  "evidence_synthesis": "Combined analysis",
  "confidence_metrics": {
    "similarity_score": 0.85,
    "entity_coverage": 88.5
  },
  "source_citations": ["Author A (2023) pg. 45"]
}
```

- All endpoints currently open (development mode)
- Production deployments should implement:
- API key authentication
- Rate limiting
- Input sanitization
- Audit logging
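A client can act on the confidence metrics in the `/chat` response; a sketch (the 0.5 review cutoff is an illustrative assumption, not a MediGraphRAG default):

```python
import json

# Sample response, abridged from the Response Format above
raw = """{
  "recommendation_summary": "Evidence-based approach",
  "confidence_metrics": {"similarity_score": 0.85, "entity_coverage": 88.5},
  "source_citations": ["Author A (2023) pg. 45"]
}"""

response = json.loads(raw)
metrics = response["confidence_metrics"]

# Flag low-confidence answers for human review (cutoff is illustrative)
needs_review = metrics["similarity_score"] < 0.5
print(metrics["similarity_score"], needs_review)
```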
```bash
# Database connection
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=secure-password

# LLM providers (at least one)
OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# GEMINI_API_KEY=...

# Document processing
CHUNK_SIZE=2000                         # Character chunks (1000-4000)
CHUNK_OVERLAP=300                       # Overlap between chunks
MAX_CHUNKS=50                           # Processing limit per document

# Vector embeddings & search
EMBEDDING_MODEL=sentence_transformers   # Options: sentence_transformers, openai, local
VECTOR_SIMILARITY_THRESHOLD=0.08        # Relevance threshold (0.0-1.0)
EMBEDDING_BATCH_SIZE=32                 # Batch size for embedding generation

# LLM configurations (provider-specific)
OPENAI_API_KEY=sk-your-key              # Required for OpenAI models
ANTHROPIC_API_KEY=sk-ant-your-key       # Required for Anthropic models
OLLAMA_HOST=http://localhost:11434      # Required for local Ollama models
OLLAMA_MODEL=mistral:7b                 # Local model name

# Logging and monitoring
LOG_LEVEL=INFO                          # DEBUG, INFO, WARNING, ERROR
LOG_FILE=logs/medigraph.log
SENTRY_DSN=https://your-sentry-dsn

# Performance tuning
MAX_WORKERS=4                           # Concurrent processing
CACHE_TTL=3600                          # Cache expiration (seconds)
MEMORY_LIMIT=8GB                        # Process memory limit

# Security
ALLOW_ORIGINS=http://localhost:3000,https://yourapp.com
CORS_ORIGINS=*                          # In production, specify explicitly
```

Configuration files:
- .env: Environment variables
- config.yaml: Advanced system configuration
- logging.conf: Logging configuration
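CHUNK_SIZE and CHUNK_OVERLAP interact as a sliding window; a minimal sketch of the character math (MediGraphRAG's actual chunker respects semantic boundaries, which this toy version ignores):

```python
def chunk_text(text, chunk_size=2000, overlap=300):
    """Split text into fixed-size character chunks with a sliding-window overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts `step` chars after the last
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 5000-char document with CHUNK_SIZE=2000, CHUNK_OVERLAP=300
doc = "x" * 5000
chunks = chunk_text(doc)
print([len(c) for c in chunks])  # final chunk is whatever remains
```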
Symptom: "Unable to connect to Neo4j database"

Solutions:

```bash
# Check if Neo4j is running
docker ps | grep neo4j

# Verify credentials
curl http://localhost:7474 -u neo4j:your-password

# Reset database
docker compose down neo4j
docker volume rm medigraph_neo4j_data
docker compose up -d neo4j
```

Symptom: "API rate limit exceeded" or "Authentication failed"
Solutions:

```bash
# Test API key validity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models

# Switch providers for failover
export ANTHROPIC_API_KEY=sk-ant-your-key

# Check usage quotas in the provider dashboard
```

Symptom: "Out of memory" during large document processing
Solutions:

```bash
# Reduce chunk size
export CHUNK_SIZE=1500

# Process in smaller batches
curl -F "file=@large_doc.pdf" \
  -F "max_chunks=10" \
  http://localhost:8004/create_ontology_guided_kg

# Monitor resource usage
docker stats
```

Symptom: Slow response times
Solutions:

```bash
# Enable GPU acceleration (if available)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Increase workers for parallel processing
export MAX_WORKERS=8

# Optimize Neo4j memory: add these JVM options to neo4j.conf
dbms.memory.heap.initial_size=2G
dbms.memory.heap.max_size=4G
```

Diagnostic Tools:

```bash
# Health checks
curl "http://localhost:8004/health/neo4j"

# Log analysis
tail -f logs/medigraph.log

# Database queries
cypher-shell -u neo4j -p password
MATCH (n) RETURN count(n) AS node_count;

# Performance monitoring
docker stats
htop  # or top
```

Steps:
- Review logs/medigraph.log for details
- Run health checks (see Diagnostic Tools)
- Verify environment config against the docs
- Submit issues with full logs & system info at GitHub Issues
We welcome contributions from healthcare professionals, data scientists, and open source enthusiasts!
```bash
git clone https://github.com/YOUR-USERNAME/MediGraphRAG.git
cd MediGraphRAG

python -m venv venv
source venv/bin/activate

# Install development dependencies
pip install -r requirements.txt

# Set up pre-commit hooks
pre-commit install

# Test suite
pytest tests/ -v --cov=src/

# Lint code
black src/ tests/
flake8 src/ tests/

# Type checking
mypy src/ --strict
```

Create a feature branch:
```bash
git checkout -b feature/your-feature-name

# Make changes...
# Write tests...
# Update documentation...

# Commit changes
git add .
git commit -m "Add feature: your descriptive message"

# Push commits
git push origin feature/your-feature-name
```

Then:
- Create a PR with a comprehensive description
- Ensure all tests pass
- Update documentation if needed
- Python Version: 3.11+ compatible
- Style: Black for formatting
- Linting: flake8 + mypy
- Testing: pytest with >= 80% coverage
MediGraph is released under the MIT License. See LICENSE for full details.
- Regulatory Compliance: Users are responsible for ensuring compliance with applicable healthcare regulations (HIPAA, GDPR, etc.)
- Clinical Use: This software is for research and educational purposes. Clinical decisions should be made by qualified healthcare professionals
- Data Privacy: Implement appropriate data protection measures when handling patient information
MediGraph v1.0.0-rc1: Advancing clinical decision-making through transparent, evidence-based AI systems.
For support: GitHub Issues