# PrescriptoAI

AI-powered prescription intelligence system — upload a handwritten or printed prescription image and get structured medication data, plain-English interpretation, and risk warnings in seconds.
- Overview
- Features
- Tech Stack
- System Architecture
- Processing Pipeline
- Frontend
- API Reference
- Project Structure
- Setup & Running
- Design Decisions
- Known Limitations
- Future Roadmap
## Overview

PrescriptoAI transforms medical prescriptions into structured, actionable insights using a multi-agent AI pipeline. Instead of a single LLM call, it models prescription understanding as a stateful workflow with decision gates, retry logic, parallel analysis, and severity-based routing — all orchestrated by LangGraph.
The React frontend provides a clean interface for uploading prescriptions, viewing results, and asking follow-up questions via a built-in chat assistant.
> **Disclaimer:** This is an assistive tool only. It does not provide medical diagnoses or clinical decisions.
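Internally, the pipeline nodes read from and write to a single shared state object. As orientation, here is a minimal Python sketch of what such a state could contain; the actual definition lives in `backend/pipeline/state.py`, and the field names here are illustrative, not copied from the repo:

```python
from typing import TypedDict

# Hypothetical sketch of the shared pipeline state. The real
# PrescriptionState in backend/pipeline/state.py may differ.
class PrescriptionState(TypedDict, total=False):
    raw_text: str             # OCR output or pasted text
    ocr_confidence: float     # self-reported by the vision model
    structured_data: dict     # validated medication fields
    risk_level: str           # "high" | "medium" | "low"
    warnings: list[str]
    processing_steps: list[str]
    errors: list[str]

state: PrescriptionState = {
    "raw_text": "Amoxicillin 500mg TDS",
    "ocr_confidence": 0.91,
}
state["processing_steps"] = ["[OCR] extracted 1 line"]
```

Each node receives this state, adds or updates its own fields, and passes it on, which is what makes the retry and routing gates below possible.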
## Features

- Vision OCR — Gemini Flash reads handwritten and printed prescriptions directly from images
- Confidence-aware retries — low-confidence OCR results trigger a targeted re-extraction attempt
- Structured extraction — medications, dosage, frequency, duration, route, and doctor instructions parsed into validated JSON
- Plain-English interpretation — medication purposes and dosing schedules explained in accessible language
- Risk assessment — flags missing dosages, ambiguous instructions, and low-confidence OCR areas with severity levels (high / medium / low)
- Parallel analysis — interpretation and risk detection run concurrently for faster responses
- Severity routing — high-risk prescriptions trigger critical alert escalation; medium-risk produces advisory warnings
- Chat Q&A — multi-turn conversation about any analysed prescription, with full conversation memory per session
- Drag-and-drop upload — drop an image or click to browse; live preview before submission
- Text input tab — paste raw prescription text to skip OCR and go straight to analysis
- Medication cards — each medication displayed with its structured data and AI interpretation side by side
- Color-coded risk panel — warnings and alerts rendered with severity badges (red / yellow / green)
- Processing timeline — step-by-step log of what the pipeline did, great for understanding the AI's reasoning
- Chat interface — ask follow-up questions like "What is this medication for?" or "When should I take it?" with streaming-style responses
## Tech Stack

| Layer | Technology |
|---|---|
| Backend | Python, FastAPI |
| LLM | Google Gemini 2.5 Flash (free tier) |
| LLM Framework | LangChain (langchain-google-genai) |
| Agent Orchestration | LangGraph (stateful graph with cycles, conditional edges, parallel branches) |
| Structured Output | Pydantic v2 + .with_structured_output() |
| Settings | pydantic-settings |
| Image Handling | Pillow |
| Frontend | React + Vite |
| HTTP Client | Axios |
| Chat Memory | LangGraph MemorySaver (in-memory, per thread_id) |
## System Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                       React Frontend                        │
│ UploadForm │ ResultDisplay │ MedicationCard │ ChatInterface │
└──────────────────────┬──────────────────────────────────────┘
                       │ HTTP (Axios)
┌──────────────────────▼──────────────────────────────────────┐
│                      FastAPI Backend                        │
│                                                             │
│  POST /upload-prescription   POST /analyze    POST /chat    │
│        │                         │                │         │
│        └──────────┬──────────────┘            LangGraph     │
│                   │                           chat_graph    │
│          ┌────────▼────────┐                                │
│          │    LangGraph    │                                │
│          │    StateGraph   │                                │
│          │    (13 nodes)   │                                │
│          └────────┬────────┘                                │
│                   │                                         │
│          ┌────────▼────────┐                                │
│          │  Gemini Flash   │  ← vision + text LLM           │
│          └─────────────────┘                                │
└─────────────────────────────────────────────────────────────┘
```
## Processing Pipeline

```
[Image upload]             [Text input]
       │                        │
       ▼                        │
  OCR Node ◄── retry ──┐        │
       │               │        │
[confidence check]     │        │
       │               │        │
 low → Enhance OCR ────┘        │
 ok  → Clean Text ◄─────────────┘
       │
 Structure Node ◄── retry ──┐
       │                    │
[completeness check]        │
       │                    │
 gaps → Recover Fields ─────┘
 ok   → Dispatch Analysis
       │
  ┌────┴─────┐
  │ PARALLEL │
Reasoning Node    Risk Node
  │                 │
  └────────┬────────┘
         Merge
           │
 [severity routing]
           │
 high → Critical Alert
 med  → Advisory
 low  → Format Output
           │
          END
```
Each stage produces structured, typed output. If any node fails, the pipeline returns a graceful fallback with error details rather than crashing.
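The decision gates in the diagram are implemented as conditional edge functions: plain functions that inspect the state and return the name of the next node. The sketch below illustrates the pattern; the threshold value and state keys are assumptions, not copied from `backend/pipeline/edges.py`:

```python
# Illustrative conditional-edge functions in the style of
# backend/pipeline/edges.py. Threshold and key names are assumed.
CONFIDENCE_THRESHOLD = 0.6

def route_after_ocr(state: dict) -> str:
    """Send low-confidence OCR output back for one enhanced retry."""
    low = state.get("ocr_confidence", 0.0) < CONFIDENCE_THRESHOLD
    if low and not state.get("ocr_retried"):
        return "enhance_ocr"
    return "clean_text"

def route_by_severity(state: dict) -> str:
    """Pick the post-merge branch from the assessed risk level."""
    return {
        "high": "critical_alert",
        "medium": "advisory",
    }.get(state.get("risk_level", "low"), "format_output")
```

In LangGraph, such functions are wired in with `graph.add_conditional_edges("ocr", route_after_ocr)`, so the graph itself stays declarative while the branching logic remains small, pure, and unit-testable.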
## Frontend

The React frontend is built with Vite and communicates with the FastAPI backend via Axios.
Components:
| Component | Purpose |
|---|---|
| `UploadForm` | Drag-and-drop image upload with live preview; text input tab for raw paste |
| `ResultDisplay` | Orchestrates the full results layout |
| `MedicationCard` | Displays one medication with its structured fields and AI interpretation |
| `WarningsPanel` | Color-coded risk level badge and warning/alert list |
| `ChatInterface` | Multi-turn chat tied to a session `thread_id`; sends prescription context to ground answers |
Running the frontend:
```bash
cd frontend
npm install
npm run dev     # dev server at http://localhost:5173
npm run build   # production build → dist/
```

## API Reference

### `POST /upload-prescription`

Upload a prescription image. Runs the full pipeline including OCR.
Request: multipart/form-data
```
file: <image>   # JPEG, PNG, WEBP, HEIC
```
Response:
```json
{
  "structured_data": {
    "patient_name": "John Doe",
    "doctor_name": "Dr. Smith",
    "date": "2026-03-22",
    "medications": [
      {
        "name": "Amoxicillin",
        "dosage": "500mg",
        "frequency": "TDS",
        "duration": "7 days",
        "route": "oral",
        "instructions": "Take with food"
      }
    ],
    "notes": "Follow up in 1 week",
    "missing_fields": []
  },
  "interpretation": {
    "medication_interpretations": [...],
    "instruction_summary": "Take Amoxicillin three times daily with food for one week.",
    "abbreviations_decoded": { "TDS": "Three times daily" },
    "disclaimer": "..."
  },
  "risk_assessment": { "risk_level": "low", "flags": [], "warnings": [] },
  "risk_level": "low",
  "warnings": [],
  "critical_alerts": [],
  "ocr_confidence": 0.91,
  "processing_steps": ["[OCR] ...", "[Structuring] ...", "..."],
  "errors": []
}
```

### `POST /analyze`

Analyze raw prescription text (no image needed — skips OCR).
Request:
```json
{ "text": "Amoxicillin 500mg TDS x 7 days\nTake with food\nDr. Smith" }
```

Response: Same schema as `/upload-prescription` (without `ocr_confidence`).
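For reference, a minimal standard-library Python client for this endpoint. This is a sketch, not code from the repo; the URL and field names follow the examples in this section:

```python
import json
from urllib import request

API_BASE = "http://localhost:8000"  # backend from Setup & Running

def build_payload(text: str) -> bytes:
    """Assemble the JSON body /analyze expects."""
    return json.dumps({"text": text}).encode()

def analyze_text(text: str) -> dict:
    """POST raw prescription text to /analyze, return the parsed response."""
    req = request.Request(
        f"{API_BASE}/analyze",
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# With the backend running locally:
#   result = analyze_text("Amoxicillin 500mg TDS x 7 days\nTake with food")
#   result["risk_level"], result["structured_data"]["medications"]
```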
### `POST /chat`

Multi-turn Q&A about an analysed prescription.
Request:

```json
{
  "message": "What is Amoxicillin commonly used for?",
  "thread_id": "session-abc123",
  "prescription_context": {}
}
```

Response:

```json
{
  "reply": "Amoxicillin is commonly prescribed for bacterial infections...",
  "thread_id": "session-abc123"
}
```

## Project Structure

```
PrescriptoAI/
├── backend/
│   ├── main.py                     # FastAPI app, CORS, router registration
│   ├── config.py                   # Settings from .env (pydantic-settings)
│   ├── api/
│   │   └── routes/
│   │       ├── prescription.py     # /upload-prescription, /analyze
│   │       └── chat.py             # /chat
│   ├── pipeline/
│   │   ├── state.py                # PrescriptionState TypedDict
│   │   ├── graph.py                # Graph assembly and compile
│   │   ├── edges.py                # Conditional edge functions
│   │   └── nodes/
│   │       ├── ocr_node.py
│   │       ├── ocr_enhancement_node.py
│   │       ├── flag_unreadable_node.py
│   │       ├── cleaning_node.py
│   │       ├── structuring_node.py
│   │       ├── field_recovery_node.py
│   │       ├── dispatch_analysis_node.py
│   │       ├── reasoning_node.py
│   │       ├── risk_node.py
│   │       ├── merge_node.py
│   │       ├── critical_alert_node.py
│   │       ├── advisory_node.py
│   │       └── output_formatter_node.py
│   ├── schemas/
│   │   └── models.py               # Pydantic models (LLM output + API I/O)
│   └── utils/
│       └── prompts.py              # LangChain ChatPromptTemplates
│
├── frontend/
│   ├── src/
│   │   ├── App.jsx                 # Root layout
│   │   ├── api.js                  # Axios API calls
│   │   └── components/
│   │       ├── UploadForm.jsx
│   │       ├── ResultDisplay.jsx
│   │       ├── MedicationCard.jsx
│   │       ├── WarningsPanel.jsx
│   │       └── ChatInterface.jsx
│   ├── package.json
│   └── vite.config.js
│
├── .env                            # GEMINI_API_KEY (not committed)
├── .env.example
├── requirements.txt
└── README.md
```
## Setup & Running

Prerequisites:

- Python 3.11+
- Node.js 18+
- A free Google AI Studio API key
Backend:

```bash
# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env and set GEMINI_API_KEY=your_actual_key

# Start the backend
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
```

API docs available at http://localhost:8000/docs
Frontend:

```bash
cd frontend
npm install
npm run dev   # runs at http://localhost:5173
```

The frontend proxies API requests to http://localhost:8000 — both must be running for full functionality.
## Design Decisions

**Multi-agent pipeline over a single prompt** — breaking processing into discrete nodes (OCR → clean → structure → reason → risk) makes each step independently debuggable, retryable, and replaceable without touching the rest of the pipeline.
**Gemini Flash for OCR** — instead of a separate OCR library (Tesseract, etc.), Gemini's vision capability handles both extraction and initial interpretation in one call, with self-reported confidence that drives the retry logic.
**Structured output everywhere** — `.with_structured_output()` with Pydantic models eliminates fragile JSON parsing. Every LLM response is type-validated before entering state.
**Graceful degradation** — every node wraps its LLM call in try/except and returns a safe fallback with descriptive errors/warnings. The API never crashes with a 500; it always returns a usable (if partial) response.
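The wrapper pattern can be sketched as follows; this is illustrative code, not the repo's actual node implementation:

```python
# Hypothetical sketch of the per-node fallback pattern.
def safe_node(name, fn):
    """Wrap a node so an LLM failure degrades to a fallback, not a crash."""
    def wrapped(state: dict) -> dict:
        try:
            return fn(state)
        except Exception as exc:  # broad on purpose: degrade, don't crash
            return {
                **state,
                "errors": state.get("errors", []) + [f"[{name}] {exc}"],
                "warnings": state.get("warnings", []) + [f"{name} fell back to defaults"],
            }
    return wrapped

# e.g. risk_node = safe_node("Risk", assess_risk)
```

Because the fallback returns a valid state dict, downstream nodes keep running and the final response surfaces the failure in `errors` rather than aborting the request.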
**Parallel analysis** — reasoning and risk detection are independent operations, so they run as concurrent LangGraph branches and merge before severity routing. This cuts latency roughly in half for the analysis phase.
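LangGraph performs this fan-out itself once both branches share the same upstream (dispatch) and downstream (merge) nodes. Conceptually the timing benefit is the same as an `asyncio.gather`, as this standalone sketch (with simulated model latency) shows:

```python
import asyncio

async def interpret(text: str) -> dict:
    """Stand-in for the reasoning node's LLM call."""
    await asyncio.sleep(0.1)  # simulated model latency
    return {"interpretation": f"plain-English summary of: {text}"}

async def assess_risk(text: str) -> dict:
    """Stand-in for the risk node's LLM call."""
    await asyncio.sleep(0.1)
    return {"risk_level": "low"}

async def analyze(text: str) -> dict:
    # Both branches run concurrently, then merge:
    # total wait is ~0.1s rather than ~0.2s sequentially.
    interpretation, risk = await asyncio.gather(
        interpret(text), assess_risk(text)
    )
    return {**interpretation, **risk}

result = asyncio.run(analyze("Amoxicillin 500mg TDS"))
```

With two branches of similar latency, the concurrent version takes roughly the time of the slower branch, which is where the "roughly half" figure comes from.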
**React + Vite frontend** — Vite's fast HMR and component-based architecture pair well with the structured JSON responses from the backend, making it straightforward to map API fields to UI components.
## Known Limitations

| Limitation | Detail |
|---|---|
| OCR on very degraded images | Even with retries, severely damaged or blurry images may not extract reliably |
| No drug database | Medication interpretations come from Gemini's training data, not a validated drug database |
| LLM hallucination risk | Structured output reduces but does not eliminate hallucination |
| No persistent storage | Results are not stored between sessions |
| English-optimised | Non-English prescriptions may have reduced accuracy |
| Free-tier rate limits | Gemini free tier has per-minute request limits |
## Future Roadmap

**Reliability**
- OpenFDA API integration for validated medication lookup
- Per-field confidence scoring (not just overall OCR confidence)
- Cross-field consistency validation (e.g., flag impossible dosages)
**Production**
- PostgreSQL persistence for prescriptions, results, and chat history
- User authentication (JWT)
- Containerisation (Docker + docker-compose)
- Deployment: backend on Railway/Render, frontend on Vercel
**Intelligence**
- Multi-language prescription support
- Drug interaction detection across medications on the same prescription
- Pharmacist feedback loop to improve prompts over time