A reinforcement learning environment for hospital resource management, integrating with Hugging Face LLM models for intelligent decision-making.
MedFlow-AI is a FastAPI-based RL environment where an LLM agent (powered by Qwen/Qwen2.5-7B-Instruct) learns to manage hospital queues by making optimal decisions about which patients to treat.
✅ FastAPI Backend - RESTful API endpoints for environment interaction
✅ LLM Integration - Hugging Face Inference Router with OpenAI-compatible API
✅ Multiple Tasks - Emergency-focused, balanced load, and high-pressure scenarios
✅ Docker Support - Production-ready containerization
✅ OpenEnv Compliant - Follows RL environment standards
✅ Strict Logging - Structured output for analysis and debugging
Deployed instance (state endpoint): https://anamika-haridas-medflow-ai.hf.space/state

Prerequisites:
- Python 3.10+
- Docker Desktop (optional)
- Hugging Face API token
Local Setup:
```bash
# Clone repository
git clone https://github.com/NilinR/MedFlow-AI.git
cd MedFlow-AI

# Install dependencies
pip install -r requirements.txt

# Set Hugging Face token (PowerShell; use `export HF_TOKEN=...` on Linux/macOS)
$env:HF_TOKEN = "your_hf_token_here"
```

Start the FastAPI server:

```bash
python -m uvicorn main:app --reload --port 8000
```

Run the inference agent:

```bash
python inference.py
```

Validate project setup:

```bash
python validate.py
```

Build the Docker image:

```bash
docker build -t medflow-ai .
```

Run inference in a container:

```bash
docker run -e HF_TOKEN=your_token medflow-ai
```

Run the FastAPI server in a container:

```bash
docker run -p 8000:8000 -e HF_TOKEN=your_token medflow-ai \
  python -m uvicorn main:app --host 0.0.0.0 --port 8000
```

**POST /reset**

Reset the environment with optional task selection.
Parameters:

```json
{
  "task": "default" | "emergency" | "balanced" | "pressure"
}
```

Response:

```json
{
  "state": {
    "emergency_patients": 0,
    "patients_waiting": 2,
    "doctors_available": 1,
    "time_step": 0
  },
  "reward": 0.0,
  "done": false
}
```

**POST /step**

Execute an action in the environment.
Parameters:

```json
{
  "action": 0 | 1 | 2
}
```

Actions:
- `0` → Treat normal patient (+10 reward)
- `1` → Treat emergency patient (+20 reward)
- `2` → Wait (-5 reward)

Response:

```json
{
  "state": { ... },
  "reward": 20.0,
  "done": false
}
```

**GET /state**

Get current environment state.

Response:

```json
{
  "state": { ... }
}
```

Project structure:

```
MedFlow-AI/
├── main.py            # FastAPI server with endpoints
├── inference.py       # LLM-based agent inference
├── env.py             # HospitalEnv RL environment class
├── tasks.py           # Task scenario definitions
├── graders.py         # Evaluation/grading functions
├── openenv.yaml       # OpenEnv configuration
├── validate.py        # Project validation script
├── requirements.txt   # Python dependencies
├── Dockerfile         # Docker image definition
├── .dockerignore      # Docker build optimization
├── .gitignore         # Git ignore rules
├── .env               # Environment variables
└── README.md          # This file
```
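Given a state payload like the ones returned by the API, an emergency-first policy can be expressed as a small rule. The helper below is a hypothetical sketch (not part of inference.py, which delegates the decision to the LLM), but it illustrates how the action space maps onto the state fields:

```python
def choose_action(state: dict) -> int:
    """Emergency-first fallback policy over a /state response payload."""
    if state["emergency_patients"] > 0:
        return 1   # treat emergency patient (+20)
    if state["patients_waiting"] > 0:
        return 0   # treat normal patient (+10)
    return 2       # nothing to do: wait (-5)
```

Such a rule-based baseline is useful for sanity-checking the environment before wiring in the LLM agent.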
```env
HF_TOKEN=your_hugging_face_token
API_BASE_URL=http://localhost:8000
MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
MAX_STEPS=10
```

The `openenv.yaml` configuration:
- Defines state space, action space, and reward bounds
- Specifies task scenarios
- Configures API endpoints
- Sets LLM agent parameters
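The exact schema depends on the OpenEnv version in use; a hypothetical sketch of what those bullets could look like in `openenv.yaml` (field names illustrative, not the project's actual file):

```yaml
# Illustrative sketch only; consult the real openenv.yaml for the actual schema
name: medflow-ai
state_space:
  emergency_patients: int
  patients_waiting: int
  doctors_available: int
  time_step: int
action_space: [0, 1, 2]        # treat normal, treat emergency, wait
reward_bounds: [-20, 20]
tasks: [default, emergency, balanced, pressure]
agent:
  model: Qwen/Qwen2.5-7B-Instruct
  max_steps: 10
```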
| Field | Type | Description |
|---|---|---|
| emergency_patients | int | Number of patients needing emergency care |
| patients_waiting | int | Number of patients in normal queue |
| doctors_available | int | Number of available medical staff |
| time_step | int | Current timestep (max 10) |
| Action | Condition | Reward |
|---|---|---|
| Treat Emergency | Emergency patients > 0 | +20 |
| Treat Normal | Normal patients > 0 | +10 |
| Treat Emergency | No emergencies (wrong choice) | -20 |
| Wait | Always | -5 |
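The reward table can be captured in a minimal step function. This is a self-contained sketch of the documented rules, not the actual env.py; the empty-normal-queue penalty and the termination condition are assumptions inferred from the examples above:

```python
MAX_STEPS = 10  # documented time limit

def step(state: dict, action: int) -> tuple[dict, float, bool]:
    """Apply one action to a state dict; returns (next_state, reward, done)."""
    s = dict(state)
    if action == 1:                        # treat emergency
        if s["emergency_patients"] > 0:
            s["emergency_patients"] -= 1
            reward = 20.0
        else:                              # wrong choice: no emergencies
            reward = -20.0
    elif action == 0:                      # treat normal
        if s["patients_waiting"] > 0:
            s["patients_waiting"] -= 1
            reward = 10.0
        else:
            reward = -5.0                  # assumption: empty queue treated as a wait
    else:                                  # wait
        reward = -5.0
    s["time_step"] += 1
    # Assumption: episode ends at the time limit or when both queues are empty
    done = s["time_step"] >= MAX_STEPS or (
        s["emergency_patients"] == 0 and s["patients_waiting"] == 0
    )
    return s, reward, done
```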
**Emergency Focus** (`task: "emergency"`)
- Setup: High emergencies, low normal patients
- Objective: Agent learns to prioritize emergency care
- Initial State: 4 emergencies, 1 normal patient, 2 doctors

**Balanced Load** (`task: "balanced"`)
- Setup: Equal mix of emergency and normal patients
- Objective: Balance between priority and throughput
- Initial State: 3 emergencies, 3 normal patients, 3 doctors

**High Pressure** (`task: "pressure"`)
- Setup: Many patients, limited time (10 steps max)
- Objective: Maximize efficiency under constraints
- Initial State: 5 emergencies, 6 normal patients, 2 doctors
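The initial states above translate directly into task factories in the style of tasks.py (function names are hypothetical; the real definitions may differ):

```python
def task_emergency():
    # Emergency Focus: high emergencies, low normal load
    return {"emergency_patients": 4, "patients_waiting": 1,
            "doctors_available": 2, "time_step": 0}

def task_balanced():
    # Balanced Load: even mix of both queues
    return {"emergency_patients": 3, "patients_waiting": 3,
            "doctors_available": 3, "time_step": 0}

def task_pressure():
    # High Pressure: heavy load under the same 10-step limit
    return {"emergency_patients": 5, "patients_waiting": 6,
            "doctors_available": 2, "time_step": 0}
```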
```
[START] task=hospital_llm model=Qwen/Qwen2.5-7B-Instruct
[INFO] HF token loaded successfully
[TEST_LLM] response=1
[INIT] state={"emergency_patients": 0, ...}
[STEP] step=1 action=1 state={...} reward=20.00 next_state={...} done=false
[STEP] step=2 action=0 state={...} reward=10.00 next_state={...} done=true
[END] success=true steps=2 score=30.00 rewards=20.00,10.00
```
Run the validation script to verify project setup:
```bash
python validate.py
```

Checks:
- ✅ OpenEnv configuration validity
- ✅ FastAPI endpoints present
- ✅ Environment initialization
- ✅ Required file presence
- ✅ Dependency availability
- ✅ Docker setup
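A file-presence check like the one validate.py performs might look like this (a hypothetical sketch; the real script's checks are more thorough):

```python
from pathlib import Path

# Files the project layout above says must exist
REQUIRED_FILES = [
    "main.py", "inference.py", "env.py", "tasks.py",
    "graders.py", "openenv.yaml", "requirements.txt", "Dockerfile",
]

def check_files(root: str) -> list[str]:
    """Return the list of required files missing under root."""
    base = Path(root)
    return [name for name in REQUIRED_FILES if not (base / name).exists()]
```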
- fastapi (0.104.1) - Web framework
- uvicorn (0.24.0) - ASGI server
- pydantic (2.5.0) - Data validation
- openai (1.7.0) - OpenAI client
- httpx (0.25.0) - HTTP client
- python-dotenv (1.0.0) - Environment configuration
- openenv-core (≥0.1.0) - OpenEnv framework
- pyyaml (≥6.0) - YAML parsing
See requirements.txt for complete list.
```bash
python validate.py
```

To add custom tasks, edit `tasks.py`:

```python
def task_custom():
    return {
        "emergency_patients": X,
        "patients_waiting": Y,
        "doctors_available": Z,
        "time_step": 0
    }
```

Then reference the new task in `openenv.yaml` and `main.py`.
Emergency Focus Task:

```
[STEP] step=1 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=2 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=3 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=4 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=5 action=0 reward=10.00 done=true (Treat normal)
[END] success=true steps=5 score=90.00
```
The LLM correctly learns to prioritize emergency patients first!
- Fork the repository
- Create a feature branch
- Make your changes
- Validate with `python validate.py`
- Push and create a pull request
This project is part of the MedFlow-AI research initiative.
- OpenEnv Framework: RL environment standardization
- Hugging Face Inference: https://huggingface.co/docs/inference-api
- FastAPI: https://fastapi.tiangolo.com
- Qwen Model: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
For issues or questions:
- Check validate.py for debugging
- Review openenv.yaml for configuration
- See inference.py for LLM integration details
Status: ✅ Production-ready | Last updated: April 12, 2026