MedFlow-AI: Hospital Queue Management RL Environment

A reinforcement learning environment for hospital resource management, integrating with Hugging Face-hosted LLMs for intelligent decision-making.

Overview

MedFlow-AI is a FastAPI-based RL environment where an LLM agent (powered by Qwen/Qwen2.5-7B-Instruct) learns to manage hospital queues by making optimal decisions about which patients to treat.

Key Features

FastAPI Backend - RESTful API endpoints for environment interaction
LLM Integration - Hugging Face Inference Router with OpenAI-compatible API
Multiple Tasks - Emergency-focused, balanced load, and high-pressure scenarios
Docker Support - Production-ready containerization
OpenEnv Compliant - Follows RL environment standards
Strict Logging - Structured output for analysis and debugging
Live demo: https://anamika-haridas-medflow-ai.hf.space/state


Quick Start

Installation

Prerequisites:

  • Python 3.10+
  • Docker Desktop (optional)
  • Hugging Face API token

Local Setup:

# Clone repository
git clone https://github.com/NilinR/MedFlow-AI.git
cd MedFlow-AI

# Install dependencies
pip install -r requirements.txt

# Set Hugging Face token (bash/zsh)
export HF_TOKEN="your_hf_token_here"
# On Windows PowerShell: $env:HF_TOKEN = "your_hf_token_here"

Run Locally

Start FastAPI server:

python -m uvicorn main:app --reload --port 8000

Run inference agent:

python inference.py

Validate project setup:

python validate.py

Docker Deployment

Build image:

docker build -t medflow-ai .

Run inference in container:

docker run -e HF_TOKEN=your_token medflow-ai

Run FastAPI server in container:

docker run -p 8000:8000 -e HF_TOKEN=your_token medflow-ai \
  python -m uvicorn main:app --host 0.0.0.0 --port 8000

API Endpoints

POST /reset

Reset the environment with optional task selection.

Parameters:

{
  "task": "default" | "emergency" | "balanced" | "pressure"
}

Response:

{
  "state": {
    "emergency_patients": 0,
    "patients_waiting": 2,
    "doctors_available": 1,
    "time_step": 0
  },
  "reward": 0.0,
  "done": false
}

POST /step

Execute an action in the environment.

Parameters:

{
  "action": 0 | 1 | 2
}

Actions:

  • 0 → Treat normal patient (+10 reward)
  • 1 → Treat emergency patient (+20 reward)
  • 2 → Wait (-5 reward)

Response:

{
  "state": { ... },
  "reward": 20.0,
  "done": false
}

GET /state

Get current environment state.

Response:

{
  "state": { ... }
}

Project Structure

MedFlow-AI/
├── main.py                  # FastAPI server with endpoints
├── inference.py             # LLM-based agent inference
├── env.py                   # HospitalEnv RL environment class
├── tasks.py                 # Task scenario definitions
├── graders.py               # Evaluation/grading functions
├── openenv.yaml             # OpenEnv configuration
├── validate.py              # Project validation script
├── requirements.txt         # Python dependencies
├── Dockerfile               # Docker image definition
├── .dockerignore            # Docker build optimization
├── .gitignore               # Git ignore rules
├── .env                     # Environment variables
└── README.md                # This file

Configuration

Environment Variables (.env)

HF_TOKEN=your_hugging_face_token
API_BASE_URL=http://localhost:8000
MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
MAX_STEPS=10
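The repository loads these variables via python-dotenv; a minimal stdlib-only sketch of reading them, with the defaults taken from the listing above (variable names are exactly the ones in `.env`):

```python
import os

# Fall back to the documented defaults when a variable is unset.
API_BASE_URL = os.getenv("API_BASE_URL", "http://localhost:8000")
MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct")
MAX_STEPS = int(os.getenv("MAX_STEPS", "10"))
HF_TOKEN = os.getenv("HF_TOKEN")  # required for LLM calls; deliberately no default
```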

OpenEnv Configuration (openenv.yaml)

  • Defines state space, action space, reward bounds
  • Specifies task scenarios
  • Configures API endpoints
  • Sets LLM agent parameters

State Space

| Field | Type | Description |
|-------|------|-------------|
| emergency_patients | int | Number of patients needing emergency care |
| patients_waiting | int | Number of patients in normal queue |
| doctors_available | int | Number of available medical staff |
| time_step | int | Current timestep (max 10) |
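The state fields map naturally onto a small dataclass. This is an illustrative sketch (the repo may well use a pydantic model instead, given its dependencies); the example instance mirrors the `POST /reset` response shown earlier.

```python
from dataclasses import dataclass

@dataclass
class HospitalState:
    emergency_patients: int  # patients needing emergency care
    patients_waiting: int    # patients in the normal queue
    doctors_available: int   # available medical staff
    time_step: int           # current timestep (episode ends at 10)

# Example: the initial state returned by POST /reset in the docs above
initial = HospitalState(emergency_patients=0, patients_waiting=2,
                        doctors_available=1, time_step=0)
```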

Reward Structure

| Action | Condition | Reward |
|--------|-----------|--------|
| Treat Emergency | Emergency patients > 0 | +20 |
| Treat Normal | Normal patients > 0 | +10 |
| Treat Emergency | No emergencies (wrong choice) | -20 |
| Wait | Always | -5 |
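The table can be expressed as a small reward function. One caveat: the table does not say what happens when the agent treats a normal patient with an empty normal queue, so mirroring the -20 emergency penalty there is purely an assumption of this sketch.

```python
def compute_reward(action: int, state: dict) -> float:
    """Reward per the table above; see the caveat for the assumed case."""
    if action == 2:  # wait
        return -5.0
    if action == 1:  # treat emergency
        return 20.0 if state["emergency_patients"] > 0 else -20.0
    if action == 0:  # treat normal
        # Penalty for an empty normal queue is not documented;
        # mirroring the -20 emergency penalty is an assumption.
        return 10.0 if state["patients_waiting"] > 0 else -20.0
    raise ValueError(f"unknown action: {action}")
```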

Tasks

1. Emergency Focus

  • Setup: High emergencies, low normal patients
  • Objective: Agent learns to prioritize emergency care
  • Initial State: 4 emergencies, 1 normal patient, 2 doctors

2. Balanced Load

  • Setup: Equal mix of emergency and normal patients
  • Objective: Balance between priority and throughput
  • Initial State: 3 emergencies, 3 normal patients, 3 doctors

3. High Pressure

  • Setup: Many patients, limited time (10 steps max)
  • Objective: Maximize efficiency under constraints
  • Initial State: 5 emergencies, 6 normal patients, 2 doctors
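The three initial states above can be collected into a single mapping, keyed by the task names accepted by `POST /reset`. The values come straight from the task list; how `tasks.py` actually structures this (and what the `"default"` task's initial state is) is not documented here.

```python
# Initial states for the documented scenarios; the "default" task is not
# specified in the README, so it is omitted from this sketch.
TASKS = {
    "emergency": {"emergency_patients": 4, "patients_waiting": 1,
                  "doctors_available": 2, "time_step": 0},
    "balanced":  {"emergency_patients": 3, "patients_waiting": 3,
                  "doctors_available": 3, "time_step": 0},
    "pressure":  {"emergency_patients": 5, "patients_waiting": 6,
                  "doctors_available": 2, "time_step": 0},
}
```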

Inference Output Format

[START] task=hospital_llm model=Qwen/Qwen2.5-7B-Instruct
[INFO] HF token loaded successfully
[TEST_LLM] response=1
[INIT] state={"emergency_patients": 0, ...}
[STEP] step=1 action=1 state={...} reward=20.00 next_state={...} done=false
[STEP] step=2 action=0 state={...} reward=10.00 next_state={...} done=true
[END] success=true steps=2 score=30.00 rewards=20.00,10.00

Validation

Run the validation script to verify project setup:

python validate.py

Checks:

  • ✅ OpenEnv configuration validity
  • ✅ FastAPI endpoints present
  • ✅ Environment initialization
  • ✅ Required file presence
  • ✅ Dependency availability
  • ✅ Docker setup

Dependencies

Core

  • fastapi (0.104.1) - Web framework
  • uvicorn (0.24.0) - ASGI server
  • pydantic (2.5.0) - Data validation

LLM & ML

  • openai (1.7.0) - OpenAI client
  • httpx (0.25.0) - HTTP client
  • python-dotenv (1.0.0) - Environment configuration

RL & Validation

  • openenv-core (≥0.1.0) - OpenEnv framework
  • pyyaml (≥6.0) - YAML parsing

See requirements.txt for complete list.


Development

Running Tests

python validate.py

Extending the Environment

To add custom tasks, edit tasks.py:

def task_custom():
    """Return the initial state for a custom scenario (example values)."""
    return {
        "emergency_patients": 2,   # example: two emergency cases
        "patients_waiting": 4,     # example: four patients in the normal queue
        "doctors_available": 3,    # example: three doctors on shift
        "time_step": 0             # episodes always start at step 0
    }

Then reference it in openenv.yaml and main.py.


Performance Example

Emergency Focus Task:

[STEP] step=1 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=2 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=3 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=4 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=5 action=0 reward=10.00 done=true  (Treat normal)
[END] success=true steps=5 score=90.00

The LLM agent correctly prioritizes emergency patients before clearing the normal queue.


Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Validate with python validate.py
  5. Push and create a pull request

License

This project is part of the MedFlow-AI research initiative.


Support

For issues or questions, open an issue on the GitHub repository.

Status: ✅ Production-ready | Last updated: April 12, 2026
