A reinforcement learning environment for hospital resource management, integrating with Hugging Face LLM models for intelligent decision-making.
MedFlow-AI is a FastAPI-based RL environment where an LLM agent (powered by Qwen/Qwen2.5-7B-Instruct) learns to manage hospital queues by making optimal decisions about which patients to treat.
✅ FastAPI Backend - RESTful API endpoints for environment interaction
✅ LLM Integration - Hugging Face Inference Router with OpenAI-compatible API
✅ Multiple Tasks - Emergency-focused, balanced load, and high-pressure scenarios
✅ Docker Support - Production-ready containerization
✅ OpenEnv Compliant - Follows RL environment standards
✅ Strict Logging - Structured output for analysis and debugging
Deployed instance (state endpoint): https://anamika-haridas-medflow-ai.hf.space/state

Prerequisites:
- Python 3.10+
- Docker Desktop (optional)
- Hugging Face API token
Local Setup:
```bash
# Clone repository
git clone https://github.com/NilinR/MedFlow-AI.git
cd MedFlow-AI

# Install dependencies
pip install -r requirements.txt

# Set Hugging Face token (PowerShell; use `export HF_TOKEN=...` on Linux/macOS)
$env:HF_TOKEN = "your_hf_token_here"
```

Start the FastAPI server:

```bash
python -m uvicorn main:app --reload --port 8000
```

Run the inference agent:

```bash
python inference.py
```

Validate project setup:

```bash
python validate.py
```

Build the Docker image:

```bash
docker build -t medflow-ai .
```

Run inference in a container:

```bash
docker run -e HF_TOKEN=your_token medflow-ai
```

Run the FastAPI server in a container:

```bash
docker run -p 8000:8000 -e HF_TOKEN=your_token medflow-ai \
  python -m uvicorn main:app --host 0.0.0.0 --port 8000
```

**POST /reset**

Reset the environment with optional task selection.
Parameters:

```json
{
  "task": "default" | "emergency" | "balanced" | "pressure"
}
```

Response:

```json
{
  "state": {
    "emergency_patients": 0,
    "patients_waiting": 2,
    "doctors_available": 1,
    "time_step": 0
  },
  "reward": 0.0,
  "done": false
}
```

**POST /step**

Execute an action in the environment.
Parameters:

```json
{
  "action": 0 | 1 | 2
}
```

Actions:
- `0` → Treat normal patient (+10 reward)
- `1` → Treat emergency patient (+20 reward)
- `2` → Wait (-5 reward)

Response:

```json
{
  "state": { ... },
  "reward": 20.0,
  "done": false
}
```

**GET /state**

Get current environment state.

Response:

```json
{
  "state": { ... }
}
```

Project structure:

```
MedFlow-AI/
├── main.py            # FastAPI server with endpoints
├── inference.py       # LLM-based agent inference
├── env.py             # HospitalEnv RL environment class
├── tasks.py           # Task scenario definitions
├── graders.py         # Evaluation/grading functions
├── openenv.yaml       # OpenEnv configuration
├── validate.py        # Project validation script
├── requirements.txt   # Python dependencies
├── Dockerfile         # Docker image definition
├── .dockerignore      # Docker build optimization
├── .gitignore         # Git ignore rules
├── .env               # Environment variables
└── README.md          # This file
```
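Given a state payload like the ones returned by the API, an emergency-first policy can be expressed as a small rule. The helper below is a hypothetical sketch (not part of inference.py, which delegates the decision to the LLM), but it illustrates how the action space maps onto the state fields:

```python
def choose_action(state: dict) -> int:
    """Emergency-first fallback policy over a /state response payload."""
    if state["emergency_patients"] > 0:
        return 1   # treat emergency patient (+20)
    if state["patients_waiting"] > 0:
        return 0   # treat normal patient (+10)
    return 2       # nothing to do: wait (-5)
```

Such a rule-based baseline is useful for sanity-checking the environment before wiring in the LLM agent.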
```env
HF_TOKEN=your_hugging_face_token
API_BASE_URL=http://localhost:8000
MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
MAX_STEPS=10
```

The `openenv.yaml` configuration:
- Defines state space, action space, and reward bounds
- Specifies task scenarios
- Configures API endpoints
- Sets LLM agent parameters
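The exact schema depends on the OpenEnv version in use; a hypothetical sketch of what those bullets could look like in `openenv.yaml` (field names illustrative, not the project's actual file):

```yaml
# Illustrative sketch only; consult the real openenv.yaml for the actual schema
name: medflow-ai
state_space:
  emergency_patients: int
  patients_waiting: int
  doctors_available: int
  time_step: int
action_space: [0, 1, 2]        # treat normal, treat emergency, wait
reward_bounds: [-20, 20]
tasks: [default, emergency, balanced, pressure]
agent:
  model: Qwen/Qwen2.5-7B-Instruct
  max_steps: 10
```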
| Field | Type | Description |
|---|---|---|
| emergency_patients | int | Number of patients needing emergency care |
| patients_waiting | int | Number of patients in normal queue |
| doctors_available | int | Number of available medical staff |
| time_step | int | Current timestep (max 10) |
| Action | Condition | Reward |
|---|---|---|
| Treat Emergency | Emergency patients > 0 | +20 |
| Treat Normal | Normal patients > 0 | +10 |
| Treat Emergency | No emergencies (wrong choice) | -20 |
| Wait | Always | -5 |
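The reward table can be captured in a minimal step function. This is a self-contained sketch of the documented rules, not the actual env.py; the empty-normal-queue penalty and the termination condition are assumptions inferred from the examples above:

```python
MAX_STEPS = 10  # documented time limit

def step(state: dict, action: int) -> tuple[dict, float, bool]:
    """Apply one action to a state dict; returns (next_state, reward, done)."""
    s = dict(state)
    if action == 1:                        # treat emergency
        if s["emergency_patients"] > 0:
            s["emergency_patients"] -= 1
            reward = 20.0
        else:                              # wrong choice: no emergencies
            reward = -20.0
    elif action == 0:                      # treat normal
        if s["patients_waiting"] > 0:
            s["patients_waiting"] -= 1
            reward = 10.0
        else:
            reward = -5.0                  # assumption: empty queue treated as a wait
    else:                                  # wait
        reward = -5.0
    s["time_step"] += 1
    # Assumption: episode ends at the time limit or when both queues are empty
    done = s["time_step"] >= MAX_STEPS or (
        s["emergency_patients"] == 0 and s["patients_waiting"] == 0
    )
    return s, reward, done
```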
**Emergency Focus** (`task: "emergency"`)
- Setup: High emergencies, low normal patients
- Objective: Agent learns to prioritize emergency care
- Initial State: 4 emergencies, 1 normal patient, 2 doctors

**Balanced Load** (`task: "balanced"`)
- Setup: Equal mix of emergency and normal patients
- Objective: Balance between priority and throughput
- Initial State: 3 emergencies, 3 normal patients, 3 doctors

**High Pressure** (`task: "pressure"`)
- Setup: Many patients, limited time (10 steps max)
- Objective: Maximize efficiency under constraints
- Initial State: 5 emergencies, 6 normal patients, 2 doctors
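The initial states above translate directly into task factories in the style of tasks.py (function names are hypothetical; the real definitions may differ):

```python
def task_emergency():
    # Emergency Focus: high emergencies, low normal load
    return {"emergency_patients": 4, "patients_waiting": 1,
            "doctors_available": 2, "time_step": 0}

def task_balanced():
    # Balanced Load: even mix of both queues
    return {"emergency_patients": 3, "patients_waiting": 3,
            "doctors_available": 3, "time_step": 0}

def task_pressure():
    # High Pressure: heavy load under the same 10-step limit
    return {"emergency_patients": 5, "patients_waiting": 6,
            "doctors_available": 2, "time_step": 0}
```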
```
[START] task=hospital_llm model=Qwen/Qwen2.5-7B-Instruct
[INFO] HF token loaded successfully
[TEST_LLM] response=1
[INIT] state={"emergency_patients": 0, ...}
[STEP] step=1 action=1 state={...} reward=20.00 next_state={...} done=false
[STEP] step=2 action=0 state={...} reward=10.00 next_state={...} done=true
[END] success=true steps=2 score=30.00 rewards=20.00,10.00
```
Run the validation script to verify project setup:
```bash
python validate.py
```

Checks:
- ✅ OpenEnv configuration validity
- ✅ FastAPI endpoints present
- ✅ Environment initialization
- ✅ Required file presence
- ✅ Dependency availability
- ✅ Docker setup
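A file-presence check like the one validate.py performs might look like this (a hypothetical sketch; the real script's checks are more thorough):

```python
from pathlib import Path

# Files the project layout above says must exist
REQUIRED_FILES = [
    "main.py", "inference.py", "env.py", "tasks.py",
    "graders.py", "openenv.yaml", "requirements.txt", "Dockerfile",
]

def check_files(root: str) -> list[str]:
    """Return the list of required files missing under root."""
    base = Path(root)
    return [name for name in REQUIRED_FILES if not (base / name).exists()]
```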
- fastapi (0.104.1) - Web framework
- uvicorn (0.24.0) - ASGI server
- pydantic (2.5.0) - Data validation
- openai (1.7.0) - OpenAI client
- httpx (0.25.0) - HTTP client
- python-dotenv (1.0.0) - Environment configuration
- openenv-core (≥0.1.0) - OpenEnv framework
- pyyaml (≥6.0) - YAML parsing
See requirements.txt for complete list.
```bash
python validate.py
```

To add custom tasks, edit `tasks.py`:

```python
def task_custom():
    return {
        "emergency_patients": X,
        "patients_waiting": Y,
        "doctors_available": Z,
        "time_step": 0
    }
```

Then reference the new task in `openenv.yaml` and `main.py`.
Emergency Focus Task:

```
[STEP] step=1 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=2 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=3 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=4 action=1 reward=20.00 done=false (Treat emergency)
[STEP] step=5 action=0 reward=10.00 done=true (Treat normal)
[END] success=true steps=5 score=90.00
```
The LLM correctly learns to prioritize emergency patients first!
- Fork the repository
- Create a feature branch
- Make your changes
- Validate with `python validate.py`
- Push and create a pull request
This project is part of the MedFlow-AI research initiative.
- OpenEnv Framework: RL environment standardization
- Hugging Face Inference: https://huggingface.co/docs/inference-api
- FastAPI: https://fastapi.tiangolo.com
- Qwen Model: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
For issues or questions:
- Check validate.py for debugging
- Review openenv.yaml for configuration
- See inference.py for LLM integration details
Status: ✅ Production-ready | Last updated: April 12, 2026