A complete OpenEnv-compliant environment for training AI agents to manage a business email inbox. The agent learns to triage, label, reply, route, and escalate emails — a task humans perform every working day.
Email triage is one of the most universal productivity bottlenecks in knowledge work. A skilled human triager must:
- Identify urgent issues (production outages, angry customers, security incidents)
- Ignore noise (newsletters, spam)
- Craft appropriate replies
- Route messages to the right team
- Escalate issues that require human decision-making
This environment lets an AI agent learn these skills from structured reward signals.
The agent receives an inbox of realistic business emails and must process them through a sequence of actions. Three tasks of increasing difficulty are provided, each scored on a deterministic rubric (0.0 – 1.0).
| Field | Type | Description |
|---|---|---|
| `action_type` | enum | `label` / `reply` / `archive` / `escalate` / `route` / `noop` |
| `email_id` | string | ID of the target email |
| `label` | enum | `urgent` / `high` / `normal` / `low` / `spam` |
| `reply_text` | string | Draft reply (minimum 20 characters for reward) |
| `route_to` | string | Department: `devops` / `legal` / `engineering` / `management` |
| `reason` | string | Optional explanation |
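For illustration, a few action payloads matching the schema above (the email IDs are placeholders):

```python
# Illustrative action payloads; field names follow the schema above,
# and the email IDs are placeholders.
label_action = {
    "action_type": "label",
    "email_id": "t1_e1",
    "label": "urgent",
}

reply_action = {
    "action_type": "reply",
    "email_id": "t1_e2",
    # Replies shorter than 20 characters earn no reward.
    "reply_text": "Thanks for flagging this; we are investigating now.",
    "reason": "Customer expects an acknowledgement",
}

route_action = {
    "action_type": "route",
    "email_id": "t1_e3",
    "route_to": "devops",
}
```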
| Field | Type | Description |
|---|---|---|
| `task_id` | string | Current task identifier |
| `step` | int | Steps taken so far |
| `emails` | list[Email] | Full inbox with sender, subject, body, timestamp |
| `inbox_size` | int | Number of emails |
| `done` | bool | Episode completion flag |
| `message` | string | Human-readable status |
| `metadata` | dict | `max_steps`, `actions_taken` |
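As a sketch of what a client sees, here is a hand-built observation with the fields above and a toy scan over the inbox (the email objects, their `id` field, and all values are assumptions for illustration):

```python
# A hand-built observation shaped like the schema above (illustrative values).
obs = {
    "task_id": "task1",
    "step": 0,
    "emails": [
        {"id": "t1_e1", "sender": "ops@example.com",
         "subject": "Production outage in eu-west", "body": "...",
         "timestamp": "2024-01-01T09:00:00Z"},
        {"id": "t1_e2", "sender": "news@example.com",
         "subject": "Weekly newsletter", "body": "...",
         "timestamp": "2024-01-01T09:05:00Z"},
    ],
    "inbox_size": 2,
    "done": False,
    "message": "Episode started",
    "metadata": {"max_steps": 10, "actions_taken": 0},
}

# Toy heuristic: flag emails whose subject mentions an outage.
urgent_ids = [e["id"] for e in obs["emails"] if "outage" in e["subject"].lower()]
print(urgent_ids)  # ['t1_e1']
```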
Rewards are provided at every step (intermediate signals) and a final grade is computed at episode end.
| Action | Reward |
|---|---|
| Correct urgent label | +0.05 |
| Correct non-urgent label | +0.02 |
| Wrong label | −0.02 |
| Good reply (needed) | +0.04 |
| Unnecessary reply | −0.03 |
| Correct route | +0.05 |
| Correct escalation | +0.05 |
| Unnecessary escalation | −0.03 |
| noop | −0.02 |
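The schedule above can be read as a simple lookup; summing it over an episode gives the intermediate return, which is separate from the final rubric grade. A sketch (the event names are made up for illustration):

```python
# Per-step reward schedule from the table above (event names are illustrative).
STEP_REWARDS = {
    "correct_urgent_label": 0.05,
    "correct_nonurgent_label": 0.02,
    "wrong_label": -0.02,
    "good_reply": 0.04,
    "unnecessary_reply": -0.03,
    "correct_route": 0.05,
    "correct_escalation": 0.05,
    "unnecessary_escalation": -0.03,
    "noop": -0.02,
}

# Example episode: two correct urgent labels, one good reply, one noop.
events = ["correct_urgent_label", "correct_urgent_label", "good_reply", "noop"]
total = sum(STEP_REWARDS[e] for e in events)
print(round(total, 2))  # 0.12
```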
Objective: Label 5 emails with the correct priority level.
Scoring: Weighted label accuracy. Urgent emails count 2×.
Expected score for a good agent: 0.75 – 1.0
Max steps: 10
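The 2× urgent weighting might be computed along these lines (a sketch only; the actual rubric lives in `env/graders.py`):

```python
def weighted_label_accuracy(results, urgent_weight=2.0):
    """results: list of (is_urgent, labeled_correctly) pairs."""
    total = sum(urgent_weight if urgent else 1.0 for urgent, _ in results)
    earned = sum(urgent_weight if urgent else 1.0 for urgent, ok in results if ok)
    return earned / total

# 5 emails: 1 urgent (labeled correctly) and 4 normal (3 correct, 1 wrong).
score = weighted_label_accuracy(
    [(True, True), (False, True), (False, True), (False, True), (False, False)]
)
print(round(score, 3))  # 0.833  (5 weighted points out of 6)
```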
Objective: Label 10 emails and draft replies to emails that require a response.
Scoring: 0.6 × label_accuracy + 0.4 × reply_coverage − 0.05 × unnecessary_replies
Expected score for a good agent: 0.55 – 0.80
Max steps: 25
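Plugging example numbers into the task 2 formula (the inputs are made up):

```python
label_accuracy = 0.8        # e.g. 8 of 10 labels correct
reply_coverage = 0.75       # e.g. 3 of 4 reply-worthy emails answered
unnecessary_replies = 1

score = 0.6 * label_accuracy + 0.4 * reply_coverage - 0.05 * unnecessary_replies
print(round(score, 2))  # 0.73
```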
Objective: Process 15 emails — label, reply, route to the correct department, and escalate compliance/critical issues.
Scoring: 0.40 × label + 0.35 × routing + 0.25 × escalation + reply_bonus
Critical emails (regulatory notices, partner outages) carry 3× label weight.
Expected score for a good agent: 0.40 – 0.70
Max steps: 40
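And the task 3 composite with example component scores (all inputs are made up, and the magnitude of `reply_bonus` is an assumption):

```python
label_score = 0.70       # weighted label accuracy (critical emails count 3x)
routing_score = 0.60
escalation_score = 0.50
reply_bonus = 0.05       # assumed small additive bonus for good replies

score = 0.40 * label_score + 0.35 * routing_score + 0.25 * escalation_score + reply_bonus
print(round(score, 3))  # 0.665
```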
Baseline scores, measured with `meta-llama/Llama-3.3-70B-Instruct` via the Hugging Face Inference API:
| Task | Baseline Score |
|---|---|
| task1 | ~0.70 |
| task2 | ~0.52 |
| task3 | ~0.41 |
| Average | ~0.54 |
```bash
git clone https://huggingface.co/spaces/<your-username>/email-triage-openenv
cd email-triage-openenv
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 7860 --reload
```

The API will be available at http://localhost:7860.
```bash
docker build -t email-triage-openenv .
docker run -p 7860:7860 \
  -e API_BASE_URL=https://router.huggingface.co/v1 \
  -e MODEL_NAME=meta-llama/Llama-3.3-70B-Instruct \
  -e HF_TOKEN=hf_your_token \
  email-triage-openenv
```

```bash
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=meta-llama/Llama-3.3-70B-Instruct
export HF_TOKEN=hf_your_token_here
python inference.py
```

```python
import requests

BASE = "http://localhost:7860"

# Start task 1
obs = requests.post(f"{BASE}/reset", json={"task_id": "task1"}).json()

# Take an action
result = requests.post(f"{BASE}/step", json={
    "action": {
        "action_type": "label",
        "email_id": "t1_e1",
        "label": "urgent"
    }
}).json()
print(result["reward"])

# Get final score
grade = requests.post(f"{BASE}/grade").json()
print(grade["final_score"])
```

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check (returns 200) |
| GET | `/` | Environment info |
| POST | `/reset` | Start a new episode: `{"task_id": "task1"}` |
| POST | `/step` | Take an action: `{"action": {...}}` |
| GET | `/state` | Current environment state |
| GET | `/tasks` | List all tasks with metadata |
| POST | `/grade` | Run the grader, get the final score |
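Putting the endpoints together, a minimal episode driver might look like the following sketch. The trivial "label everything normal" policy is illustrative, each email is assumed to carry an `id` field, and the injectable `post` argument is an assumption added here so the loop can be exercised without a live server:

```python
def run_episode(task_id, base="http://localhost:7860", post=None):
    """Drive one episode: reset, label every email, then grade.

    `post` is injectable so the loop can be tested without a server;
    by default it wraps requests.post against the local API.
    """
    if post is None:
        import requests  # default transport; assumes the server is running
        post = lambda path, payload: requests.post(f"{base}{path}", json=payload).json()

    obs = post("/reset", {"task_id": task_id})
    total_reward = 0.0
    for email in obs["emails"]:
        # Trivial policy for illustration: label everything "normal".
        result = post("/step", {"action": {
            "action_type": "label",
            "email_id": email["id"],
            "label": "normal",
        }})
        total_reward += result.get("reward", 0.0)
        if result.get("done"):
            break
    grade = post("/grade", {})
    return total_reward, grade["final_score"]
```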
| Variable | Description | Default |
|---|---|---|
| `API_BASE_URL` | LLM API endpoint | `https://router.huggingface.co/v1` |
| `MODEL_NAME` | Model identifier | `meta-llama/Llama-3.3-70B-Instruct` |
| `HF_TOKEN` | Hugging Face API key | (none) |
```
email-triage-openenv/
├── main.py              # FastAPI server
├── inference.py         # Baseline agent (OpenAI client)
├── openenv.yaml         # OpenEnv spec metadata
├── requirements.txt
├── Dockerfile
├── README.md
└── env/
    ├── __init__.py
    ├── email_env.py     # EmailTriageEnv class (step/reset/state)
    ├── models.py        # Pydantic models (Observation, Action, Reward)
    ├── graders.py       # Task graders (deterministic scoring)
    └── data.py          # Email dataset (15 realistic emails)
```
MIT