This project benchmarks online dispatch policies for a synthetic B2B support operation where tickets differ by queue, language, effort, and SLA priority, and agents differ by permissions, coverage, daily capacity, and shifts. The benchmark asks a practical operations question: which dispatch policy gives the best trade-off between throughput and protecting urgent tickets?
Three policies are compared on the same replay environment: a myopic greedy
baseline, a same-day look-ahead greedy heuristic, and a rolling CP-SAT OR
scheduler. In the current 5-minute setup, lookahead_greedy achieves the
highest throughput, greedy_baseline is slightly best on the narrowest P1
first-response metric, and or_scheduler improves overall backlog,
utilization, and total first-response tardiness versus greedy while keeping
urgent-ticket performance close. The benchmark is intentionally
overloaded: raw demand is 19,439 minutes against 19,200 minutes of 5-day
agent capacity, and the active 5-minute slotted workload rises to 20,460
minutes.
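The overload arithmetic above can be checked directly; a minimal sketch using only the three figures quoted in this section (variable names are illustrative):

```python
# Demand/capacity figures quoted in the benchmark (all in minutes).
raw_demand_min = 19_439
capacity_min = 19_200        # 5 business days of agent capacity
slotted_demand_min = 20_460  # demand after rounding up to 5-minute slots

overload_ratio = raw_demand_min / capacity_min
slotting_overhead = slotted_demand_min - raw_demand_min

print(f"raw overload: {overload_ratio:.3f}x capacity")            # 1.012x
print(f"slot rounding adds {slotting_overhead} minutes of work")  # 1021
```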
For a deeper look at the OR approach, start with Problem Overview, Rolling Policy, and Objective in the problem formulation.
- This project turns a realistic operations problem into a reproducible OR benchmark: overloaded support demand, limited agent supply, and competing SLA priorities.
- It compares simple online heuristics against a rolling CP-SAT scheduler in the same replay environment, so the trade-offs are easy to explain and verify.
- The key lesson is that maximizing throughput is not the same as protecting the most urgent tickets. In this benchmark, the OR scheduler is useful because it makes that business trade-off explicit instead of relying on raw volume alone.
- As a first OR portfolio project, the repo is designed to show the full loop: problem framing, synthetic data design, heuristic baselines, optimization modeling, and reproducible evaluation.
- 5-minute slots are the practical default: they preserve most of the timing detail without turning the benchmark into a one-minute dispatch simulation.
`lookahead_greedy` maximizes scheduled volume and overall utilization. `or_scheduler` is the stronger optimization-based trade-off against greedy, even though it is not the winner on every urgent-ticket metric.
Not all tickets can be scheduled within the available agent capacity, so policy quality should be judged by how well urgent work is protected under overload.
| Policy | Scheduled | Backlog | Total First Response Tardiness (min) | P1 First Response Tardiness (min) | P2 First Response Tardiness (min) | Utilization | Interpretation |
|---|---|---|---|---|---|---|---|
| Greedy | 442 | 69 | 186,591.88 | 175.74 | 3,376.41 | 93.57% | Strongest narrow P1 handling, but weaker overall |
| Look-Ahead Greedy | 476 | 35 | 98,752.77 | 15,644.30 | 34,697.51 | 97.66% | Throughput winner, but clearly weaker on urgent-ticket protection |
| OR Scheduler | 452 | 59 | 131,681.71 | 268.99 | 3,240.13 | 94.17% | Better overall compromise than greedy, without winning every urgent metric |
Resolution tardiness is secondary in the current benchmark: greedy and OR both finish at 0.0 total resolution tardiness, while look-ahead reaches 4,818.88.
Overall throughput favors look-ahead greedy, but a business-facing assessment should rank urgent-ticket protection first. Greedy is slightly best on the narrowest P1 first-response metric, look-ahead greedy is best on throughput, and the OR scheduler is the strongest optimization-based compromise against greedy: it schedules 10 more tickets and lowers total first-response tardiness from 186,591.88 to 131,681.71 minutes. In an overloaded system, that prioritization is the core operational decision, not a secondary preference. For the mathematical rationale behind that choice, see Priority Weights and Objective.
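The first-response tardiness columns above are sums of per-ticket lateness, where on-time responses contribute zero. A minimal sketch of that standard definition on hypothetical tickets (the repo's actual metric code lives in `src/evaluation.py`):

```python
from datetime import datetime, timedelta

def first_response_tardiness(deadline: datetime, first_response: datetime) -> float:
    """Minutes late on first response; early or on-time responses contribute 0."""
    return max(0.0, (first_response - deadline).total_seconds() / 60.0)

# Hypothetical tickets: (priority, SLA deadline, actual first response)
t0 = datetime(2024, 1, 8, 9, 0)
tickets = [
    ("P1", t0, t0 + timedelta(minutes=30)),  # 30 minutes late
    ("P2", t0, t0 - timedelta(minutes=10)),  # early -> 0 tardiness
]
total = sum(first_response_tardiness(d, r) for _, d, r in tickets)
print(total)  # 30.0
```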
The current benchmark uses 5-minute decision slots. This is the practical
compromise between fidelity and realism: 15-minute slots were too coarse for
the ticket arrivals and effort distribution, while 1-minute slots are more
detailed than most real support dispatch loops need. For the slotting logic
behind `release_ts`, `duration_slots`, and the remaining-day horizon, see
Time Structure.
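A minimal sketch of the 5-minute slotting arithmetic described above (function names are illustrative, not the repo's actual API): release timestamps floor to a slot index, while effort rounds up to whole slots, which is why slotted demand exceeds raw demand.

```python
import math
from datetime import datetime

SLOT_MINUTES = 5

def to_slot_index(ts: datetime, day_start: datetime) -> int:
    """Slot index of a release timestamp within the day (floor to slot)."""
    return int((ts - day_start).total_seconds() // (SLOT_MINUTES * 60))

def duration_slots(effort_minutes: int) -> int:
    """Effort rounded up to whole 5-minute slots."""
    return math.ceil(effort_minutes / SLOT_MINUTES)

day_start = datetime(2024, 1, 8, 9, 0)
print(to_slot_index(datetime(2024, 1, 8, 9, 12), day_start))  # 2
print(duration_slots(13))  # 3  (13 minutes occupies three 5-minute slots)
```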
The repo separates synthetic demand generation, fixed agent supply, replay policies, and evaluation outputs so the methods stay easy to compare.
- `data/tickets.csv`: synthetic demand-side arrivals with SLA deadlines
- `data/agents.csv`: fixed hand-authored agent roster
- `scripts/generate_ticket_assignment_data.py`: deterministic ticket generator
- `src/greedy_baseline.py`: online myopic dispatcher
- `src/lookahead_greedy.py`: same-day earliest-fit reservation heuristic
- `src/or_scheduler.py`: rolling current-slot OR scheduler built on CP-SAT
- `src/preprocessing.py` and `src/evaluation.py`: shared slotting, feasibility, schedule writing, and metrics aggregation utilities
If you want the dataset design details behind the benchmark, start with Ticket Demand Dataset and Agent Supply Dataset.
- `greedy_baseline`: only considers tickets that are already open and agents that are idle now. It assigns tickets in deterministic urgency order and does not reserve future capacity.
- `lookahead_greedy`: keeps the same urgency ordering, but can reserve the earliest feasible future slot later on the same business day.
- `or_scheduler`: re-solves a current-slot CP-SAT model every 5 minutes, considers only tickets already open at that solve, and commits only starts in the current slot before re-optimizing later.
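To make the policy differences concrete, here is a minimal sketch of the myopic greedy step, with hypothetical field names; the real dispatcher in `src/greedy_baseline.py` also enforces permissions, coverage, capacity, and shifts:

```python
def greedy_dispatch(open_tickets, idle_agents):
    """Myopic assignment: walk tickets in deterministic urgency order
    (priority, then release time, then id) and give each to any idle agent.
    No future capacity is reserved."""
    assignments = []
    idle = list(idle_agents)
    for t in sorted(open_tickets, key=lambda t: (t["priority"], t["release_ts"], t["id"])):
        if not idle:
            break  # no idle agent now -> ticket stays open for a later slot
        assignments.append((t["id"], idle.pop(0)))
    return assignments

tickets = [
    {"id": "T2", "priority": 2, "release_ts": 0},
    {"id": "T1", "priority": 1, "release_ts": 5},
]
print(greedy_dispatch(tickets, ["A1"]))  # [('T1', 'A1')]
```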
This repo targets Python 3.10+.
The setup commands below are shown for PowerShell on Windows.
```powershell
git clone https://github.com/feboe/sla-aware-ticket-assignment.git
cd sla-aware-ticket-assignment
python -m venv .venv
.\.venv\Scripts\python -m pip install --upgrade pip
.\.venv\Scripts\python -m pip install -r requirements.txt
```
```powershell
.\.venv\Scripts\python -m unittest discover -s tests -p "test_*.py"
```

If you want to reproduce the published benchmark outputs from a fresh clone, run the data generator once and then execute the three benchmark scripts below.
```powershell
.\.venv\Scripts\python scripts\generate_ticket_assignment_data.py
```

This writes `data/tickets.csv`.
```powershell
.\.venv\Scripts\python scripts\run_greedy_baseline.py
```

This writes:

- `results/greedy_baseline_schedule.csv`
- `results/greedy_baseline_metrics.json`
```powershell
.\.venv\Scripts\python scripts\run_lookahead_greedy.py
```

This writes:

- `results/lookahead_greedy_schedule.csv`
- `results/lookahead_greedy_metrics.json`
```powershell
.\.venv\Scripts\python scripts\run_or_scheduler.py
```

This writes:

- `results/or_scheduler_schedule.csv`
- `results/or_scheduler_metrics.json`
The OR scheduler uses a rolling current-slot objective with weighted first-response tardiness, weighted resolution tardiness, and a small direct backlog term. It is not a full future-slot planner.
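A minimal sketch of the objective shape described above, with hypothetical weights (the actual values are documented under Priority Weights, and the real model expresses this inside CP-SAT rather than as a plain function):

```python
# Hypothetical priority weights -- illustrative only, not the repo's values.
FIRST_RESPONSE_WEIGHT = {"P1": 10, "P2": 3, "P3": 1}
RESOLUTION_WEIGHT = {"P1": 5, "P2": 2, "P3": 1}
BACKLOG_PENALTY = 1  # small direct term per deferred ticket

def objective_cost(tickets):
    """Cost the rolling solve minimizes, per the description above:
    weighted first-response tardiness + weighted resolution tardiness
    + a small per-ticket backlog term for deferred work."""
    cost = 0
    for t in tickets:
        cost += FIRST_RESPONSE_WEIGHT[t["priority"]] * t["fr_tardiness"]
        cost += RESOLUTION_WEIGHT[t["priority"]] * t["res_tardiness"]
        if t["deferred"]:
            cost += BACKLOG_PENALTY
    return cost

print(objective_cost([
    {"priority": "P1", "fr_tardiness": 2, "res_tardiness": 0, "deferred": False},
    {"priority": "P2", "fr_tardiness": 0, "res_tardiness": 1, "deferred": True},
]))  # 10*2 + 2*1 + 1 = 23
```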
All three runners write:
- one schedule CSV in `results/`
- one metrics JSON in `results/`
The final replay schedule uses the shared schema from `src/evaluation.py` and persists only:

- `scheduled`
- `backlog_end`
All three metrics JSON files share the same core structure:
- top-level replay summary fields such as `replay_business_days`, `slot_minutes`, `horizon_start_ts`, `horizon_end_ts`, `total_tickets`, `scheduled_tickets`, and `tickets_in_backlog`
- nested `scheduled` and `backlog` sections with the same priority breakdown
- per-agent `agent_utilization` plus `overall_agent_utilization`
The OR scheduler adds solver-specific diagnostics:
- `solve_call_count`
- `avg_solve_time_sec`
- `solver_status_counts`
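Because all three metrics files share the same top-level fields, cross-policy comparisons reduce to field lookups. A minimal sketch using inline stand-ins for two of the published results (real runs would load `results/<policy>_metrics.json` instead):

```python
import json

# Minimal stand-ins for the shared metrics JSON structure described above.
greedy = {"scheduled_tickets": 442, "tickets_in_backlog": 69}
or_sched = {"scheduled_tickets": 452, "tickets_in_backlog": 59}

def compare(name_a, a, name_b, b, field):
    """One comparison row: both values plus the delta from policy A to B."""
    return {field: {name_a: a[field], name_b: b[field], "delta": b[field] - a[field]}}

row = compare("greedy", greedy, "or_scheduler", or_sched, "scheduled_tickets")
print(json.dumps(row))  # delta of 10 matches the benchmark table
```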
- The OR scheduler is a rolling current-slot model, not a full-day planner. It decides what can start now and re-optimizes every 5 minutes.
- Future ticket arrivals are unknown inside any single solve, so the model makes decisions with current information only.
- Deferred work is penalized with a remaining-day horizon proxy, and end-of-run backlog metrics are measured at the final replay horizon end.
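The rolling scheme in the bullets above can be sketched as a replay loop skeleton; the `solve_current_slot` callback stands in for the actual CP-SAT solve:

```python
def rolling_replay(slots, open_tickets_at, solve_current_slot):
    """Skeleton of the rolling scheme described above: at each 5-minute slot,
    solve with only the tickets already open, commit only the starts the
    solve places in the current slot, then re-optimize at the next slot."""
    committed = []
    for slot in slots:
        plan = solve_current_slot(slot, open_tickets_at(slot))
        committed.extend(a for a in plan if a["start_slot"] == slot)
    return committed

# Toy solve callback: propose starting every open ticket immediately.
commits = rolling_replay(
    slots=[0, 1],
    open_tickets_at=lambda s: [f"T{s}"],
    solve_current_slot=lambda s, ts: [{"ticket": t, "start_slot": s} for t in ts],
)
print(len(commits))  # 2
```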
- `required_skill_tags` and agent `skill_tags` are documented in the datasets but are still metadata in the current version, not hard matching constraints.
- The benchmark is synthetic and portfolio-oriented by design: it aims to be explainable and reproducible rather than to capture every edge case of a real support organization.