@tukwila commented Sep 30, 2025

Summary

#341

Details


Test Plan

Start the Sanic server with or without parameters:

  1. python tests/unit/sanic_server.py
  2. python tests/unit/sanic_server.py --host=0.0.0.0 --port=8000 --workers=4 --debug
  3. Execute the guidellm benchmark test:
(myenv) guidellm % guidellm benchmark \
  --target "http://localhost:8000/" \
  --model "mock-qwen-2.5" \
  --rate-type "synchronous" \
  --processor "${local_path}/Qwen2.5-1.5B-Instruct" \
  --data "prompt_tokens=512,output_tokens=256, samples=10"
Creating backend...
Backend openai_http connected to http://localhost:8000/ for model mock-qwen-2.5.
Creating request loader...
Created loader with 10 unique requests from prompt_tokens=512,output_tokens=256, samples=10.


╭─ Benchmarks ─────────────────────────────────────────────────────────────────╮
│ [0… syn… (c… Req:    0.0 req/s,    0.76s Lat,     0.0 Conc,      10 Comp,  … │
│              Tok:    9.6 gen/s,   28.9 tot/s,   8.5ms TTFT,    2.8ms ITL,  … │
╰──────────────────────────────────────────────────────────────────────────────╯
Generating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (1/1) [ 0:15:02 < 0:00:00 ]


Benchmarks Metadata:
    Run id:01c96eea-9e56-467b-8e39-35c67087f6ea
    Duration:903.3 seconds
    Profile:type=synchronous, strategies=['synchronous']
    Args:max_number=10, max_duration=None, warmup_number=None,
    warmup_duration=None, cooldown_number=None, cooldown_duration=None
    Worker:type_='generative_requests_worker' backend_type='openai_http'
    backend_target='http://localhost:8000' backend_model='mock-qwen-2.5'
    backend_info={'max_output_tokens': 16384, 'timeout': 300, 'http2': True,
    'follow_redirects': True, 'headers': {}, 'text_completions_path':
    '/v1/completions', 'chat_completions_path': '/v1/chat/completions'}
    Request Loader:type_='generative_request_loader'
    data='prompt_tokens=512,output_tokens=256, samples=10' data_args=None
    processor='${local_path}/Qwen2.5-1.5B-Instruct'
    processor_args=None
    Extras:None


Benchmarks Info:
======================================================================================================================================================
Metadata                                    |||| Requests Made  ||| Prompt Tok/Req  ||| Output Tok/Req  ||| Prompt Tok Total  ||| Output Tok Total  ||
  Benchmark| Start Time| End Time| Duration (s)|  Comp|  Inc|  Err|   Comp|  Inc|  Err|   Comp|  Inc|  Err|   Comp|   Inc|   Err|   Comp|   Inc|   Err
-----------|-----------|---------|-------------|------|-----|-----|-------|-----|-----|-------|-----|-----|-------|------|------|-------|------|------
synchronous|   04:37:43| 04:42:09|        265.8|    10|    0|    0|  512.0|  0.0|  0.0|  256.0|  0.0|  0.0|   5120|     0|     0|   2560|     0|     0
======================================================================================================================================================


Benchmarks Stats:
===============================================================================================================================================
Metadata   | Request Stats         || Out Tok/sec| Tot Tok/sec| Req Latency (sec)  ||| TTFT (ms)       ||| ITL (ms)       ||| TPOT (ms)      ||
  Benchmark| Per Second| Concurrency|        mean|        mean|  mean|  median|   p99| mean| median|  p99| mean| median| p99| mean| median| p99
-----------|-----------|------------|------------|------------|------|--------|------|-----|-------|-----|-----|-------|----|-----|-------|----
synchronous|       0.04|        0.03|         9.6|        28.9|  0.76|    0.76|  0.88|  8.5|    4.7| 41.5|  2.8|    2.8| 3.3|  2.8|    2.8| 3.3
===============================================================================================================================================

Saving benchmarks report...
Benchmarks report saved to ${local_path}/guidellm/benchmarks.json

Benchmarking complete.
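As a quick sanity check on the report from this run, the aggregate figures can be recomputed from the per-request token counts and the run duration. This is a minimal sketch of illustrative arithmetic, not guidellm's implementation: it assumes throughput is simply totals divided by the reported Duration (s), and it uses Little's law (mean concurrency = request rate × mean latency) to reproduce the Concurrency column. The names `SyncRun`, `derived_metrics`, and `wall_clock_seconds` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SyncRun:
    """Hypothetical container for values copied from the synchronous report."""

    requests: int = 10             # "Comp" under Requests Made
    prompt_tokens: int = 512       # prompt tokens per request
    output_tokens: int = 256       # output tokens per request
    duration_s: float = 265.8      # "Duration (s)" in Benchmarks Info
    mean_latency_s: float = 0.76   # mean "Req Latency (sec)"


def derived_metrics(run: SyncRun) -> dict[str, float]:
    """Recompute aggregates (assumption: rates are totals / duration)."""
    prompt_total = run.requests * run.prompt_tokens
    output_total = run.requests * run.output_tokens
    req_per_s = run.requests / run.duration_s
    return {
        "prompt_total": prompt_total,
        "output_total": output_total,
        "req_per_s": req_per_s,
        "out_tok_per_s": output_total / run.duration_s,
        "tot_tok_per_s": (prompt_total + output_total) / run.duration_s,
        # Little's law: mean concurrency = arrival rate * mean time in system.
        "concurrency": req_per_s * run.mean_latency_s,
    }


def wall_clock_seconds(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    m = derived_metrics(SyncRun())
    print(m["prompt_total"], m["output_total"])                        # 5120 2560
    print(round(m["req_per_s"], 2), round(m["concurrency"], 2))        # 0.04 0.03
    print(round(m["out_tok_per_s"], 1), round(m["tot_tok_per_s"], 1))  # 9.6 28.9
    print(wall_clock_seconds("04:37:43", "04:42:09"))                  # 266
```

With these inputs the rounded results agree with the Benchmarks Info and Benchmarks Stats tables, and the Start/End timestamps (recorded to the second) give 266 s, consistent with the reported 265.8 s duration.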

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Signed-off-by: guangli.bao <guangli.bao@daocloud.io>

@markurtz commented Oct 1, 2025

Thanks for the contribution @tukwila! There's a very large refactor ongoing currently that introduces some of this. Could you take a look at adapting this PR on top of the refactor branch and fixing anything that's missing there? #351
