# [DO NOT MERGE] GuideLLM Refactor Draft #365
**Draft**: sjmonson wants to merge 47 commits into `main` from `features/refactor/base-draft`
## Conversation
- … refactor branch (Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>)
- …ng config.py to settings.py due to later config additions and potential conflicts in naming (Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>)
- …view (Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>)
- … for plural (Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>)
- …p to avoid conflicts (Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>)
- Signed-off-by: jaredoconnell <joconnel@redhat.com>
## Summary

TODO

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
## Summary

This PR ports the new functionality from `benchmark run` to `benchmark from-file`, and does so in a way that reuses as much code as practical to keep one source of truth.

## Details

- Fixes from-file by making it use the new output format.
- Moves code related to the new output formats into separate functions that are called from both benchmark entrypoints.
- Moves additional chunks of code out of the large benchmark run entrypoint function for modularity.

## Test Plan

Run a benchmark with an output of json or yaml, then use `from-file` to re-import and export it. You can select any output type supported by `benchmark run`.

`guidellm benchmark from-file ./result.json --output-formats console`

`guidellm benchmark from-file ./result.yaml --output-formats yaml`

## Related Issues

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Jared O'Connell <joconnel@redhat.com>
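The "one source of truth" idea above can be sketched as both entrypoints delegating to a single shared export helper, so `benchmark run` and `benchmark from-file` cannot drift apart. All function and format names below are illustrative assumptions, not GuideLLM's actual API:

```python
import json

# Hypothetical sketch: one shared export path used by both entrypoints.
# Names here are illustrative, not the real guidellm interface.

def export_outputs(report: dict, output_formats: list) -> dict:
    """Render one benchmark report into each requested output format."""
    renderers = {
        "console": lambda r: f"requests={r.get('requests', 0)}",
        "json": lambda r: json.dumps(r, sort_keys=True),
    }
    rendered = {}
    for fmt in output_formats:
        if fmt not in renderers:
            raise ValueError(f"unsupported output format: {fmt}")
        rendered[fmt] = renderers[fmt](report)
    return rendered

def benchmark_run(report: dict, output_formats: list) -> dict:
    return export_outputs(report, output_formats)

def benchmark_from_file(loaded_report: dict, output_formats: list) -> dict:
    # A re-imported report goes through the exact same export path.
    return export_outputs(loaded_report, output_formats)
```

Because both functions share `export_outputs`, a new output format added for `benchmark run` is automatically available to `from-file`.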
## Summary

Reintroduces a few changes from main

---------

Signed-off-by: Samuel Monson <smonson@redhat.com>
Force-pushed from `4549a21` to `4c4ea5d`
Replace scenario entrypoint with a decorator:
- Forward-port get_default and from_file to Scenario
- Apply scenario args as an update to kwargs
- Readd scenario support to CLI

Signed-off-by: Samuel Monson <smonson@redhat.com>
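The "apply scenario args as an update to kwargs" idea could look roughly like the following decorator, which merges a scenario's values into the call's keyword arguments with explicitly passed kwargs taking precedence. This is an illustrative sketch, not GuideLLM's actual code; the decorator name and the scenario-as-mapping assumption are mine:

```python
import functools

# Hypothetical sketch of a scenario decorator: scenario values act as
# defaults, explicit keyword arguments override them.

def with_scenario(func):
    @functools.wraps(func)
    def wrapper(*args, scenario=None, **kwargs):
        if scenario is not None:
            merged = dict(scenario)  # assume the scenario is a mapping here
            merged.update(kwargs)    # explicit CLI/kwarg values win
            kwargs = merged
        return func(*args, **kwargs)
    return wrapper

@with_scenario
def benchmark(rate=1.0, max_seconds=None):
    return {"rate": rate, "max_seconds": max_seconds}
```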
…352)

## **Summary**

Renames `config.py` to `settings.py` for better semantic clarity, particularly with later config pathways, and updates project dependencies.

## **Details**

- **Module Refactor**: Renamed `src/guidellm/config.py` to `settings.py` and updated all import references across the codebase
- **Dependency Updates**:
  - Added new dependencies: `culsans`, `eval_type_backport`, `faker`, `msgpack`, `pyhumps`, `sanic`, `uvloop`
  - Updated `pytest-asyncio` from `~=0.23.8` to `~=1.1.0`
  - Removed `recommended` optional dependencies section
  - Removed `[dependency-groups]` and `[tool.pdm]` sections
- **Configuration Improvements**:
  - Reformatted license specification to use `{text = "Apache-2.0"}` format
  - Added `target-version = "py39"` to Ruff configuration
  - Improved code formatting and comment alignment in pyproject.toml
  - Enhanced Ruff ignore rules with better documentation
  - Fixed trailing comma in pytest markers
- **Import Organization**: Updated import order in `__init__.py` to import logger before settings for better dependency flow

## **Test Plan**

- Automated tests are passing

## **Related Issues**

N/A

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
…nsion needed for the other, refactored packages (#353)

## **Summary**

Introduces a comprehensive utilities infrastructure to support distributed processing, inter-process communication, and statistical analysis for the GuideLLM framework. The changes include new modules for encoding/serialization, messaging systems, statistical computations, and various utility mixins, while removing deprecated functionality and improving code organization.

## **Details**

- **Added messaging infrastructure** (`messaging.py`): Inter-process communication abstractions supporting queue-based, pipe-based, and manager-based messaging with configurable encoding and serialization
- **Added encoding utilities** (`encoding.py`): High-performance message encoding/decoding with Pydantic model support, configurable serialization strategies (dict/sequence), and binary encoding (msgpack/msgspec)
- **Added statistical analysis** (`statistics.py`): Comprehensive statistical computation tools including distribution summaries, percentiles, running statistics, and specialized request timing analysis
- **Added registry system** (`registry.py`): Dynamic object registration and discovery with auto-discovery capabilities for extensible plugin architectures
- **Added Pydantic utilities** (`pydantic_utils.py`): Polymorphic model serialization, registry integration, and standardized base model classes
- **Added console utilities** (`console.py`): Rich console integration with status tracking, colored output, and progress indicators
- **Added synchronization utilities** (`synchronous.py`): Async-compatible wrappers for threading/multiprocessing synchronization primitives
- **Added singleton patterns** (`singleton.py`): Thread-safe and basic singleton implementations for resource management
- **Added utility functions** (`functions.py`): Safe arithmetic operations, timestamp formatting, and defensive programming utilities
- **Added mixin classes** (`mixins.py`): Reusable mixins for metadata extraction and object introspection
- **Added auto-importer** (`auto_importer.py`): Automatic module importing for dynamic class discovery
- **Enhanced text utilities**: Added `format_value_display()` function for consistent metric formatting and improved documentation
- **Removed deprecated code**: Deleted the `dict.py` module with `recursive_key_update()` and the `camelize_str()` function from `text.py`
- **Updated imports**: Comprehensive reorganization of `__init__.py` exports to reflect the new utilities structure

## **Test Plan**

- Full unit tests added and passing

## **Related Issues**

This refactor supports the broader scheduler infrastructure improvements and distributed processing capabilities.

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
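The dynamic registration and discovery pattern described for `registry.py` can be sketched as a small mixin. The class and method names below (`RegistryMixin`, `register`, `get_registered`) are assumptions for illustration, not the actual `guidellm.utils` interface:

```python
# Hypothetical sketch of a registry mixin: subclasses register themselves
# under a name via a decorator and can later be looked up dynamically.

class RegistryMixin:
    registry = None  # each registered base class defines its own dict

    @classmethod
    def register(cls, name=None):
        def decorator(subclass):
            if cls.registry is None:
                cls.registry = {}
            cls.registry[name or subclass.__name__] = subclass
            return subclass
        return decorator

    @classmethod
    def get_registered(cls, name):
        return cls.registry[name]

class Backend(RegistryMixin):
    registry = {}

@Backend.register("openai_http")
class OpenAIHTTPBackend(Backend):
    pass
```

Combined with an auto-importer that loads every module in a package, this gives the plugin-style discovery the PR describes: registering a class is just defining it with the decorator.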
…pansion (#354)

## **Summary**

Introduces a comprehensive constraints system and enhanced timing control for the scheduler refactor. The implementation moves from hardcoded execution limits to a flexible, composable constraint system that enables sophisticated benchmark stopping criteria. Additionally, request timing calculations move from a precalculated to a per-request basis, enabling dynamic rate adjustments and better distributed coordination.

## **Details**

- **Added constraints system** (`constraints.py`): Implements a Protocol-based constraint architecture with support for request limits, duration limits, error thresholds, and sliding window error rates
  - `MaxNumberConstraint`: Limits execution based on request count
  - `MaxDurationConstraint`: Limits execution based on time duration
  - `MaxErrorsConstraint`: Limits execution based on absolute error count
  - `MaxErrorRateConstraint`: Limits execution based on sliding window error rate
  - `MaxGlobalErrorRateConstraint`: Limits execution based on global error rate
  - `ConstraintsInitializerFactory`: Registry system for constraint creation and serialization
- **Refactored core objects** (`objects.py`): Replaced `result.py` and expanded capabilities
  - Made the scheduler package fully generic, decoupling it from backend-specific types
  - Added `BackendInterface` protocol for type-safe backend integration
  - Enhanced `ScheduledRequestInfo` with comprehensive timing and status tracking
  - Added `SchedulerState` for distributed state coordination
  - Introduced `SchedulerUpdateAction` for constraint-based control signals
- **Enhanced scheduling strategies** (`strategy.py`): Introduced request timing abstractions
  - Added `ScheduledRequestTimings` base class for timing implementations
  - `LastCompletionRequestTimings`: For synchronous and concurrent strategies
  - `NoDelayRequestTimings`: For maximum throughput strategies
  - `ConstantRateRequestTimings`: For fixed-rate scheduling
  - `PoissonRateRequestTimings`: For stochastic request patterns
  - Strategies now create per-worker timing instances instead of precalculated schedules
- **Added environment abstractions** (`environment.py`): Coordination layer for distributed execution
  - `Environment` protocol for distributed synchronization
  - `NonDistributedEnvironment` implementation for single-node execution
- **Worker process management** (`worker.py`, `worker_group.py`): Distributed request processing infrastructure
  - Individual worker process management with lifecycle coordination
  - Multi-process orchestration with state synchronization
  - Constraint evaluation and graceful shutdown coordination

## **Test Plan**

- Full unit tests and some integration tests added and passing

## **Related Issues**

- Part of scheduler refactor initiative to support distributed benchmarking

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
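The Protocol-based constraint architecture above could be sketched roughly as follows. The class names mirror the PR's list, but the callable signature, the `'continue'`/`'stop'` return convention, and the state-dictionary keys are illustrative assumptions:

```python
from typing import Protocol

# Hedged sketch of composable scheduler constraints; shapes are assumed,
# not the actual guidellm.scheduler.constraints API.

class Constraint(Protocol):
    def __call__(self, state: dict) -> str:
        """Return 'continue' or 'stop' for the current scheduler state."""
        ...

class MaxNumberConstraint:
    def __init__(self, max_num: int):
        self.max_num = max_num

    def __call__(self, state: dict) -> str:
        return "stop" if state["created_requests"] >= self.max_num else "continue"

class MaxErrorsConstraint:
    def __init__(self, max_errors: int):
        self.max_errors = max_errors

    def __call__(self, state: dict) -> str:
        return "stop" if state["errored_requests"] >= self.max_errors else "continue"

# Constraints compose: the scheduler stops when any constraint says stop.
def should_stop(state: dict, constraints: list) -> bool:
    return any(constraint(state) == "stop" for constraint in constraints)
```

Because each constraint is just a callable over shared state, new stopping criteria (duration, sliding-window error rate, and so on) slot in without touching the scheduler loop.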
…nsion (#355)

## **Summary**

Refactors the backend package to introduce a new architecture that supports the scheduler refactor. The changes include new object models for requests/responses, adjustments to the registry-based backend system, and integration with the scheduler interface. The refactor replaces the previous response/streaming model with a more flexible request-response pattern that supports timing measurements and distributed execution.

## **Details**

- **New object models**: Introduced `GenerationRequest`, `GenerationResponse`, and `GenerationRequestTimings` in `objects.py` to standardize data flow between the scheduler and backends
- **Backend architecture refactor**: Redesigned the `Backend` base class to extend `RegistryMixin` and `BackendInterface`, enabling automatic registration and scheduler integration
- **Process lifecycle management**: Added `process_startup()` and `process_shutdown()` methods to support distributed worker processes with proper resource management
- **OpenAI backend modernization**: Rewrote `OpenAIHTTPBackend` with improved error handling, streaming support, initial support for multimodal content processing, and configuration management
- **Registry system**: Implemented automatic backend registration using decorators, replacing manual registry management
- **Scheduler integration**: Backends now implement `BackendInterface` with a `resolve()` method for processing generation requests with timing metadata
- **Removed deprecated modules**: Eliminated `response.py` with its streaming response types in favor of the new object model
- **Enhanced test coverage**: Added comprehensive unit tests for all new components with smoke, sanity, and regression test categories

## **Test Plan**

- Execute the new backend module test suites covering smoke, sanity, and regression scenarios
- Verify OpenAI backend functionality with mocked HTTP responses for both streaming and non-streaming modes
- Test backend registry and factory pattern functionality
- Validate integration with scheduler interfaces and timing measurement
- Ensure proper resource management during process startup/shutdown cycles

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
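The lifecycle-plus-`resolve()` pattern described above can be sketched with a toy backend. This is not the real `guidellm.backends` API: the object fields, the synchronous `resolve()` signature, and the echo behavior are all assumptions made for illustration:

```python
import time

# Hypothetical sketch of the backend request/response pattern: process
# lifecycle hooks plus resolve(), which returns a response with timing.

class GenerationRequest:
    def __init__(self, content: str):
        self.content = content

class GenerationResponse:
    def __init__(self, text: str, latency: float):
        self.text = text
        self.latency = latency

class EchoBackend:
    """Toy backend: echoes the prompt uppercased instead of calling a model."""

    def process_startup(self):
        self.ready = True  # e.g. open HTTP client pools in a worker process

    def process_shutdown(self):
        self.ready = False  # release per-process resources

    def resolve(self, request: GenerationRequest) -> GenerationResponse:
        start = time.monotonic()
        text = request.content.upper()
        return GenerationResponse(text, time.monotonic() - start)

backend = EchoBackend()
backend.process_startup()
response = backend.resolve(GenerationRequest("hello"))
backend.process_shutdown()
```

Keeping startup/shutdown separate from `resolve()` is what lets the scheduler spawn workers, warm them up once, and then stream many requests through each.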
## **Summary**

Introduces a comprehensive refactor of the benchmarking system, replacing the previous architecture with a more flexible and extensible design. The changes include new aggregation protocols, enhanced benchmark objects with comprehensive metrics, and improved progress tracking capabilities. This refactor enables better separation of concerns, more granular metric collection, and improved real-time monitoring of benchmark execution.

## **Details**

- **New Aggregation System**: Replaced `BenchmarkAggregator` with protocol-based `Aggregator` and `CompilableAggregator` interfaces, enabling composable metric collection and compilation
- **Enhanced Benchmark Objects**: Refactored benchmark data models in `objects.py` with comprehensive metrics including timing distributions, token statistics, and performance measurements
- **Improved Benchmarker**: Redesigned the `Benchmarker` class to coordinate request scheduling, data aggregation, and result compilation with a thread-safe singleton pattern
- **Flexible Output System**: Added pluggable output formatters supporting console, CSV, HTML, and JSON formats with configurable file paths
- **Advanced Progress Tracking**: Implemented composite progress handlers with a real-time console display showing detailed metrics, timing information, and progress bars
- **Profile System Enhancements**: Enhanced profile configurations with better strategy generation, constraint management, and completion tracking
- **Comprehensive Entrypoints**: Redesigned the `benchmark_generative_text` function with improved configuration options, validation, and error handling

### Key Components Added:

- `SchedulerStatsAggregator`: Collects scheduler timing and performance metrics
- `GenerativeRequestsAggregator`: Compiles complete generative benchmark results with warmup/cooldown filtering
- `GenerativeStatsProgressAggregator`: Tracks real-time generation metrics during execution
- `BenchmarkerProgressGroup`: Composite progress handler for multiple tracking instances
- `GenerativeBenchmarkerOutput`: Pluggable output system with multiple format support

### Breaking Changes:

- Removed `BenchmarkAggregator` and `GenerativeBenchmarkAggregator` classes
- Restructured the benchmark object hierarchy and field names
- Modified the `Benchmarker.run()` method signature and return type
- Updated progress tracking interfaces and event handling

## **Test Plan**

- Tests to be added in a subsequent PR

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
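The `Aggregator` / `CompilableAggregator` split described above could look roughly like this: plain aggregators only consume updates, while compilable ones also produce a final summary. The protocol shapes and the update-dict keys below are assumptions for illustration, not the actual interfaces:

```python
from typing import Any, Protocol

# Hedged sketch of protocol-based, composable metric aggregation.

class Aggregator(Protocol):
    def __call__(self, update: Any) -> None: ...

class CompilableAggregator(Aggregator, Protocol):
    def compile(self) -> dict: ...

class TokenCountAggregator:
    """Collects per-request token counts, then compiles a summary metric."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def __call__(self, update: dict) -> None:
        self.total += update["output_tokens"]
        self.count += 1

    def compile(self) -> dict:
        mean = self.total / self.count if self.count else 0.0
        return {"mean_output_tokens": mean}
```

A benchmarker can then fan each request update out to a list of aggregators and call `compile()` only on those that support it, which is what makes the collection composable.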
## **Summary**

Introduces a comprehensive mock server implementation that simulates the OpenAI and vLLM APIs with configurable timing characteristics and response patterns. The mock server enables realistic performance testing and validation of GuideLLM benchmarking workflows without requiring actual model deployments, supporting both streaming and non-streaming endpoints with proper token counting, latency simulation (TTFT/ITL), and error handling.

## **Details**

- Added a `mock_server` package with a modular architecture including configuration, handlers, models, server, and utilities
- Implemented `MockServerConfig` with Pydantic settings for centralized configuration management supporting environment variables
- Created HTTP request handlers for OpenAI-compatible endpoints:
  - `ChatCompletionsHandler` for `/v1/chat/completions` with streaming support
  - `CompletionsHandler` for the legacy `/v1/completions` endpoint
  - `TokenizerHandler` for vLLM-compatible `/tokenize` and `/detokenize` endpoints
- Added comprehensive Pydantic models for request/response validation compatible with both the OpenAI and vLLM API specifications
- Implemented a high-performance Sanic-based server with CORS support, middleware, and proper error handling
- Created mock tokenizer and text generation utilities with deterministic token generation for reproducible testing
- Added timing generators for realistic latency simulation, including TTFT (Time To First Token) and ITL (Inter-Token Latency)
- Included a comprehensive test suite with integration tests using real HTTP server instances

## **Test Plan**

- Unit/integration style tests added to automation

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
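The TTFT/ITL latency simulation mentioned above amounts to: delay the first token of a mock stream by the configured time-to-first-token, and every later token by the inter-token latency. A minimal sketch, with parameter names that are illustrative rather than the actual `MockServerConfig` fields:

```python
# Hypothetical sketch of a mock-server timing generator for streamed tokens.

def token_delays(num_tokens: int, ttft: float, itl: float):
    """Yield the delay (seconds) a mock stream would sleep before each token."""
    for index in range(num_tokens):
        yield ttft if index == 0 else itl

delays = list(token_delays(4, ttft=0.5, itl=0.05))
total_latency = sum(delays)  # total simulated duration of the stream
```

Deterministic delays like these are what make benchmark runs against the mock server reproducible.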
## Summary

This PR handles errors that occur when there are no successful requests. An error is still raised, but it is one the user can extract useful information from, rather than an internal failure from the result-compilation machinery.

## Details

- Adds a default value for an inner data type so it works in this edge case.
- Adds an error check that raises a runtime error with an explanation of the failure. The error message can be changed if you would like the wording changed.
- Fixes a type literal mismatch.

## Test Plan

- Run GuideLLM against a mock server in a way that causes all requests to fail, such as setting the max token value far too small.

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Jared O'Connell <joconnel@redhat.com>
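The guard described above can be sketched as a check before result compilation: raise an actionable `RuntimeError` when every request failed, instead of letting the compile step crash deep inside the internals. The function name and message below are assumptions, not the PR's actual code:

```python
# Hypothetical sketch of an all-requests-failed guard.

def compile_results(successful: list, errored: list) -> dict:
    if not successful:
        raise RuntimeError(
            "No successful requests to compile results from; all "
            f"{len(errored)} requests failed. Check backend availability "
            "and request parameters (e.g. the max output token limit)."
        )
    return {"successful": len(successful), "errored": len(errored)}
```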
… off of til merged into refactor base) (#358)

## **Summary**

Refactor of the GuideLLM command-line interface, streamlining the benchmark command structure while adding new mock server functionality and performance optimization features, and folding in missing fixes from other PRs to stabilize the refactor to a working state.

## **Details**

- **CLI Interface Overhaul**:
  - Removed the legacy `--scenario` option in favor of direct parameter specification
  - Reorganized CLI options with clear grouping (Backend, Data, Output, Aggregators, Constraints)
  - Added parameter aliases for backward compatibility (e.g., `--rate-type` → `--profile`)
  - Simplified option defaults by removing scenario-based defaults
  - Added comprehensive docstrings and help text for all commands and options
- **New Mock Server Command**:
  - Added a `guidellm mock-server` command with full OpenAI/vLLM API compatibility
  - Configurable latency characteristics (request latency, TTFT, ITL, output tokens)
  - Support for both streaming and non-streaming endpoints
  - Comprehensive server configuration options (host, port, workers, model name)
- **Performance Optimization Features**:
  - Added a new `perf` optional dependency group with `orjson`, `msgpack`, `msgspec`, and `uvloop`
  - Integrated uvloop for enhanced async performance when available
  - Optimized event loop policy selection based on availability
- **Internal Architecture Improvements**:
  - Updated import paths (`guidellm.backend` → `guidellm.backends`, `guidellm.scheduler.strategy` → `guidellm.scheduler`)
  - Replaced scenario-based benchmarking with direct `benchmark_generative_text` function calls
  - Enhanced error handling and parameter validation
  - Simplified logging format for better readability
- **Enhanced Output and Configuration**:
  - Added support for multiple output formats with the `--output-formats` option
  - Improved output path handling for files vs. directories
  - Added new constraint options (`--max-errors`, `--max-error-rate`, `--max-global-error-rate`)
  - Enhanced warmup/cooldown specification with flexible numeric/percentage options
- **Code Quality Improvements**:
  - Comprehensive type annotations throughout the codebase
  - Detailed docstrings following Google/NumPy style conventions
  - Consistent parameter naming and organization
  - Removed the deprecated version option from the main CLI group

## **Test Plan**

- Tests for entrypoints to be added later

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
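The backward-compatible parameter aliasing mentioned above (accepting the legacy `--rate-type` spelling for `--profile`) can be sketched as a simple normalization step over parsed keyword arguments. The alias table and helper below are illustrative, not the actual CLI wiring:

```python
# Hypothetical sketch of legacy-parameter aliasing for the CLI layer.

LEGACY_ALIASES = {"rate_type": "profile"}

def normalize_kwargs(kwargs: dict) -> dict:
    """Map legacy parameter names onto their current equivalents."""
    return {LEGACY_ALIASES.get(key, key): value for key, value in kwargs.items()}
```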
Force-pushed from `4c4ea5d` to `aa81de8`
Branch with all refactor changes merged in for testing