Conversation

sjmonson
Collaborator

Branch with all refactor changes merged in for testing

markurtz and others added 23 commits September 19, 2025 03:22
… refactor branch

Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
… refactor branch

Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
…ng config.py to settings.py due to later config additions and potential conflicts in naming

Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
…view

Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
… for plural

Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
…p to avoid conflicts

Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Signed-off-by: jaredoconnell <joconnel@redhat.com>
## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->

TODO

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
## Summary

This PR ports the new functionality from `benchmark run` to `benchmark
from-file`, and does so in a way that reuses as much code as practical
to have one source of truth.

## Details

<!--
Provide a detailed list of all changes introduced in this pull request.
-->
- Fixes `from-file` by making it use the new output format.
- Moves code related to the new output formats to separate functions
that are called from both benchmark entrypoints.
- Moves additional chunks of code out of the large benchmark run
entrypoint function for modularity.

## Test Plan

Run a benchmark with a JSON or YAML output, then use `from-file` to
re-import and re-export it. Any output type supported by `benchmark run`
can be selected.

`guidellm benchmark from-file ./result.json --output-formats console`
`guidellm benchmark from-file ./result.yaml --output-formats yaml`

## Related Issues

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Jared O'Connell <joconnel@redhat.com>
## Summary

Reintroduces a few changes from main

---------

Signed-off-by: Samuel Monson <smonson@redhat.com>
@sjmonson sjmonson force-pushed the features/refactor/base-draft branch from 4549a21 to 4c4ea5d Compare September 25, 2025 16:15
sjmonson and others added 6 commits September 25, 2025 17:20
Replace scenario entrypoint with a decorator

Forward-port get_default and from_file to Scenario

Apply scenario args as an update to kwargs

Readd scenario support to CLI

Signed-off-by: Samuel Monson <smonson@redhat.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
jaredoconnell and others added 18 commits September 26, 2025 11:38
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
…352)

## **Summary**

Renames `config.py` to `settings.py` for better semantic clarity,
particularly with later config pathways, and updates project
dependencies.

## **Details**

- **Module Refactor**: Renamed `src/guidellm/config.py` to `settings.py` and
updated all import references across the codebase
- **Dependency Updates**:
- Added new
dependencies: `culsans`, `eval_type_backport`, `faker`, `msgpack`, `pyhumps`, `sanic`, `uvloop`
    - Updated `pytest-asyncio` from `~=0.23.8` to `~=1.1.0`
    - Removed `recommended` optional dependencies section
    - Removed `[dependency-groups]` and `[tool.pdm]` sections
- **Configuration Improvements**:
- Reformatted license specification to use `{text =
"Apache-2.0"}` format
    - Added `target-version = "py39"` to Ruff configuration
    - Improved code formatting and comment alignment in pyproject.toml
    - Enhanced Ruff ignore rules with better documentation
    - Fixed trailing comma in pytest markers
- **Import Organization**: Updated import order in `__init__.py` to
import logger before settings for better dependency flow

## **Test Plan**

- Automated tests are passing

## **Related Issues**

N/A

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## **Use of AI**

- [ ]  Includes AI-assisted code completion
- [ ]  Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
…nsion needed for the other, refactored packages (#353)

## **Summary**

Introduces a comprehensive utilities infrastructure to support
distributed processing, inter-process communication, and statistical
analysis for the GuideLLM framework. The changes include new modules for
encoding/serialization, messaging systems, statistical computations, and
various utility mixins while removing deprecated functionality and
improving code organization.

## **Details**

- **Added messaging infrastructure** (`messaging.py`): Inter-process
communication abstractions supporting queue-based, pipe-based, and
manager-based messaging with configurable encoding and serialization
- **Added encoding utilities** (`encoding.py`): High-performance message
encoding/decoding with Pydantic model support, configurable
serialization strategies (dict/sequence), and binary encoding
(msgpack/msgspec)
- **Added statistical analysis** (`statistics.py`): Comprehensive
statistical computation tools including distribution summaries,
percentiles, running statistics, and specialized request timing analysis
- **Added registry system** (`registry.py`): Dynamic object registration
and discovery with auto-discovery capabilities for extensible plugin
architectures
- **Added Pydantic utilities** (`pydantic_utils.py`): Polymorphic model
serialization, registry integration, and standardized base model classes
- **Added console utilities** (`console.py`): Rich console integration
with status tracking, colored output, and progress indicators
- **Added synchronization utilities** (`synchronous.py`):
Async-compatible wrappers for threading/multiprocessing synchronization
primitives
- **Added singleton patterns** (`singleton.py`): Thread-safe and basic
singleton implementations for resource management
- **Added utility functions** (`functions.py`): Safe arithmetic
operations, timestamp formatting, and defensive programming utilities
- **Added mixin classes** (`mixins.py`): Reusable mixins for metadata
extraction and object introspection
- **Added auto-importer** (`auto_importer.py`): Automatic module
importing for dynamic class discovery
- **Enhanced text utilities**: Added `format_value_display()` function
for consistent metric formatting and improved documentation
- **Removed deprecated code**: Deleted the `dict.py` module
with `recursive_key_update()`, and removed the `camelize_str()` function
from `text.py`
- **Updated imports**: Comprehensive reorganization
of `__init__.py` exports to reflect new utilities structure
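
The decorator-based registry with per-hierarchy registries described above can be sketched roughly as follows. This is a minimal illustration: `RegistryMixin` is named in the PR, but the method names and shapes here are assumptions, not the actual `guidellm.utils` API.

```python
# Hypothetical sketch of a decorator-based registry; method names are
# illustrative, not the real guidellm.utils.registry interface.
from typing import Callable, Dict


class RegistryMixin:
    """Base mixin giving each subclass hierarchy its own registry."""

    registry: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        def decorator(subclass: type) -> type:
            cls.registry[name] = subclass  # record under the given name
            return subclass
        return decorator

    @classmethod
    def get_registered(cls, name: str) -> type:
        return cls.registry[name]


class Backend(RegistryMixin):
    registry = {}  # fresh registry for this hierarchy


@Backend.register("openai_http")
class OpenAIHTTPBackend(Backend):
    pass


assert Backend.get_registered("openai_http") is OpenAIHTTPBackend
```

Combined with an auto-importer that walks a package's modules, decorated classes register themselves at import time, which is what makes the plugin-style discovery possible.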

## **Test Plan**

- Full unit tests added and passing

## **Related Issues**

This refactor supports the broader scheduler infrastructure improvements
and distributed processing capabilities.

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## **Use of AI**

- [x]  Includes AI-assisted code completion
- [x]  Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
…pansion (#354)

## **Summary**

Introduces a comprehensive constraints system and enhanced timing
control for the scheduler refactor. The implementation moves from
hardcoded execution limits to a flexible, composable constraint system
that enables sophisticated benchmark stopping criteria. Additionally,
request timing calculations are moved from precalculated to per-request
basis, enabling dynamic rate adjustments and better distributed
coordination.

## **Details**

- **Added constraints system** (`constraints.py`): Implements
Protocol-based constraint architecture with support for request limits,
duration limits, error thresholds, and sliding window error rates
    - `MaxNumberConstraint`: Limits execution based on request count
    - `MaxDurationConstraint`: Limits execution based on time duration
    - `MaxErrorsConstraint`: Limits execution based on absolute error count
    - `MaxErrorRateConstraint`: Limits execution based on sliding window error rate
    - `MaxGlobalErrorRateConstraint`: Limits execution based on global error rate
    - `ConstraintsInitializerFactory`: Registry system for constraint creation and serialization
- **Refactored core objects** (`objects.py`): Replaced `result.py` and expanded capabilities
    - Made scheduler package fully generic, decoupling from backend-specific types
    - Added `BackendInterface` protocol for type-safe backend integration
    - Enhanced `ScheduledRequestInfo` with comprehensive timing and status tracking
    - Added `SchedulerState` for distributed state coordination
    - Introduced `SchedulerUpdateAction` for constraint-based control signals
- **Enhanced scheduling strategies** (`strategy.py`): Introduced request timing abstractions
    - Added `ScheduledRequestTimings` base class for timing implementations
    - `LastCompletionRequestTimings`: For synchronous and concurrent strategies
    - `NoDelayRequestTimings`: For maximum throughput strategies
    - `ConstantRateRequestTimings`: For fixed-rate scheduling
    - `PoissonRateRequestTimings`: For stochastic request patterns
    - Strategies now create per-worker timing instances instead of precalculated schedules
- **Added environment abstractions** (`environment.py`): Coordination layer for distributed execution
    - `Environment` protocol for distributed synchronization
    - `NonDistributedEnvironment` implementation for single-node execution
- **Worker process management** (`worker.py`, `worker_group.py`): Distributed request processing infrastructure
    - Individual worker process management with lifecycle coordination
    - Multi-process orchestration with state synchronization
    - Constraint evaluation and graceful shutdown coordination
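
The Protocol-based constraint idea described above might look roughly like this minimal sketch. The method signature and the evaluation loop are assumptions for illustration, not the actual guidellm interface.

```python
# Hypothetical sketch of composable, Protocol-based stopping constraints;
# signatures are assumptions, not the real guidellm.scheduler.constraints API.
from dataclasses import dataclass
from typing import List, Protocol


class Constraint(Protocol):
    def should_stop(self, processed_requests: int, errored_requests: int) -> bool:
        ...


@dataclass
class MaxNumberConstraint:
    max_requests: int

    def should_stop(self, processed_requests: int, errored_requests: int) -> bool:
        return processed_requests >= self.max_requests


@dataclass
class MaxErrorRateConstraint:
    max_error_rate: float
    min_processed: int = 10  # avoid tripping on tiny samples

    def should_stop(self, processed_requests: int, errored_requests: int) -> bool:
        if processed_requests < self.min_processed:
            return False
        return errored_requests / processed_requests > self.max_error_rate


# Any structurally-matching class satisfies the protocol, so constraints compose:
constraints: List[Constraint] = [MaxNumberConstraint(1000), MaxErrorRateConstraint(0.1)]
stop = any(c.should_stop(processed_requests=500, errored_requests=100) for c in constraints)
```

Because the protocol is structural, new stopping criteria can be added without touching the scheduler loop, which only ever calls `should_stop`-style checks over the active set.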

## **Test Plan**

- Full unit tests and some integration tests added and passing

## **Related Issues**

- Part of scheduler refactor initiative to support distributed
benchmarking

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## **Use of AI**

- [x]  Includes AI-assisted code completion
- [x]  Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
…nsion (#355)

## **Summary**

Refactors the backend package to introduce a new architecture that
supports the scheduler refactor. The changes include new object models
for requests/responses, adjustments to the registry-based backend
system, and integration with the scheduler interface. The refactor
replaces the previous response/streaming model with a more flexible
request-response pattern that supports timing measurements and
distributed execution.

## **Details**

- **New object models**:
Introduced `GenerationRequest`, `GenerationResponse`,
and `GenerationRequestTimings` in `objects.py` to standardize data flow
between scheduler and backends
- **Backend architecture refactor**: Redesigned `Backend` base class to
extend `RegistryMixin` and `BackendInterface`, enabling automatic
registration and scheduler integration
- **Process lifecycle management**:
Added `process_startup()` and `process_shutdown()` methods to support
distributed worker processes with proper resource management
- **OpenAI backend modernization**: Rewrote `OpenAIHTTPBackend` with
improved error handling, streaming support, beginning support for
multimodal content processing, and configuration management
- **Registry system**: Implemented automatic backend registration using
decorators, replacing manual registry management
- **Scheduler integration**: Backends now
implement `BackendInterface` with `resolve()` method for processing
generation requests with timing metadata
- **Removed deprecated modules**: Eliminated `response.py` with its
streaming response types in favor of the new object model
- **Enhanced test coverage**: Added comprehensive unit tests for all new
components with smoke, sanity, and regression test categories
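
The request/response flow between scheduler and backend described above can be pictured with this toy sketch. The class names mirror the PR text, but all fields, the `resolve()` signature, and the `EchoBackend` are illustrative assumptions, not the actual guidellm objects.

```python
# Toy sketch of a resolve()-style backend producing a response with
# timing metadata; field names are assumptions for illustration only.
import time
from dataclasses import dataclass, field


@dataclass
class GenerationRequest:
    prompt: str
    max_tokens: int = 16


@dataclass
class GenerationRequestTimings:
    request_start: float = 0.0
    request_end: float = 0.0


@dataclass
class GenerationResponse:
    text: str
    timings: GenerationRequestTimings = field(default_factory=GenerationRequestTimings)


class EchoBackend:
    """Hypothetical backend: echoes the prompt while recording timings."""

    def resolve(self, request: GenerationRequest) -> GenerationResponse:
        timings = GenerationRequestTimings(request_start=time.monotonic())
        text = request.prompt[: request.max_tokens]  # stand-in for real generation
        timings.request_end = time.monotonic()
        return GenerationResponse(text=text, timings=timings)


response = EchoBackend().resolve(GenerationRequest(prompt="hello world"))
```

The point of the pattern is that the scheduler stays generic: it only needs something with a `resolve()`-shaped method returning a response plus timing metadata, so backends register themselves and plug in without scheduler changes.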

## **Test Plan**

- Execute new backend module test suites covering smoke, sanity, and
regression scenarios
- Verify OpenAI backend functionality with mocked HTTP responses for
both streaming and non-streaming modes
- Test backend registry and factory pattern functionality
- Validate integration with scheduler interfaces and timing measurement
- Ensure proper resource management during process startup/shutdown
cycles

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## **Use of AI**

- [x]  Includes AI-assisted code completion
- [x]  Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
## **Summary**

Introduces a comprehensive refactor of the benchmarking system,
replacing the previous architecture with a more flexible and extensible
design. The changes include new aggregation protocols, enhanced
benchmark objects with comprehensive metrics, and improved progress
tracking capabilities. This refactor enables better separation of
concerns, more granular metric collection, and improved real-time
monitoring of benchmark execution.

## **Details**

- **New Aggregation System**: Replaced `BenchmarkAggregator` with
protocol-based `Aggregator` and `CompilableAggregator` interfaces,
enabling composable metric collection and compilation
- **Enhanced Benchmark Objects**: Refactored benchmark data models
in `objects.py` with comprehensive metrics including timing
distributions, token statistics, and performance measurements
- **Improved Benchmarker**: Redesigned `Benchmarker` class to coordinate
request scheduling, data aggregation, and result compilation with
thread-safe singleton pattern
- **Flexible Output System**: Added pluggable output formatters
supporting console, CSV, HTML, and JSON formats with configurable file
paths
- **Advanced Progress Tracking**: Implemented composite progress
handlers with real-time console display showing detailed metrics, timing
information, and progress bars
- **Profile System Enhancements**: Enhanced profile configurations with
better strategy generation, constraint management, and completion
tracking
- **Comprehensive Entrypoints**:
Redesigned `benchmark_generative_text` function with improved
configuration options, validation, and error handling

### Key Components Added:

- `SchedulerStatsAggregator`: Collects scheduler timing and performance
metrics
- `GenerativeRequestsAggregator`: Compiles complete generative benchmark
results with warmup/cooldown filtering
- `GenerativeStatsProgressAggregator`: Tracks real-time generation
metrics during execution
- `BenchmarkerProgressGroup`: Composite progress handler for multiple
tracking instances
- `GenerativeBenchmarkerOutput`: Pluggable output system with multiple
format support
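
The protocol-based `Aggregator`/`CompilableAggregator` split described above could be sketched like this. The protocol names come from the PR text, but the method signatures and the sample aggregator are assumptions, not the real interfaces.

```python
# Sketch of composable aggregation: feed results in, compile a summary out.
# Signatures are hypothetical, not the actual guidellm.benchmark API.
from typing import List, Protocol


class Aggregator(Protocol):
    def add(self, result: dict) -> None: ...


class CompilableAggregator(Aggregator, Protocol):
    def compile(self) -> dict: ...


class TokenStatsAggregator:
    """Collects per-request token counts and compiles summary statistics."""

    def __init__(self) -> None:
        self.token_counts: List[int] = []

    def add(self, result: dict) -> None:
        self.token_counts.append(result["output_tokens"])

    def compile(self) -> dict:
        n = len(self.token_counts)
        return {"requests": n, "mean_tokens": sum(self.token_counts) / n if n else 0.0}


agg = TokenStatsAggregator()
for tokens in (10, 20, 30):
    agg.add({"output_tokens": tokens})
summary = agg.compile()
```

Splitting `add` (cheap, called per request during the run) from `compile` (expensive, called once at the end) is what lets one benchmarker coordinate several aggregators without coupling collection to reporting.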

### Breaking Changes:

- Removed `BenchmarkAggregator` and `GenerativeBenchmarkAggregator` classes
- Restructured benchmark object hierarchy and field names
- Modified `Benchmarker.run()` method signature and return type
- Updated progress tracking interfaces and event handling

## **Test Plan**

- Tests to be added in a subsequent PR

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## **Use of AI**

- [x]  Includes AI-assisted code completion
- [x]  Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
## **Summary**

Introduces a comprehensive mock server implementation that simulates
OpenAI and vLLM APIs with configurable timing characteristics and
response patterns. The mock server enables realistic performance testing
and validation of GuideLLM benchmarking workflows without requiring
actual model deployments, supporting both streaming and non-streaming
endpoints with proper token counting, latency simulation (TTFT/ITL), and
error handling.

## **Details**

- Added `mock_server` package with modular architecture including
configuration, handlers, models, server, and utilities
- Implemented `MockServerConfig` with Pydantic settings for centralized
configuration management supporting environment variables
- Created HTTP request handlers for OpenAI-compatible endpoints:
    - `ChatCompletionsHandler` for `/v1/chat/completions` with streaming support
    - `CompletionsHandler` for `/v1/completions` legacy endpoint
    - `TokenizerHandler` for vLLM-compatible `/tokenize` and `/detokenize` endpoints
- Added comprehensive Pydantic models for request/response validation
compatible with both OpenAI and vLLM API specifications
- Implemented high-performance Sanic-based server with CORS support,
middleware, and proper error handling
- Created mock tokenizer and text generation utilities with
deterministic token generation for reproducible testing
- Added timing generators for realistic latency simulation including
TTFT (Time To First Token) and ITL (Inter-Token Latency)
- Included comprehensive test suite with integration tests using real
HTTP server instances
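
The TTFT/ITL latency simulation described above can be sketched as a small delay generator. The function name, parameters, and the exponential-jitter choice are assumptions for illustration, not the actual `mock_server` utilities.

```python
# Hypothetical sketch of TTFT/ITL delay simulation for a mock streaming
# endpoint; seeded RNG keeps runs deterministic and reproducible.
import random
from typing import Iterator


def token_delays(num_tokens: int, ttft_ms: float, itl_ms: float, seed: int = 0) -> Iterator[float]:
    """Yield a delay (seconds) before each token: TTFT first, then jittered ITL."""
    rng = random.Random(seed)  # fixed seed -> identical delays every run
    yield ttft_ms / 1000.0  # Time To First Token
    for _ in range(num_tokens - 1):
        # Exponential jitter with mean equal to the configured Inter-Token Latency
        yield rng.expovariate(1.0 / itl_ms) / 1000.0


delays = list(token_delays(num_tokens=4, ttft_ms=200.0, itl_ms=25.0))
```

A server handler would `await asyncio.sleep(d)` before emitting each chunk, giving the benchmark client realistic streaming timing without a real model behind it.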

## **Test Plan**

- Unit/integration style tests added to automation

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## **Use of AI**

- [x]  Includes AI-assisted code completion
- [x]  Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
## Summary

This PR handles errors that occur when there are no successful requests.
An error is still raised, but it is now one the user can extract useful
information from, rather than an opaque failure from the inner workings.

## Details

- Adds default value for an inner data type to allow it to work in this
edge case.
- Adds an error check that raises a runtime error explaining the failure.
The wording of the error message can be adjusted if desired.
- Fixes a type literal mismatch.

## Test Plan

- Run GuideLLM against a mock server in a way that results in all
requests failing, such as setting the max token value far too small.

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
… off of til merged into refactor base) (#358)

## **Summary**

Refactors the GuideLLM command-line interface, streamlining the
benchmark command structure while adding new mock server functionality
and performance optimization features, and pulls in missing fixes from
other PRs to stabilize the refactor to a working state.

## **Details**

- **CLI Interface Overhaul**:
    - Removed legacy `--scenario` option in favor of direct parameter specification
    - Reorganized CLI options with clear grouping (Backend, Data, Output, Aggregators, Constraints)
    - Added parameter aliases for backward compatibility (e.g., `--rate-type` → `--profile`)
    - Simplified option defaults by removing scenario-based defaults
    - Added comprehensive docstrings and help text for all commands and options
- **New Mock Server Command**:
    - Added a `guidellm mock-server` command with full OpenAI/vLLM API compatibility
    - Configurable latency characteristics (request latency, TTFT, ITL, output tokens)
    - Support for both streaming and non-streaming endpoints
    - Comprehensive server configuration options (host, port, workers, model name)
- **Performance Optimization Features**:
    - Added a new `perf` optional dependency group with `orjson`, `msgpack`, `msgspec`, and `uvloop`
    - Integrated uvloop for enhanced async performance when available
    - Optimized event loop policy selection based on availability
- **Internal Architecture Improvements**:
    - Updated import paths (`guidellm.backend` → `guidellm.backends`, `guidellm.scheduler.strategy` → `guidellm.scheduler`)
    - Replaced scenario-based benchmarking with direct `benchmark_generative_text` function calls
    - Enhanced error handling and parameter validation
    - Simplified logging format for better readability
- **Enhanced Output and Configuration**:
    - Added support for multiple output formats with `--output-formats` option
    - Improved output path handling for files vs directories
    - Added new constraint options (`--max-errors`, `--max-error-rate`, `--max-global-error-rate`)
    - Enhanced warmup/cooldown specification with flexible numeric/percentage options
- **Code Quality Improvements**:
    - Comprehensive type annotations throughout the codebase
    - Detailed docstrings following Google/NumPy style conventions
    - Consistent parameter naming and organization
    - Removed deprecated version option from main CLI group
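
The optional uvloop integration described above typically follows a try-import pattern: prefer uvloop's event loop policy when the `perf` extra is installed, otherwise fall back to stock asyncio. This is a common-pattern sketch, not necessarily the exact code in the PR.

```python
# Prefer uvloop's event loop policy when available; fall back to asyncio.
import asyncio

try:
    import uvloop  # provided by the optional "perf" dependency group
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    pass  # stock asyncio event loop is used


async def main() -> str:
    return "ok"


result = asyncio.run(main())
```

The fallback keeps the base install dependency-free while letting performance-sensitive users opt in via `pip install guidellm[perf]`-style extras.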

## **Test Plan**

- Tests for entrypoints to be added later

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## **Use of AI**

- [x]  Includes AI-assisted code completion
- [x]  Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
@sjmonson sjmonson force-pushed the features/refactor/base-draft branch from 4c4ea5d to aa81de8 Compare September 30, 2025 15:19