# [DO NOT MERGE] GuideLLM Refactor Draft #365
**Draft**: sjmonson wants to merge 47 commits into `main` from `features/refactor/base-draft`
## Conversation
- … refactor branch (Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>)
- …ng config.py to settings.py due to later config additions and potential conflicts in naming (Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>)
- …view (Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>)
- … for plural (Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>)
- …p to avoid conflicts (Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>)
- Signed-off-by: jaredoconnell <joconnel@redhat.com>
## Summary

TODO

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
## Summary

This PR ports the new functionality from `benchmark run` to `benchmark from-file`, and does so in a way that reuses as much code as practical to keep one source of truth.

## Details

- Fixes from-file by making it use the new output format.
- Moves code related to the new output formats into separate functions that are called from both benchmark entrypoints.
- Moves additional chunks of code out of the large benchmark run entrypoint function for modularity.

## Test Plan

Run a benchmark with an output of json or yaml, then use `from-file` to re-import and export it. You can select any output type supported by `benchmark run`.

`guidellm benchmark from-file ./result.json --output-formats console`

`guidellm benchmark from-file ./result.yaml --output-formats yaml`

## Related Issues

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Jared O'Connell <joconnel@redhat.com>
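The "one source of truth" idea above can be sketched as both entrypoints delegating to a single shared export helper, so `benchmark run` and `benchmark from-file` cannot drift apart. All function and format names below are illustrative assumptions, not GuideLLM's actual API:

```python
import json

# Hypothetical sketch: one shared export path used by both entrypoints.
# Names here are illustrative, not the real guidellm interface.

def export_outputs(report: dict, output_formats: list) -> dict:
    """Render one benchmark report into each requested output format."""
    renderers = {
        "console": lambda r: f"requests={r.get('requests', 0)}",
        "json": lambda r: json.dumps(r, sort_keys=True),
    }
    rendered = {}
    for fmt in output_formats:
        if fmt not in renderers:
            raise ValueError(f"unsupported output format: {fmt}")
        rendered[fmt] = renderers[fmt](report)
    return rendered

def benchmark_run(report: dict, output_formats: list) -> dict:
    return export_outputs(report, output_formats)

def benchmark_from_file(loaded_report: dict, output_formats: list) -> dict:
    # A re-imported report goes through the exact same export path.
    return export_outputs(loaded_report, output_formats)
```

Because both functions share `export_outputs`, a new output format added for `benchmark run` is automatically available to `from-file`.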
## Summary

Reintroduces a few changes from main

---------

Signed-off-by: Samuel Monson <smonson@redhat.com>
Force-pushed from `4549a21` to `4c4ea5d`
Replace scenario entrypoint with a decorator:
- Forward-port get_default and from_file to Scenario
- Apply scenario args as an update to kwargs
- Readd scenario support to CLI

Signed-off-by: Samuel Monson <smonson@redhat.com>
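The "apply scenario args as an update to kwargs" idea could look roughly like the following decorator, which merges a scenario's values into the call's keyword arguments with explicitly passed kwargs taking precedence. This is an illustrative sketch, not GuideLLM's actual code; the decorator name and the scenario-as-mapping assumption are mine:

```python
import functools

# Hypothetical sketch of a scenario decorator: scenario values act as
# defaults, explicit keyword arguments override them.

def with_scenario(func):
    @functools.wraps(func)
    def wrapper(*args, scenario=None, **kwargs):
        if scenario is not None:
            merged = dict(scenario)  # assume the scenario is a mapping here
            merged.update(kwargs)    # explicit CLI/kwarg values win
            kwargs = merged
        return func(*args, **kwargs)
    return wrapper

@with_scenario
def benchmark(rate=1.0, max_seconds=None):
    return {"rate": rate, "max_seconds": max_seconds}
```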
…352)

## **Summary**

Renames `config.py` to `settings.py` for better semantic clarity, particularly with later config pathways, and updates project dependencies.

## **Details**

- **Module Refactor**: Renamed `src/guidellm/config.py` to `settings.py` and updated all import references across the codebase
- **Dependency Updates**:
  - Added new dependencies: `culsans`, `eval_type_backport`, `faker`, `msgpack`, `pyhumps`, `sanic`, `uvloop`
  - Updated `pytest-asyncio` from `~=0.23.8` to `~=1.1.0`
  - Removed `recommended` optional dependencies section
  - Removed `[dependency-groups]` and `[tool.pdm]` sections
- **Configuration Improvements**:
  - Reformatted license specification to use `{text = "Apache-2.0"}` format
  - Added `target-version = "py39"` to Ruff configuration
  - Improved code formatting and comment alignment in pyproject.toml
  - Enhanced Ruff ignore rules with better documentation
  - Fixed trailing comma in pytest markers
- **Import Organization**: Updated import order in `__init__.py` to import logger before settings for better dependency flow

## **Test Plan**

- Automated tests are passing

## **Related Issues**

N/A

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
…nsion needed for the other, refactored packages (#353)

## **Summary**

Introduces a comprehensive utilities infrastructure to support distributed processing, inter-process communication, and statistical analysis for the GuideLLM framework. The changes include new modules for encoding/serialization, messaging systems, statistical computations, and various utility mixins, while removing deprecated functionality and improving code organization.

## **Details**

- **Added messaging infrastructure** (`messaging.py`): Inter-process communication abstractions supporting queue-based, pipe-based, and manager-based messaging with configurable encoding and serialization
- **Added encoding utilities** (`encoding.py`): High-performance message encoding/decoding with Pydantic model support, configurable serialization strategies (dict/sequence), and binary encoding (msgpack/msgspec)
- **Added statistical analysis** (`statistics.py`): Comprehensive statistical computation tools including distribution summaries, percentiles, running statistics, and specialized request timing analysis
- **Added registry system** (`registry.py`): Dynamic object registration and discovery with auto-discovery capabilities for extensible plugin architectures
- **Added Pydantic utilities** (`pydantic_utils.py`): Polymorphic model serialization, registry integration, and standardized base model classes
- **Added console utilities** (`console.py`): Rich console integration with status tracking, colored output, and progress indicators
- **Added synchronization utilities** (`synchronous.py`): Async-compatible wrappers for threading/multiprocessing synchronization primitives
- **Added singleton patterns** (`singleton.py`): Thread-safe and basic singleton implementations for resource management
- **Added utility functions** (`functions.py`): Safe arithmetic operations, timestamp formatting, and defensive programming utilities
- **Added mixin classes** (`mixins.py`): Reusable mixins for metadata extraction and object introspection
- **Added auto-importer** (`auto_importer.py`): Automatic module importing for dynamic class discovery
- **Enhanced text utilities**: Added `format_value_display()` function for consistent metric formatting and improved documentation
- **Removed deprecated code**: Deleted the `dict.py` module with `recursive_key_update()` and the `camelize_str()` function from `text.py`
- **Updated imports**: Comprehensive reorganization of `__init__.py` exports to reflect the new utilities structure

## **Test Plan**

- Full unit tests added and passing

## **Related Issues**

This refactor supports the broader scheduler infrastructure improvements and distributed processing capabilities.

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
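The dynamic registration and discovery pattern described for `registry.py` can be sketched as a small mixin. The class and method names below (`RegistryMixin`, `register`, `get_registered`) are assumptions for illustration, not the actual `guidellm.utils` interface:

```python
# Hypothetical sketch of a registry mixin: subclasses register themselves
# under a name via a decorator and can later be looked up dynamically.

class RegistryMixin:
    registry = None  # each registered base class defines its own dict

    @classmethod
    def register(cls, name=None):
        def decorator(subclass):
            if cls.registry is None:
                cls.registry = {}
            cls.registry[name or subclass.__name__] = subclass
            return subclass
        return decorator

    @classmethod
    def get_registered(cls, name):
        return cls.registry[name]

class Backend(RegistryMixin):
    registry = {}

@Backend.register("openai_http")
class OpenAIHTTPBackend(Backend):
    pass
```

Combined with an auto-importer that loads every module in a package, this gives the plugin-style discovery the PR describes: registering a class is just defining it with the decorator.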
…pansion (#354)

## **Summary**

Introduces a comprehensive constraints system and enhanced timing control for the scheduler refactor. The implementation moves from hardcoded execution limits to a flexible, composable constraint system that enables sophisticated benchmark stopping criteria. Additionally, request timing calculations move from a precalculated to a per-request basis, enabling dynamic rate adjustments and better distributed coordination.

## **Details**

- **Added constraints system** (`constraints.py`): Implements a Protocol-based constraint architecture with support for request limits, duration limits, error thresholds, and sliding window error rates
  - `MaxNumberConstraint`: Limits execution based on request count
  - `MaxDurationConstraint`: Limits execution based on time duration
  - `MaxErrorsConstraint`: Limits execution based on absolute error count
  - `MaxErrorRateConstraint`: Limits execution based on sliding window error rate
  - `MaxGlobalErrorRateConstraint`: Limits execution based on global error rate
  - `ConstraintsInitializerFactory`: Registry system for constraint creation and serialization
- **Refactored core objects** (`objects.py`): Replaced `result.py` and expanded capabilities
  - Made the scheduler package fully generic, decoupling it from backend-specific types
  - Added `BackendInterface` protocol for type-safe backend integration
  - Enhanced `ScheduledRequestInfo` with comprehensive timing and status tracking
  - Added `SchedulerState` for distributed state coordination
  - Introduced `SchedulerUpdateAction` for constraint-based control signals
- **Enhanced scheduling strategies** (`strategy.py`): Introduced request timing abstractions
  - Added `ScheduledRequestTimings` base class for timing implementations
  - `LastCompletionRequestTimings`: For synchronous and concurrent strategies
  - `NoDelayRequestTimings`: For maximum throughput strategies
  - `ConstantRateRequestTimings`: For fixed-rate scheduling
  - `PoissonRateRequestTimings`: For stochastic request patterns
  - Strategies now create per-worker timing instances instead of precalculated schedules
- **Added environment abstractions** (`environment.py`): Coordination layer for distributed execution
  - `Environment` protocol for distributed synchronization
  - `NonDistributedEnvironment` implementation for single-node execution
- **Worker process management** (`worker.py`, `worker_group.py`): Distributed request processing infrastructure
  - Individual worker process management with lifecycle coordination
  - Multi-process orchestration with state synchronization
  - Constraint evaluation and graceful shutdown coordination

## **Test Plan**

- Full unit tests and some integration tests added and passing

## **Related Issues**

- Part of scheduler refactor initiative to support distributed benchmarking

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
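The Protocol-based constraint architecture above could be sketched roughly as follows. The class names mirror the PR's list, but the callable signature, the `'continue'`/`'stop'` return convention, and the state-dictionary keys are illustrative assumptions:

```python
from typing import Protocol

# Hedged sketch of composable scheduler constraints; shapes are assumed,
# not the actual guidellm.scheduler.constraints API.

class Constraint(Protocol):
    def __call__(self, state: dict) -> str:
        """Return 'continue' or 'stop' for the current scheduler state."""
        ...

class MaxNumberConstraint:
    def __init__(self, max_num: int):
        self.max_num = max_num

    def __call__(self, state: dict) -> str:
        return "stop" if state["created_requests"] >= self.max_num else "continue"

class MaxErrorsConstraint:
    def __init__(self, max_errors: int):
        self.max_errors = max_errors

    def __call__(self, state: dict) -> str:
        return "stop" if state["errored_requests"] >= self.max_errors else "continue"

# Constraints compose: the scheduler stops when any constraint says stop.
def should_stop(state: dict, constraints: list) -> bool:
    return any(constraint(state) == "stop" for constraint in constraints)
```

Because each constraint is just a callable over shared state, new stopping criteria (duration, sliding-window error rate, and so on) slot in without touching the scheduler loop.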
…nsion (#355)

## **Summary**

Refactors the backend package to introduce a new architecture that supports the scheduler refactor. The changes include new object models for requests/responses, adjustments to the registry-based backend system, and integration with the scheduler interface. The refactor replaces the previous response/streaming model with a more flexible request-response pattern that supports timing measurements and distributed execution.

## **Details**

- **New object models**: Introduced `GenerationRequest`, `GenerationResponse`, and `GenerationRequestTimings` in `objects.py` to standardize data flow between the scheduler and backends
- **Backend architecture refactor**: Redesigned the `Backend` base class to extend `RegistryMixin` and `BackendInterface`, enabling automatic registration and scheduler integration
- **Process lifecycle management**: Added `process_startup()` and `process_shutdown()` methods to support distributed worker processes with proper resource management
- **OpenAI backend modernization**: Rewrote `OpenAIHTTPBackend` with improved error handling, streaming support, initial support for multimodal content processing, and configuration management
- **Registry system**: Implemented automatic backend registration using decorators, replacing manual registry management
- **Scheduler integration**: Backends now implement `BackendInterface` with a `resolve()` method for processing generation requests with timing metadata
- **Removed deprecated modules**: Eliminated `response.py` with its streaming response types in favor of the new object model
- **Enhanced test coverage**: Added comprehensive unit tests for all new components with smoke, sanity, and regression test categories

## **Test Plan**

- Execute the new backend module test suites covering smoke, sanity, and regression scenarios
- Verify OpenAI backend functionality with mocked HTTP responses for both streaming and non-streaming modes
- Test backend registry and factory pattern functionality
- Validate integration with scheduler interfaces and timing measurement
- Ensure proper resource management during process startup/shutdown cycles

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
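The lifecycle-plus-`resolve()` pattern described above can be sketched with a toy backend. This is not the real `guidellm.backends` API: the object fields, the synchronous `resolve()` signature, and the echo behavior are all assumptions made for illustration:

```python
import time

# Hypothetical sketch of the backend request/response pattern: process
# lifecycle hooks plus resolve(), which returns a response with timing.

class GenerationRequest:
    def __init__(self, content: str):
        self.content = content

class GenerationResponse:
    def __init__(self, text: str, latency: float):
        self.text = text
        self.latency = latency

class EchoBackend:
    """Toy backend: echoes the prompt uppercased instead of calling a model."""

    def process_startup(self):
        self.ready = True  # e.g. open HTTP client pools in a worker process

    def process_shutdown(self):
        self.ready = False  # release per-process resources

    def resolve(self, request: GenerationRequest) -> GenerationResponse:
        start = time.monotonic()
        text = request.content.upper()
        return GenerationResponse(text, time.monotonic() - start)

backend = EchoBackend()
backend.process_startup()
response = backend.resolve(GenerationRequest("hello"))
backend.process_shutdown()
```

Keeping startup/shutdown separate from `resolve()` is what lets the scheduler spawn workers, warm them up once, and then stream many requests through each.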
## **Summary**

Introduces a comprehensive refactor of the benchmarking system, replacing the previous architecture with a more flexible and extensible design. The changes include new aggregation protocols, enhanced benchmark objects with comprehensive metrics, and improved progress tracking capabilities. This refactor enables better separation of concerns, more granular metric collection, and improved real-time monitoring of benchmark execution.

## **Details**

- **New Aggregation System**: Replaced `BenchmarkAggregator` with protocol-based `Aggregator` and `CompilableAggregator` interfaces, enabling composable metric collection and compilation
- **Enhanced Benchmark Objects**: Refactored benchmark data models in `objects.py` with comprehensive metrics including timing distributions, token statistics, and performance measurements
- **Improved Benchmarker**: Redesigned the `Benchmarker` class to coordinate request scheduling, data aggregation, and result compilation with a thread-safe singleton pattern
- **Flexible Output System**: Added pluggable output formatters supporting console, CSV, HTML, and JSON formats with configurable file paths
- **Advanced Progress Tracking**: Implemented composite progress handlers with a real-time console display showing detailed metrics, timing information, and progress bars
- **Profile System Enhancements**: Enhanced profile configurations with better strategy generation, constraint management, and completion tracking
- **Comprehensive Entrypoints**: Redesigned the `benchmark_generative_text` function with improved configuration options, validation, and error handling

### Key Components Added:

- `SchedulerStatsAggregator`: Collects scheduler timing and performance metrics
- `GenerativeRequestsAggregator`: Compiles complete generative benchmark results with warmup/cooldown filtering
- `GenerativeStatsProgressAggregator`: Tracks real-time generation metrics during execution
- `BenchmarkerProgressGroup`: Composite progress handler for multiple tracking instances
- `GenerativeBenchmarkerOutput`: Pluggable output system with multiple format support

### Breaking Changes:

- Removed `BenchmarkAggregator` and `GenerativeBenchmarkAggregator` classes
- Restructured the benchmark object hierarchy and field names
- Modified the `Benchmarker.run()` method signature and return type
- Updated progress tracking interfaces and event handling

## **Test Plan**

- Tests to be added in a subsequent PR

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
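The `Aggregator` / `CompilableAggregator` split described above could look roughly like this: plain aggregators only consume updates, while compilable ones also produce a final summary. The protocol shapes and the update-dict keys below are assumptions for illustration, not the actual interfaces:

```python
from typing import Any, Protocol

# Hedged sketch of protocol-based, composable metric aggregation.

class Aggregator(Protocol):
    def __call__(self, update: Any) -> None: ...

class CompilableAggregator(Aggregator, Protocol):
    def compile(self) -> dict: ...

class TokenCountAggregator:
    """Collects per-request token counts, then compiles a summary metric."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def __call__(self, update: dict) -> None:
        self.total += update["output_tokens"]
        self.count += 1

    def compile(self) -> dict:
        mean = self.total / self.count if self.count else 0.0
        return {"mean_output_tokens": mean}
```

A benchmarker can then fan each request update out to a list of aggregators and call `compile()` only on those that support it, which is what makes the collection composable.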
## **Summary**

Introduces a comprehensive mock server implementation that simulates the OpenAI and vLLM APIs with configurable timing characteristics and response patterns. The mock server enables realistic performance testing and validation of GuideLLM benchmarking workflows without requiring actual model deployments, supporting both streaming and non-streaming endpoints with proper token counting, latency simulation (TTFT/ITL), and error handling.

## **Details**

- Added a `mock_server` package with a modular architecture including configuration, handlers, models, server, and utilities
- Implemented `MockServerConfig` with Pydantic settings for centralized configuration management supporting environment variables
- Created HTTP request handlers for OpenAI-compatible endpoints:
  - `ChatCompletionsHandler` for `/v1/chat/completions` with streaming support
  - `CompletionsHandler` for the legacy `/v1/completions` endpoint
  - `TokenizerHandler` for vLLM-compatible `/tokenize` and `/detokenize` endpoints
- Added comprehensive Pydantic models for request/response validation compatible with both the OpenAI and vLLM API specifications
- Implemented a high-performance Sanic-based server with CORS support, middleware, and proper error handling
- Created mock tokenizer and text generation utilities with deterministic token generation for reproducible testing
- Added timing generators for realistic latency simulation, including TTFT (Time To First Token) and ITL (Inter-Token Latency)
- Included a comprehensive test suite with integration tests using real HTTP server instances

## **Test Plan**

- Unit/integration style tests added to automation

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Mark Kurtz <mark.j.kurtz@gmail.com>
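The TTFT/ITL latency simulation mentioned above amounts to: delay the first token of a mock stream by the configured time-to-first-token, and every later token by the inter-token latency. A minimal sketch, with parameter names that are illustrative rather than the actual `MockServerConfig` fields:

```python
# Hypothetical sketch of a mock-server timing generator for streamed tokens.

def token_delays(num_tokens: int, ttft: float, itl: float):
    """Yield the delay (seconds) a mock stream would sleep before each token."""
    for index in range(num_tokens):
        yield ttft if index == 0 else itl

delays = list(token_delays(4, ttft=0.5, itl=0.05))
total_latency = sum(delays)  # total simulated duration of the stream
```

Deterministic delays like these are what make benchmark runs against the mock server reproducible.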
## Summary

This PR handles errors that occur when there are no successful requests. An error is still raised, but it is one the user can extract useful information from, rather than an internal failure from the result-compilation machinery.

## Details

- Adds a default value for an inner data type so it works in this edge case.
- Adds an error check that raises a runtime error with an explanation of the failure. The error message can be changed if you would like the wording changed.
- Fixes a type literal mismatch.

## Test Plan

- Run GuideLLM against a mock server in a way that causes all requests to fail, such as setting the max token value far too small.

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Jared O'Connell <joconnel@redhat.com>
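The guard described above can be sketched as a check before result compilation: raise an actionable `RuntimeError` when every request failed, instead of letting the compile step crash deep inside the internals. The function name and message below are assumptions, not the PR's actual code:

```python
# Hypothetical sketch of an all-requests-failed guard.

def compile_results(successful: list, errored: list) -> dict:
    if not successful:
        raise RuntimeError(
            "No successful requests to compile results from; all "
            f"{len(errored)} requests failed. Check backend availability "
            "and request parameters (e.g. the max output token limit)."
        )
    return {"successful": len(successful), "errored": len(errored)}
```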
… off of til merged into refactor base) (#358)

## **Summary**

Refactor of the GuideLLM command-line interface, streamlining the benchmark command structure while adding new mock server functionality and performance optimization features, and folding in missing fixes from other PRs to stabilize the refactor to a working state.

## **Details**

- **CLI Interface Overhaul**:
  - Removed the legacy `--scenario` option in favor of direct parameter specification
  - Reorganized CLI options with clear grouping (Backend, Data, Output, Aggregators, Constraints)
  - Added parameter aliases for backward compatibility (e.g., `--rate-type` → `--profile`)
  - Simplified option defaults by removing scenario-based defaults
  - Added comprehensive docstrings and help text for all commands and options
- **New Mock Server Command**:
  - Added a `guidellm mock-server` command with full OpenAI/vLLM API compatibility
  - Configurable latency characteristics (request latency, TTFT, ITL, output tokens)
  - Support for both streaming and non-streaming endpoints
  - Comprehensive server configuration options (host, port, workers, model name)
- **Performance Optimization Features**:
  - Added a new `perf` optional dependency group with `orjson`, `msgpack`, `msgspec`, and `uvloop`
  - Integrated uvloop for enhanced async performance when available
  - Optimized event loop policy selection based on availability
- **Internal Architecture Improvements**:
  - Updated import paths (`guidellm.backend` → `guidellm.backends`, `guidellm.scheduler.strategy` → `guidellm.scheduler`)
  - Replaced scenario-based benchmarking with direct `benchmark_generative_text` function calls
  - Enhanced error handling and parameter validation
  - Simplified logging format for better readability
- **Enhanced Output and Configuration**:
  - Added support for multiple output formats with the `--output-formats` option
  - Improved output path handling for files vs. directories
  - Added new constraint options (`--max-errors`, `--max-error-rate`, `--max-global-error-rate`)
  - Enhanced warmup/cooldown specification with flexible numeric/percentage options
- **Code Quality Improvements**:
  - Comprehensive type annotations throughout the codebase
  - Detailed docstrings following Google/NumPy style conventions
  - Consistent parameter naming and organization
  - Removed the deprecated version option from the main CLI group

## **Test Plan**

- Tests for entrypoints to be added later

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
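The backward-compatible parameter aliasing mentioned above (accepting the legacy `--rate-type` spelling for `--profile`) can be sketched as a simple normalization step over parsed keyword arguments. The alias table and helper below are illustrative, not the actual CLI wiring:

```python
# Hypothetical sketch of legacy-parameter aliasing for the CLI layer.

LEGACY_ALIASES = {"rate_type": "profile"}

def normalize_kwargs(kwargs: dict) -> dict:
    """Map legacy parameter names onto their current equivalents."""
    return {LEGACY_ALIASES.get(key, key): value for key, value in kwargs.items()}
```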
Force-pushed from `4c4ea5d` to `aa81de8`
Branch with all refactor changes merged in for testing