LLM Request Source for LLM-based load testing #1503
Merged
eric846
merged 29 commits into
envoyproxy:main
from
Grayson-LaFleur-Google:llm_req_source
Mar 9, 2026
9249b7c
create LLM request source
Grayson-LaFleur-Google 31517e1
add untracked files
Grayson-LaFleur-Google 00f0446
fix build issues, add to general build rule, and add documentation
Grayson-LaFleur-Google f47130e
Merge remote-tracking branch 'upstream/main' into llm_req_source
Grayson-LaFleur-Google aad5dad
update documentation
Grayson-LaFleur-Google 07ef012
update documentation
Grayson-LaFleur-Google 494be94
clang-format
Grayson-LaFleur-Google b7ee4bb
format fix
Grayson-LaFleur-Google d773ce1
format fix
Grayson-LaFleur-Google fce11ac
fix namespace
Grayson-LaFleur-Google ef7f06d
remove bashrc accidental change
Grayson-LaFleur-Google be948e0
update with comments
Grayson-LaFleur-Google c88b31d
fix format
Grayson-LaFleur-Google c9e6a1b
update documentation structure
Grayson-LaFleur-Google 5456fe2
add test cases for llm request source
Grayson-LaFleur-Google e619699
format fix
Grayson-LaFleur-Google 845bf77
add direct dependency
Grayson-LaFleur-Google 605f98b
nits
Grayson-LaFleur-Google 4d32939
format fix
Grayson-LaFleur-Google 6a5156d
fix nit and add strappend support
Grayson-LaFleur-Google 470cd4d
tab format fix
Grayson-LaFleur-Google f66f634
tab format fix
Grayson-LaFleur-Google c068d79
format fix
Grayson-LaFleur-Google 52e164e
format
Grayson-LaFleur-Google 39f7a1c
format
Grayson-LaFleur-Google dc53375
format
Grayson-LaFleur-Google bce99d6
BUILD issue fix
Grayson-LaFleur-Google 1940ea4
fix include
Grayson-LaFleur-Google 34c6a28
fix format
Grayson-LaFleur-Google
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| # Load Testing with LLM-formatted Requests | ||
|
|
||
| ## Overview | ||
|
|
||
| To load test an LLM backend that implements the | ||
| [Completions API spec](https://developers.openai.com/api/docs/guides/completions/), | ||
| use the LLM Request Source plugin, which emulates that workload. | ||
| The request bodies look like the following: | ||
|
|
||
| ``` | ||
| { | ||
| "model": "Qwen/Qwen2.5-1.5B-Instruct", | ||
| "max_tokens": 10, | ||
| "messages": [ | ||
| { | ||
| "role": "user", | ||
| "content": "L Q 5 i x D q v p X" | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| The body is generated from four inputs that you provide: | ||
|
|
||
| 1. Model Name (required) | ||
| - Name of the model the requests are sent to | ||
| 2. Request Token Count (default 0) | ||
| - Number of "tokens" generated for the request prompt | ||
| 3. Response Max Token Count (default 0) | ||
| - Maximum number of tokens the model may respond with | ||
| 4. [Request Options List](https://github.com/envoyproxy/nighthawk/blob/09d64d769972513989a95766a98e28f5d6bb05c2/api/client/options.proto#L32) (optional) | ||
| - Lets you add headers to the generated requests; the request method is always POST | ||
|
||
|
|
||
| A few additional details about the request options list: | ||
|
|
||
| 1. The header 'Content-Type: application/json' is added by default. | ||
| 2. The "request_body_size" and "json_body" fields of each option are ignored. | ||
| 3. If a host name is required, set the ":authority" header rather than ":host". | ||
|
|
||
| The plugin config is passed through the "--request-source-plugin-config" flag. | ||
| Here is an example of the flag for a load test with the LLM Request Source plugin: | ||
| ``` | ||
| --request-source-plugin-config "{name:\"nighthawk.request_source.llm\",typed_config:{\"@type\":\"type.googleapis.com/nighthawk.LlmRequestSourcePluginConfig\", model_name: \"Qwen/Qwen2.5-1.5B-Instruct\", req_token_count: 10, resp_max_tokens: 10, options_list:{options:[{request_headers:[{header:{key:\":authority\",value:\"team1.example.com\"}}]}]}}}" | ||
| ``` | ||
|
|
||
| Take care to escape the inner double quotes in the config string, as in the example above. | ||
|
|
||
| ## Tokenizer | ||
|
|
||
| We do not use a real tokenizer to generate the request tokens. Instead, | ||
| we use a naive "tokenizer" in which each "token" is a single random character | ||
| from [A-Za-z0-9], with a space between consecutive tokens. The generated | ||
| message content is therefore always 2*req_token_count-1 characters long | ||
| (or empty when req_token_count is 0). | ||
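The length claim in the Tokenizer section can be checked with a small standalone sketch. It uses `std::mt19937` from the standard library in place of `absl::BitGen` so that it compiles without Abseil. The function name `NaivePrompt` and the caller-supplied generator are assumptions made for illustration and are not part of the plugin.

```cpp
#include <cstddef>
#include <random>
#include <string>

// Illustrative stand-in for the plugin's prompt generator: one random
// character from [0-9A-Za-z] per "token", separated by single spaces.
std::string NaivePrompt(int num_tokens, std::mt19937& rng) {
  static const std::string kCharset =
      "0123456789"
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "abcdefghijklmnopqrstuvwxyz";
  // uniform_int_distribution takes an inclusive upper bound.
  std::uniform_int_distribution<std::size_t> pick(0, kCharset.size() - 1);

  std::string prompt;
  for (int i = 0; i < num_tokens; ++i) {
    if (i > 0) {
      prompt.push_back(' ');  // Separator goes between tokens only.
    }
    prompt.push_back(kCharset[pick(rng)]);
  }
  return prompt;
}
```

For a positive `num_tokens` of n, the result holds n characters and n-1 spaces, which gives the 2n-1 length. A count of zero or less yields an empty string.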
Empty file.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| syntax = "proto3"; | ||
|
|
||
| package nighthawk; | ||
|
|
||
| import "api/client/options.proto"; | ||
|
|
||
| // Config for `LlmRequestSourcePlugin`. | ||
| message LlmRequestSourcePluginConfig { | ||
| // Model to use for the request. This field is required. | ||
| string model_name = 1; | ||
|
|
||
| // Number of tokens to generate in the request. Defaults to 0. | ||
| int32 req_token_count = 2; | ||
|
|
||
| // Maximum number of tokens to return in the response. Defaults to 0. | ||
| int32 resp_max_tokens = 3; | ||
|
|
||
| // The options_list will be used to apply headers to the request. | ||
| nighthawk.client.RequestOptionsList options_list = 4; | ||
| } |
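Because these proto3 scalar fields are declared without `optional`, an unset `model_name` arrives as an empty string and unset counts arrive as 0, so "required" can only be enforced in code. The sketch below mirrors that check on a plain struct. `LlmConfig` and `Validate` are hypothetical names, and rejecting negative counts is an extra assumption that the plugin in this PR does not make.

```cpp
#include <optional>
#include <string>

// Hypothetical plain-struct mirror of LlmRequestSourcePluginConfig.
struct LlmConfig {
  std::string model_name;   // proto3 default: ""
  int req_token_count = 0;  // proto3 default: 0
  int resp_max_tokens = 0;  // proto3 default: 0
};

// Returns an error message, or std::nullopt when the config is usable.
std::optional<std::string> Validate(const LlmConfig& config) {
  if (config.model_name.empty()) {
    return "Model name is required.";
  }
  // Assumption for illustration: int32 fields can carry negative values,
  // so this sketch rejects them explicitly.
  if (config.req_token_count < 0 || config.resp_max_tokens < 0) {
    return "Token counts must be non-negative.";
  }
  return std::nullopt;
}
```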
125 changes: 125 additions & 0 deletions
source/request_source/llm_request_source_plugin_impl.cc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,125 @@ | ||
| #include "source/request_source/llm_request_source_plugin_impl.h" | ||
|
|
||
| #include <memory> | ||
| #include <string> | ||
| #include <utility> | ||
|
|
||
| #include "absl/log/check.h" | ||
| #include "absl/random/random.h" | ||
| #include "absl/status/status.h" | ||
| #include "absl/status/statusor.h" | ||
| #include "absl/strings/str_cat.h" | ||
| #include "absl/strings/str_format.h" | ||
|
|
||
| #include "source/request_source/llm_request_source_plugin.pb.h" | ||
|
|
||
| #include "envoy/api/api.h" | ||
| #include "envoy/config/core/v3/base.pb.h" | ||
| #include "envoy/config/core/v3/extension.pb.h" | ||
| #include "envoy/http/header_map.h" | ||
| #include "envoy/registry/registry.h" | ||
| #include "external/envoy/source/common/http/header_map_impl.h" | ||
| #include "external/envoy/source/common/protobuf/protobuf.h" | ||
| #include "external/envoy/source/common/protobuf/utility.h" | ||
|
|
||
| #include "api/client/options.pb.h" | ||
| #include "api/request_source/request_source_plugin.pb.h" | ||
| #include "nighthawk/common/request.h" | ||
| #include "nighthawk/common/request_source.h" | ||
| #include "nighthawk/request_source/request_source_plugin_config_factory.h" | ||
| #include "source/common/request_impl.h" | ||
|
|
||
| namespace Nighthawk { | ||
| namespace { | ||
|
|
||
| absl::Status ValidateConfig(const nighthawk::LlmRequestSourcePluginConfig& config) { | ||
| if (config.model_name().empty()) { | ||
| return absl::InvalidArgumentError("Model name is required."); | ||
| } | ||
|
|
||
| return absl::OkStatus(); | ||
| } | ||
|
|
||
| constexpr absl::string_view kCharset = "0123456789" | ||
| "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | ||
| "abcdefghijklmnopqrstuvwxyz"; | ||
|
|
||
| std::string GenerateRandomPrompt(int num_tokens) { | ||
| std::string result_string; | ||
| absl::BitGen bitgen; | ||
|
|
||
| for (int i = 0; i < num_tokens; ++i) { | ||
| // Append a random character from the charset. | ||
| absl::StrAppend(&result_string, | ||
| std::string(1, kCharset[absl::Uniform<size_t>(bitgen, 0, kCharset.length())])); | ||
|
|
||
| // Add a space between tokens. This is a naive way to calculate the number | ||
| // of tokens in the string as generally spaces delineate tokens. | ||
| if (i < num_tokens - 1) { | ||
| absl::StrAppend(&result_string, " "); | ||
| } | ||
| } | ||
|
|
||
| return result_string; | ||
| } | ||
|
|
||
| } // namespace | ||
|
|
||
| Nighthawk::RequestGenerator LlmRequestSourcePlugin::get() { | ||
| return [this]() -> std::unique_ptr<Nighthawk::Request> { | ||
| Envoy::Http::RequestHeaderMapPtr headers = Envoy::Http::RequestHeaderMapImpl::create(); | ||
| Envoy::Http::HeaderMapImpl::copyFrom(*headers, *header_); | ||
|
|
||
| std::string body = | ||
| absl::StrFormat(R"json( | ||
| { | ||
| "model": "%s", | ||
| "max_tokens": %d, | ||
| "messages": [ | ||
| { | ||
| "role": "user", | ||
| "content": "%s" | ||
| } | ||
| ] | ||
| } | ||
| )json", | ||
| model_name_, resp_max_tokens_, GenerateRandomPrompt(req_token_count_)); | ||
|
|
||
| headers->setMethod( | ||
| envoy::config::core::v3::RequestMethod_Name(envoy::config::core::v3::RequestMethod::POST)); | ||
| headers->setContentType("application/json"); | ||
| headers->setContentLength(body.size()); | ||
|
|
||
| auto path_key = Envoy::Http::LowerCaseString(":path"); | ||
| headers->setCopy(path_key, "/v1/completions"); | ||
|
|
||
| return std::make_unique<Nighthawk::RequestImpl>(std::move(headers), body); | ||
| }; | ||
| } | ||
|
|
||
| Nighthawk::RequestSourcePtr | ||
| LlmRequestSourcePluginFactory::createRequestSourcePlugin(const Envoy::Protobuf::Message& message, | ||
| Envoy::Api::Api&, | ||
| Envoy::Http::RequestHeaderMapPtr header) { | ||
| const auto* any = Envoy::Protobuf::DynamicCastToGenerated<const Envoy::Protobuf::Any>(&message); | ||
| nighthawk::LlmRequestSourcePluginConfig llm_config; | ||
| THROW_IF_NOT_OK(Envoy::MessageUtil::unpackTo(*any, llm_config)); | ||
| THROW_IF_NOT_OK(ValidateConfig(llm_config)); | ||
|
|
||
| for (const nighthawk::client::RequestOptions& request_option : | ||
| llm_config.options_list().options()) { | ||
| for (const envoy::config::core::v3::HeaderValueOption& option_header : | ||
| request_option.request_headers()) { | ||
| auto lower_case_key = Envoy::Http::LowerCaseString(option_header.header().key()); | ||
| header->setCopy(lower_case_key, option_header.header().value()); | ||
| } | ||
| } | ||
|
|
||
| return std::make_unique<LlmRequestSourcePlugin>(std::string(llm_config.model_name()), | ||
| llm_config.req_token_count(), | ||
| llm_config.resp_max_tokens(), std::move(header)); | ||
| }; | ||
|
|
||
| REGISTER_FACTORY(LlmRequestSourcePluginFactory, Nighthawk::RequestSourcePluginConfigFactory); | ||
|
|
||
| } // namespace Nighthawk |
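One thing to keep in mind when reading `get()`: `model_name_` is substituted into the JSON template with `%s` and is not escaped, so a model name containing `"` or `\` would yield an invalid body. The generated prompt is not affected, since it only contains characters from `kCharset` and spaces. A minimal escaping helper could look like the sketch below. `EscapeJsonString` is a hypothetical name, it is not part of this PR, and it handles only the characters that JSON requires to be escaped.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: escapes a string for use inside a JSON string literal.
std::string EscapeJsonString(const std::string& input) {
  std::string out;
  out.reserve(input.size());
  for (unsigned char c : input) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c < 0x20) {
          // Remaining control characters use the \u00XX form.
          char buf[7];
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out.push_back(static_cast<char>(c));
        }
    }
  }
  return out;
}
```

A typical model name such as `Qwen/Qwen2.5-1.5B-Instruct` passes through unchanged, so the helper would only matter for unusual names.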
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| #pragma once | ||
|
|
||
| #include <memory> | ||
| #include <string> | ||
| #include <utility> | ||
|
|
||
| #include "source/request_source/llm_request_source_plugin.pb.h" | ||
|
|
||
| #include "absl/log/log.h" | ||
| #include "absl/status/statusor.h" | ||
| #include "absl/strings/string_view.h" | ||
|
|
||
| #include "envoy/api/api.h" | ||
| #include "envoy/config/core/v3/extension.pb.h" | ||
| #include "envoy/http/header_map.h" | ||
| #include "external/envoy/source/common/common/logger.h" | ||
| #include "external/envoy/source/common/protobuf/protobuf.h" | ||
|
|
||
| #include "api/client/options.pb.h" | ||
| #include "nighthawk/common/request_source.h" | ||
| #include "nighthawk/request_source/request_source_plugin_config_factory.h" | ||
|
|
||
| namespace Nighthawk { | ||
|
|
||
| constexpr inline absl::string_view kLlmRequestSourcePluginName = "nighthawk.request_source.llm"; | ||
|
|
||
| // A Nighthawk RequestSource that generates completions API requests. | ||
| // | ||
| // The request source generates requests with the following characteristics: | ||
| // - The request body is a JSON object with the following fields: | ||
| // - model: The name of the model to use for inference. | ||
| // - max_tokens: The maximum number of tokens to return in the response. | ||
| // - messages: A list with a single JSON object containing the following | ||
| // fields: | ||
| // - role: "user" | ||
| // - content: A string containing `req_token_count` randomly generated | ||
| // tokens. | ||
| // - The request headers are copied from the provided header map with the | ||
| // following modifications: | ||
| // - Method: POST | ||
| // - Content-Type: application/json | ||
| // - Content-Length: The length of the request body. | ||
| // - :path: /v1/completions | ||
| class LlmRequestSourcePlugin : public Nighthawk::RequestSource, | ||
| public Envoy::Logger::Loggable<Envoy::Logger::Id::http> { | ||
| public: | ||
| explicit LlmRequestSourcePlugin(std::string model_name, int req_token_count, int resp_max_tokens, | ||
| Envoy::Http::RequestHeaderMapPtr header) | ||
| : model_name_(model_name), req_token_count_(req_token_count), | ||
| resp_max_tokens_(resp_max_tokens), header_(std::move(header)) {}; | ||
|
|
||
| Nighthawk::RequestGenerator get() override; | ||
| void initOnThread() override {}; | ||
| void destroyOnThread() override {}; | ||
|
|
||
| private: | ||
| // Model to use for the request. | ||
| std::string model_name_; | ||
| // Number of tokens to generate in the request. | ||
| int req_token_count_; | ||
| // Maximum number of tokens from the model to return in the response. | ||
| int resp_max_tokens_; | ||
| // The options_list will be used to apply headers to the request. | ||
| std::unique_ptr<const nighthawk::client::RequestOptionsList> options_list_; | ||
| // Headers for the request. | ||
| Envoy::Http::RequestHeaderMapPtr header_; | ||
|
||
| }; | ||
|
|
||
|
||
| // Factory class for creating LlmRequestSourcePlugin objects. | ||
| class LlmRequestSourcePluginFactory : public virtual Nighthawk::RequestSourcePluginConfigFactory { | ||
| public: | ||
| std::string name() const override { return std::string(kLlmRequestSourcePluginName); } | ||
|
|
||
| Envoy::ProtobufTypes::MessagePtr createEmptyConfigProto() override { | ||
| return std::make_unique<nighthawk::LlmRequestSourcePluginConfig>(); | ||
| } | ||
|
|
||
| Nighthawk::RequestSourcePtr | ||
| createRequestSourcePlugin(const Envoy::Protobuf::Message&, Envoy::Api::Api&, | ||
| Envoy::Http::RequestHeaderMapPtr header) override; | ||
| }; | ||
|
|
||
| } // namespace Nighthawk | ||
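The `get()` contract is easiest to see in isolation: it returns a callable, and every call to that callable builds a fresh request from the stored base headers. The sketch below models this with standard-library types only. `SimpleRequest`, `SimpleRequestSource`, and the `std::function` alias are simplified stand-ins chosen for illustration; they are not Nighthawk's real `Request`, `RequestSource`, or `RequestGenerator` definitions.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for a request: a header map plus a body.
struct SimpleRequest {
  std::map<std::string, std::string> headers;
  std::string body;
};

using SimpleRequestGenerator = std::function<std::unique_ptr<SimpleRequest>()>;

class SimpleRequestSource {
 public:
  SimpleRequestSource(std::map<std::string, std::string> base_headers, std::string body)
      : base_headers_(std::move(base_headers)), body_(std::move(body)) {}

  // The lambda captures `this`, so the source must outlive the generator.
  SimpleRequestGenerator get() {
    return [this]() {
      auto request = std::make_unique<SimpleRequest>();
      request->headers = base_headers_;  // Copy, then override per request.
      request->headers[":method"] = "POST";
      request->headers[":path"] = "/v1/completions";
      request->headers["content-type"] = "application/json";
      request->headers["content-length"] = std::to_string(body_.size());
      request->body = body_;
      return request;
    };
  }

 private:
  std::map<std::string, std::string> base_headers_;
  std::string body_;
};
```

Unlike this sketch, which reuses one fixed body, the plugin in this PR regenerates the random prompt on every call.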