
Conversation

sjmonson
Collaborator

@sjmonson sjmonson commented Sep 30, 2025

TODO

  • Docs
  • CSV arg string support — the CSV arg string now supports a single bucket (see the last example). Might leave it at that for now.
  • More validation

Summary

This PR is a port of #287 to the v0.4.0 refactor branch.

Adds controls for sharing one or more fixed prefixes between samples. See examples below.

Details

Adds a prefix_buckets argument to the SyntheticTextDatasetConfig. Each bucket consists of a prefix count, a token count, and a bucket weight: prefix count sets the number of unique prefixes to generate for the bucket, token count is the length of each prefix in the bucket, and bucket weight sets the proportion of requests the bucket applies to, relative to the sum of all bucket weights. A minimal sketch of the bucket model and a few examples follow.
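
For reference, here is a minimal sketch (not the exact PR code) of what the bucket model might look like, assuming standard Pydantic field validation; the field names match the YAML keys used in the examples, while the defaults and bounds shown are illustrative assumptions:

from pydantic import BaseModel, Field

class SyntheticTextPrefixBucketConfig(BaseModel):
    # Relative share of samples this bucket receives
    bucket_weight: int = Field(default=100, gt=0)
    # Number of unique prefixes generated for this bucket
    prefix_count: int = Field(default=1, ge=1)
    # Length of each prefix in tokens; 0 disables prefixes for the bucket
    prefix_tokens: int = Field(default=0, ge=0)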

Here we have one bucket of 32 prefixes, each 2048 tokens long. Since there are 1024 total samples, each prefix will apply to 32 samples. If there is only one bucket, then the weight can be omitted since the bucket applies to 100% of the samples.

data:
  prefix_buckets:
    - prefix_tokens: 2048
      prefix_count: 32
  prompt_tokens: 256
  output_tokens: 256
  samples: 1024
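
As a quick check of the arithmetic above (a sketch, not code from the PR):

samples = 1024
prefix_count = 32
samples_per_prefix = samples // prefix_count  # 32 samples share each 2048-token prefix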

In this modified version of the first example, 16 of the prefixes are 2048 tokens long while the other 16 are 1024 tokens long.

data:
  prefix_buckets:
    - prefix_tokens: 2048
      prefix_count: 16
      bucket_weight: 50
    - prefix_tokens: 1024
      prefix_count: 16
      bucket_weight: 50
  prompt_tokens: 256
  output_tokens: 256
  samples: 1024

The prefix tokens of a bucket can also be 0 to disable prefixes for those samples. Here is an example where 40% of the samples have a prefix of 2048 tokens while the other 60% have no prefix.

data:
  prefix_buckets:
    - prefix_tokens: 2048
      bucket_weight: 40
    - prefix_tokens: 0
      bucket_weight: 60
  prompt_tokens: 256
  output_tokens: 256
  samples: 1000
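
To illustrate how the 40/60 weights translate into sample counts, here is a small sketch (illustrative only, not the PR implementation):

weights = {"2048-token prefix": 40, "no prefix": 60}
samples = 1000
total_weight = sum(weights.values())
per_bucket = {name: samples * w // total_weight for name, w in weights.items()}
# -> {'2048-token prefix': 400, 'no prefix': 600}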

If only a single bucket is needed, it can be set at the top level. This makes the changes backwards compatible with the previous interface and allows the CSV string format to work without parsing nested structures (at least for this use case).

data:
  prefix_tokens: 128
  prefix_count: 10
  prompt_tokens: 256
  output_tokens: 256
  samples: 1000
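
Assuming the key=value CSV form guidellm already uses for synthetic data, the same single-bucket configuration might be passed on the command line like this (hypothetical invocation; the prefix keys depend on this PR's CSV parsing):

guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prefix_tokens=128,prefix_count=10,prompt_tokens=256,output_tokens=256,samples=1000"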

Test Plan

  • PR includes unit tests for all synthetic dataset changes (pytest tests/unit/dataset)
  • Scenarios in the Details section can be run against a model server with prefix caching, and the cache hit rate can be confirmed by inspecting the console output.

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

@sjmonson sjmonson changed the base branch from main to features/refactor/multiturn September 30, 2025 17:59
Signed-off-by: Samuel Monson <smonson@redhat.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
@sjmonson sjmonson force-pushed the features/refactor/adv_prefix branch from 5ddb73c to 8534ea0 Compare October 2, 2025 19:12
@sjmonson sjmonson changed the base branch from features/refactor/multiturn to features/refactor/multimodal_support October 2, 2025 19:13
Signed-off-by: Samuel Monson <smonson@redhat.com>
@sjmonson sjmonson force-pushed the features/refactor/adv_prefix branch from 8534ea0 to 07da84f Compare October 2, 2025 19:14
Signed-off-by: Samuel Monson <smonson@redhat.com>
@sjmonson sjmonson force-pushed the features/refactor/adv_prefix branch from 90cf0cf to 2588cda Compare October 2, 2025 20:32
@sjmonson sjmonson requested review from markurtz and Copilot October 3, 2025 16:29
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces advanced prefix cache controls for synthetic dataset generation, allowing users to configure shared prefixes across samples to optimize model caching behavior. The changes enable specification of multiple prefix buckets with configurable weights, prefix counts, and token lengths.

Key changes:

  • Added SyntheticTextPrefixBucketConfig for granular prefix control with bucket weights, prefix counts, and token lengths
  • Enhanced SyntheticTextDatasetConfig with prefix_buckets field and backward compatibility for single prefix configurations
  • Updated data generation logic to support prefix distribution across samples with proper weighting

Reviewed Changes

Copilot reviewed 8 out of 11 changed files in this pull request and generated 5 comments.

Summary per file:

  • tests/unit/dataset/test_synthetic.py — Removed old synthetic dataset tests (entire file deleted)
  • tests/unit/data/deserializers/test_synthetic.py — Added comprehensive test suite for the new synthetic dataset implementation
  • src/guidellm/data/utils.py — Added prefix column mapping support
  • src/guidellm/data/objects.py — Added prefix_column to data model types
  • src/guidellm/data/formatters/templates.py — Updated Jinja templates to handle prefix columns in request formatting
  • src/guidellm/data/deserializers/synthetic.py — Implemented new prefix bucket configuration and generation logic
  • src/guidellm/data/deserializers/__init__.py — Added export for the new SyntheticTextPrefixBucketConfig class
  • pyproject.toml — Added torch and librosa dependencies


@model_validator(mode="after")
def check_prefix_options(self) -> Self:
    prefix_count = self.__pydantic_extra__.get("prefix_count", None)  # type: ignore[attr-defined]
    prefix_tokens = self.__pydantic_extra__.get("prefix_count", None)  # type: ignore[attr-defined]

Copilot AI Oct 3, 2025


The second call to __pydantic_extra__.get() should retrieve 'prefix_tokens', not 'prefix_count'. This will always return None for prefix_tokens, breaking the backward compatibility feature.

Suggested change
prefix_tokens = self.__pydantic_extra__.get("prefix_count", None) # type: ignore[attr-defined]
prefix_tokens = self.__pydantic_extra__.get("prefix_tokens", None) # type: ignore[attr-defined]


if text_column and text_column|length == 1
else text_column
)
"prompt": prefix_column[0]|default("") + text_column[0]

Copilot AI Oct 3, 2025


Direct string concatenation without proper spacing could result in malformed prompts. Consider adding a space or newline separator between prefix and text content to ensure proper formatting.

Suggested change
"prompt": prefix_column[0]|default("") + text_column[0]
"prompt": prefix_column[0]|default("") ~ " " ~ text_column[0]


Comment on lines +198 to 223

# Increase weights to ensure an integer number of samples per per-prefix
least_common_prefix_count = math.lcm(
    *(bucket.prefix_count for bucket in self.config.prefix_buckets)
)
unnorm_weights = [
    least_common_prefix_count * bucket.bucket_weight // bucket.prefix_count
    for bucket in self.config.prefix_buckets
]
# Use GCD to reduce the weights to smallest integer ratio
common_divisor = math.gcd(*unnorm_weights)

# Create prefix list maintaining the correct distribution
prefixes = []
for bucket, weight in zip(self.config.prefix_buckets, unnorm_weights):
    bucket_prefixes = [
        self._create_prompt(bucket.prefix_tokens, faker)
        for _ in range(bucket.prefix_count)
    ]
    sample_count = weight // common_divisor
    prefixes.extend(bucket_prefixes * sample_count)

while True:
    yield rand.choice(prefixes)



Copilot AI Oct 3, 2025


The function returns early when prefix_buckets is None or empty, but the remaining code is not properly indented as an else block. This creates unreachable code and potential runtime errors.

Suggested change
# Increase weights to ensure an integer number of samples per per-prefix
least_common_prefix_count = math.lcm(
    *(bucket.prefix_count for bucket in self.config.prefix_buckets)
)
unnorm_weights = [
    least_common_prefix_count * bucket.bucket_weight // bucket.prefix_count
    for bucket in self.config.prefix_buckets
]
# Use GCD to reduce the weights to smallest integer ratio
common_divisor = math.gcd(*unnorm_weights)
# Create prefix list maintaining the correct distribution
prefixes = []
for bucket, weight in zip(self.config.prefix_buckets, unnorm_weights):
    bucket_prefixes = [
        self._create_prompt(bucket.prefix_tokens, faker)
        for _ in range(bucket.prefix_count)
    ]
    sample_count = weight // common_divisor
    prefixes.extend(bucket_prefixes * sample_count)
while True:
    yield rand.choice(prefixes)

else:
    # Increase weights to ensure an integer number of samples per per-prefix
    least_common_prefix_count = math.lcm(
        *(bucket.prefix_count for bucket in self.config.prefix_buckets)
    )
    unnorm_weights = [
        least_common_prefix_count * bucket.bucket_weight // bucket.prefix_count
        for bucket in self.config.prefix_buckets
    ]
    # Use GCD to reduce the weights to smallest integer ratio
    common_divisor = math.gcd(*unnorm_weights)
    # Create prefix list maintaining the correct distribution
    prefixes = []
    for bucket, weight in zip(self.config.prefix_buckets, unnorm_weights):
        bucket_prefixes = [
            self._create_prompt(bucket.prefix_tokens, faker)
            for _ in range(bucket.prefix_count)
        ]
        sample_count = weight // common_divisor
        prefixes.extend(bucket_prefixes * sample_count)
    while True:
        yield rand.choice(prefixes)


[[tool.pdm.source]]
name = "torch"
type = "find_links"
#url = "https://download.pytorch.org/whl/cpu/torch_stable.html"

Copilot AI Oct 3, 2025


Commented-out URL should be removed rather than left as a comment. If this is for documentation purposes, consider adding a proper comment explaining why this specific URL is used.

Suggested change
#url = "https://download.pytorch.org/whl/cpu/torch_stable.html"
# Previous URL for torch wheels (replaced with the current one due to updated structure)


    *(bucket.prefix_count for bucket in self.config.prefix_buckets)
)
unnorm_weights = [
    least_common_prefix_count * bucket.bucket_weight // bucket.prefix_count

Copilot AI Oct 3, 2025


This calculation could result in integer division by zero if bucket.prefix_count is 0, despite the field validation. Consider adding a runtime check or ensuring the validation prevents this case.


@markurtz markurtz merged commit dd7a4b8 into features/refactor/multimodal_support Oct 3, 2025
10 of 17 checks passed
@markurtz markurtz deleted the features/refactor/adv_prefix branch October 3, 2025 21:00
Development

Successfully merging this pull request may close these issues.

[Feature Request] Consider having groups of queries with multiple system prompts