[GuideLLM Refactor] Advanced Prefix Cache Controls #382

Conversation
Pull Request Overview
This PR introduces advanced prefix cache controls for synthetic dataset generation, allowing users to configure shared prefixes across samples to optimize model caching behavior. The changes enable specification of multiple prefix buckets with configurable weights, prefix counts, and token lengths.
Key changes:
- Added `SyntheticTextPrefixBucketConfig` for granular prefix control with bucket weights, prefix counts, and token lengths
- Enhanced `SyntheticTextDatasetConfig` with a `prefix_buckets` field and backward compatibility for single prefix configurations
- Updated data generation logic to support prefix distribution across samples with proper weighting
Reviewed Changes
Copilot reviewed 8 out of 11 changed files in this pull request and generated 5 comments.

File | Description
---|---
tests/unit/dataset/test_synthetic.py | Removed old synthetic dataset tests (entire file deleted)
tests/unit/data/deserializers/test_synthetic.py | Added comprehensive test suite for new synthetic dataset implementation
src/guidellm/data/utils.py | Added prefix column mapping support
src/guidellm/data/objects.py | Added prefix_column to data model types
src/guidellm/data/formatters/templates.py | Updated Jinja templates to handle prefix columns in request formatting
src/guidellm/data/deserializers/synthetic.py | Implemented new prefix bucket configuration and generation logic
src/guidellm/data/deserializers/__init__.py | Added export for new SyntheticTextPrefixBucketConfig class
pyproject.toml | Added torch and librosa dependencies
```python
@model_validator(mode="after")
def check_prefix_options(self) -> Self:
    prefix_count = self.__pydantic_extra__.get("prefix_count", None)  # type: ignore[attr-defined]
    prefix_tokens = self.__pydantic_extra__.get("prefix_count", None)  # type: ignore[attr-defined]
```
The second call to `__pydantic_extra__.get()` should retrieve `"prefix_tokens"`, not `"prefix_count"`. This will always return None for prefix_tokens, breaking the backward compatibility feature.
Suggested change:
```diff
-    prefix_tokens = self.__pydantic_extra__.get("prefix_count", None)  # type: ignore[attr-defined]
+    prefix_tokens = self.__pydantic_extra__.get("prefix_tokens", None)  # type: ignore[attr-defined]
```
```jinja
if text_column and text_column|length == 1
else text_column
)
"prompt": prefix_column[0]|default("") + text_column[0]
```
Direct string concatenation without proper spacing could result in malformed prompts. Consider adding a space or newline separator between prefix and text content to ensure proper formatting.
"prompt": prefix_column[0]|default("") + text_column[0] | |
"prompt": prefix_column[0]|default("") ~ " " ~ text_column[0] |
```python
# Increase weights to ensure an integer number of samples per prefix
least_common_prefix_count = math.lcm(
    *(bucket.prefix_count for bucket in self.config.prefix_buckets)
)
unnorm_weights = [
    least_common_prefix_count * bucket.bucket_weight // bucket.prefix_count
    for bucket in self.config.prefix_buckets
]
# Use GCD to reduce the weights to smallest integer ratio
common_divisor = math.gcd(*unnorm_weights)

# Create prefix list maintaining the correct distribution
prefixes = []
for bucket, weight in zip(self.config.prefix_buckets, unnorm_weights):
    bucket_prefixes = [
        self._create_prompt(bucket.prefix_tokens, faker)
        for _ in range(bucket.prefix_count)
    ]
    sample_count = weight // common_divisor
    prefixes.extend(bucket_prefixes * sample_count)

while True:
    yield rand.choice(prefixes)
```
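To make the weighting concrete, here is a worked run of the LCM/GCD arithmetic above with illustrative bucket values (not taken from the PR):

```python
import math

# Illustrative buckets as (prefix_count, bucket_weight) pairs
buckets = [(2, 1), (3, 1)]

lcm = math.lcm(*(count for count, _ in buckets))                       # lcm(2, 3) = 6
unnorm_weights = [lcm * weight // count for count, weight in buckets]  # [3, 2]
common_divisor = math.gcd(*unnorm_weights)                             # gcd(3, 2) = 1

for (count, _), weight in zip(buckets, unnorm_weights):
    repeats = weight // common_divisor
    print(f"{count} prefixes x {repeats} repeats = {count * repeats} slots")
# 2 prefixes x 3 repeats = 6 slots
# 3 prefixes x 2 repeats = 6 slots
# Equal bucket weights yield equal slot totals, with whole repeats per prefix.
```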
The function returns early when `prefix_buckets` is None or empty, but the remaining code is not properly indented as an else block. This creates unreachable code and potential runtime errors.
Suggested change (indent the remainder of the function under an `else` block):
```python
else:
    # Increase weights to ensure an integer number of samples per prefix
    least_common_prefix_count = math.lcm(
        *(bucket.prefix_count for bucket in self.config.prefix_buckets)
    )
    unnorm_weights = [
        least_common_prefix_count * bucket.bucket_weight // bucket.prefix_count
        for bucket in self.config.prefix_buckets
    ]
    # Use GCD to reduce the weights to smallest integer ratio
    common_divisor = math.gcd(*unnorm_weights)

    # Create prefix list maintaining the correct distribution
    prefixes = []
    for bucket, weight in zip(self.config.prefix_buckets, unnorm_weights):
        bucket_prefixes = [
            self._create_prompt(bucket.prefix_tokens, faker)
            for _ in range(bucket.prefix_count)
        ]
        sample_count = weight // common_divisor
        prefixes.extend(bucket_prefixes * sample_count)

    while True:
        yield rand.choice(prefixes)
```
```toml
[[tool.pdm.source]]
name = "torch"
type = "find_links"
#url = "https://download.pytorch.org/whl/cpu/torch_stable.html"
```
Commented-out URL should be removed rather than left as a comment. If this is for documentation purposes, consider adding a proper comment explaining why this specific URL is used.
Suggested change:
```diff
-#url = "https://download.pytorch.org/whl/cpu/torch_stable.html"
+# Previous URL for torch wheels (replaced with the current one due to updated structure)
```
```python
    *(bucket.prefix_count for bucket in self.config.prefix_buckets)
)
unnorm_weights = [
    least_common_prefix_count * bucket.bucket_weight // bucket.prefix_count
```
This calculation could result in integer division by zero if `bucket.prefix_count` is 0, despite the field validation. Consider adding a runtime check or ensuring the validation prevents this case.
Merged dd7a4b8 into features/refactor/multimodal_support

TODO
- [x] CSV arg string support: CSV arg string now supports single bucket (see last example). Might leave it at that for now.

Summary
This PR is a port of #287 to the v0.4.0 refactor branch.
Adds controls for sharing one or more fixed prefixes between samples. See examples below.
Details
Adds a `prefix_buckets` argument to the `SyntheticTextDatasetConfig`; each bucket consists of a prefix count, token count, and bucket weight. Prefix count sets the number of unique prefixes to generate for a given bucket, token count is the length of each prefix in the bucket, and bucket weight is used to calculate the proportion of requests the bucket applies to, relative to the sum of all bucket weights. Here are a few examples:

Here we have one bucket of 32 prefixes of length 2048. Since there are 1024 total samples, each prefix will apply to 32 samples. If there is only one bucket, then the weight can be omitted, as the bucket applies to 100% of samples.
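The original example config was not preserved in this excerpt; a minimal sketch using the new classes follows (import path assumed from the `__init__.py` export noted above; `prompt_tokens`, `output_tokens`, and `samples` are assumed fields of the wider synthetic config, with illustrative values):

```python
from guidellm.data.deserializers import (
    SyntheticTextDatasetConfig,       # assumed export
    SyntheticTextPrefixBucketConfig,  # exported per the file summary above
)

config = SyntheticTextDatasetConfig(
    prompt_tokens=256,   # assumed field, illustrative value
    output_tokens=128,   # assumed field, illustrative value
    samples=1024,        # assumed field: total sample count
    prefix_buckets=[
        # One bucket: 32 unique prefixes of 2048 tokens each;
        # 1024 samples / 32 prefixes = 32 samples per prefix.
        SyntheticTextPrefixBucketConfig(prefix_count=32, prefix_tokens=2048),
    ],
)
```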
In this modified version of the first example, 16 of the prefixes have 2048 tokens while the other 16 have 1024 tokens.
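Sketched the same way, splitting the 32 prefixes into two buckets (equal weights when omitted is an assumption based on the weighting logic quoted above):

```python
config = SyntheticTextDatasetConfig(
    prompt_tokens=256,   # assumed field
    output_tokens=128,   # assumed field
    samples=1024,        # assumed field
    prefix_buckets=[
        SyntheticTextPrefixBucketConfig(prefix_count=16, prefix_tokens=2048),
        SyntheticTextPrefixBucketConfig(prefix_count=16, prefix_tokens=1024),
    ],
)
```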
The prefix tokens of a bucket can also be 0 to disable prefixes for those samples. Here is an example where 40% of the samples have a prefix of 2048 tokens while the other 60% have no prefix.
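A sketch of the 40/60 split; the `prefix_count` values here are illustrative, not from the PR:

```python
config = SyntheticTextDatasetConfig(
    prompt_tokens=256,   # assumed field
    output_tokens=128,   # assumed field
    samples=1000,        # assumed field
    prefix_buckets=[
        # 40% of samples draw from shared 2048-token prefixes
        SyntheticTextPrefixBucketConfig(
            bucket_weight=40, prefix_count=32, prefix_tokens=2048
        ),
        # 60% of samples have no prefix (prefix_tokens=0)
        SyntheticTextPrefixBucketConfig(
            bucket_weight=60, prefix_count=1, prefix_tokens=0
        ),
    ],
)
```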
If only a single bucket is needed, it can be set at the top level. This makes the changes backwards compatible with the previous interface and allows the CSV string format to work without parsing nested structures (at least for this use case).
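A sketch of the top-level single-bucket form, plus a hypothetical CSV arg string (the exact CSV key syntax is not shown in this excerpt, so treat it as an assumption):

```python
# Top-level form: these extra keys are picked up by the check_prefix_options
# validator via __pydantic_extra__ (see the review comment above).
config = SyntheticTextDatasetConfig(
    prompt_tokens=256,   # assumed field
    output_tokens=128,   # assumed field
    samples=1024,        # assumed field
    prefix_count=32,
    prefix_tokens=2048,
)

# Hypothetical CSV arg string equivalent (key syntax assumed):
#   "prompt_tokens=256,output_tokens=128,prefix_count=32,prefix_tokens=2048"
```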
Test Plan
- `pytest tests/unit/dataset`

Related Issues
Use of AI
## WRITTEN BY AI ##