
Conversation


@leigao97 leigao97 commented Nov 5, 2025

Problem:
The fixed schedule strategy was incorrectly creating a schedule entry for every turn in multi-turn conversations. This caused more requests to be sent than there were actual conversations in the dataset.

Example:

  • Dataset with 3 trace entries forming 2 conversations (conversation A: single turn, conversation B: 2 turns)
  • Expected: 3 requests (one per trace entry)
  • Actual: 5 requests (1 for conversation A; conversation B was scheduled once per turn and sent in full each time, giving 2 + 2)

Root Cause:
In DatasetManager._handle_dataset_timing_request(), the code was iterating through all turns in each conversation and adding each turn to the schedule:

  for conversation_id, conversation in self.dataset.items():
      for turn in conversation.turns:
          timing_dataset.append((turn.timestamp, conversation_id))

This meant multi-turn conversations were scheduled multiple times.

Solution:
Schedule each conversation only once using the first turn's timestamp. The worker is already designed to handle sending all turns in a conversation sequentially after retrieving it.

  for conversation_id, conversation in self.dataset.items():
      if conversation.turns:
          timing_dataset.append((conversation.turns[0].timestamp, conversation_id))
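
As a sanity check on the example above, here is a small standalone sketch (not aiperf code; the dict below is a stand-in for the dataset) that reproduces the before/after request counts, assuming the worker sends every turn of each scheduled conversation:

  # Standalone illustration of the request counts in the example above.
  conversations = {"A": 1, "B": 2}  # conversation id -> number of turns

  # Before the fix: one schedule entry per turn, and the worker sends the
  # whole conversation for every entry, so each conversation contributes
  # turns * turns requests.
  before = sum(n * n for n in conversations.values())

  # After the fix: one schedule entry per conversation, and the worker sends
  # each of its turns exactly once.
  after = sum(conversations.values())

  print(before, after)  # 5 3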

Summary by CodeRabbit

  • Refactor
    • Optimized dataset timing collection logic to consolidate timing entries per conversation, reducing redundant data while improving efficiency.

Fix: Prevent duplicate requests in fixed schedule for multi-turn conversations

Problem:
The fixed schedule strategy was incorrectly creating a schedule entry for
every turn in multi-turn conversations. This caused more requests to be
sent than there were actual conversations in the dataset.

Example:
- Dataset with 3 trace entries forming 2 conversations
  (conversation A: single turn, conversation B: 2 turns)
- Expected: 3 requests (one per trace entry)
- Actual: 5 requests (conversation B was scheduled once per turn and sent
  in full each time)

Root Cause:
In DatasetManager._handle_dataset_timing_request(), the code was iterating
through all turns in each conversation and adding each turn to the schedule:

  for conversation_id, conversation in self.dataset.items():
      for turn in conversation.turns:
          timing_dataset.append((turn.timestamp, conversation_id))

This meant multi-turn conversations were scheduled multiple times.

Solution:
Schedule each conversation only once using the first turn's timestamp.
The worker is already designed to handle sending all turns in a conversation
sequentially after retrieving it.

  for conversation_id, conversation in self.dataset.items():
      if conversation.turns:
          timing_dataset.append((conversation.turns[0].timestamp, conversation_id))

Impact:
- Fixed schedule now creates exactly one schedule entry per conversation
- Multi-turn conversations properly execute all turns in sequence
- Request count matches the total number of turns (trace entries) in the dataset
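
Below is a pytest-style sketch of the kind of unit test requested later in this thread. Turn, Conversation, and collect_timing_entries are illustrative stand-ins, not the real aiperf classes or the actual DatasetManager method:

  # Hypothetical test sketch; the types below only mimic the shape of the data.
  from dataclasses import dataclass, field

  @dataclass
  class Turn:
      timestamp: int

  @dataclass
  class Conversation:
      turns: list = field(default_factory=list)

  def collect_timing_entries(dataset):
      # Mirrors the fixed logic: one entry per conversation, keyed by the
      # first turn's timestamp; conversations with no turns are skipped.
      timing_dataset = []
      for conversation_id, conversation in dataset.items():
          if conversation.turns:
              timing_dataset.append((conversation.turns[0].timestamp, conversation_id))
      return timing_dataset

  def test_one_entry_per_conversation():
      dataset = {
          "conv-a": Conversation(turns=[Turn(timestamp=1000)]),
          "conv-b": Conversation(turns=[Turn(timestamp=2000), Turn(timestamp=3000)]),
          "conv-empty": Conversation(),
      }
      assert collect_timing_entries(dataset) == [(1000, "conv-a"), (2000, "conv-b")]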

github-actions bot commented Nov 5, 2025

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@fix-multi-turn-duplicate-requests

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@fix-multi-turn-duplicate-requests

@leigao97 leigao97 changed the title from "Fix: Prevent duplicate requests in fixed schedule for multi-turn conversations" to "fix: prevent duplicate requests in fixed schedule for multi-turn conversations" on Nov 5, 2025
@github-actions github-actions bot added the fix label Nov 5, 2025

codecov bot commented Nov 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!


coderabbitai bot commented Nov 5, 2025

Walkthrough

The dataset timing collection logic was refactored to aggregate timing data per conversation using only the first turn's timestamp (if available), rather than iterating over all turns. This reduces timing_data entries to one per conversation and eliminates entries for conversations with no turns.

Changes

Cohort / File(s): Timing data aggregation logic (src/aiperf/dataset/dataset_manager.py)
Summary: Changed timestamp collection from per-turn iteration to single first-turn selection, reducing entries per conversation to one and skipping conversations with no turns.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify the logic correctly selects the first turn's timestamp and handles cases where conversations have no turns
  • Confirm the change maintains intended timing semantics (why first turn vs. alternatives?)
  • Check for any edge cases around empty or single-turn conversations

Poem

🐰 A hop through timing's tales so grand,
First turn's timestamp takes the stand,
One per chat, no orphaned dates,
The logic aggregates and waits!
Conversations shine so bright,
When timing gets it right. ✨

Pre-merge checks

✅ Passed checks (2 passed)
Title check: ✅ Passed. The title accurately describes the main change: fixing duplicate requests in the fixed schedule for multi-turn conversations by changing timing data collection from per-turn to per-conversation.
Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

ajcasagrande commented Nov 5, 2025

Hi @leigao97, thanks for the fix!

It looks like you need to run the pre-commit hooks for formatting. You can use make format as well, I believe.

Would you mind adding a unit test for this?

Author

leigao97 commented Nov 6, 2025

Hi @ajcasagrande, I created a test case for multi-turn request generation. Thank you!

Contributor

@debermudez debermudez left a comment


Thanks for this catch @leigao97!

Contributor

@ajcasagrande ajcasagrande left a comment


thank you for the fix, and the tests!

@saturley-hall
Member

Hi @leigao97, thank you so much for your contribution! You are our first!

You hit us at an inflection point in our CI tooling. Just two hours before your first commit, we set our contribution guidelines to follow the Developer Certificate of Origin. To enforce that, we now have a required workflow using the DCO GitHub app that checks that each commit we take was made with the --signoff argument or has a manually added signoff trailer.

The latest failure of the bot's action suggests a fix of git rebase HEAD~7 --signoff followed by a force push (git push --force-with-lease origin fix-multi-turn-duplicate-requests). Since the only other person who has put a commit on your branch is @ajcasagrande (an NVIDIA employee), we are fine with you signing off on his commits, which merge main in.

If you have any questions or concerns please let us know and we will try to assist.

Thank you again for your PR!

@leigao97
Author

@saturley-hall Thank you for this note. I will fix the commit. However, I noticed that this PR's implementation may drop the timestamps of the second and later turns in a conversation, as @ajcasagrande commented. I will double-check the timestamp of each turn.

@ajcasagrande
Contributor

@leigao97 Right now we only support a delay-based approach for additional turns. I'm not too sure how specifying exact timestamps for additional turns would make sense: what if the response hasn't even come back yet? And you can't convert timestamps to delays, as delays are relative to response latency.

So I think the only sensible solution would be to reject the dataset if it contains them. Thoughts, @debermudez?
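
A minimal sketch of what such a rejection check could look like; this is not existing aiperf validation code, and the function and field names are assumptions based on the trace format discussed above:

  # Hypothetical check: reject fixed-schedule data where any turn after the
  # first carries an absolute timestamp instead of a delay.
  def validate_turn_timing(conversation_id, turns):
      for index, turn in enumerate(turns):
          if index > 0 and turn.get("timestamp") is not None:
              raise ValueError(
                  f"Conversation {conversation_id!r}: turn {index} has a timestamp, "
                  "but only the first turn may be scheduled by timestamp; "
                  "later turns must use a delay."
              )

  try:
      validate_turn_timing("abc", [
          {"timestamp": 1000, "input_length": 100, "output_length": 50},
          {"timestamp": 2000, "input_length": 60, "output_length": 40},
      ])
  except ValueError as error:
      print(error)  # the second turn is rejected for carrying a timestamp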

Author

leigao97 commented Nov 10, 2025

Your point makes sense to me. How about we allow users to append the conversation history to later turns themselves in the dataset? For example:

 '{"session_id": "session-1", "input_length": 100, "output_length": 50, "timestamp": 1000}',
 '{"session_id": "session-1", "input_length": 210 (=100+50+60), "output_length": 40, "timestamp": 2000}',

However, this approach implicitly assumes each request is independent and may not be able to reuse the prefix cache.
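
A small sketch of how those cumulative input lengths could be produced from a per-turn trace under this approach; the helper and field names are illustrative, not part of aiperf:

  # Illustrative only: flatten a multi-turn session into independent requests
  # whose input_length includes all previous inputs and outputs.
  def flatten_session(session_id, turns):
      requests, history_length = [], 0
      for turn in turns:
          requests.append({
              "session_id": session_id,
              "input_length": history_length + turn["input_length"],
              "output_length": turn["output_length"],
              "timestamp": turn["timestamp"],
          })
          history_length += turn["input_length"] + turn["output_length"]
      return requests

  flattened = flatten_session("session-1", [
      {"input_length": 100, "output_length": 50, "timestamp": 1000},
      {"input_length": 60, "output_length": 40, "timestamp": 2000},
  ])
  print(flattened)  # the second request has input_length 210 (= 100 + 50 + 60)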

@debermudez
Contributor

> @leigao97 Right now we only support a delay-based approach for additional turns. I'm not too sure how specifying exact timestamps for additional turns would make sense: what if the response hasn't even come back yet? And you can't convert timestamps to delays, as delays are relative to response latency.
>
> So I think the only sensible solution would be to reject the dataset if it contains them. Thoughts, @debermudez?

I agree with this.

I am still thinking through @leigao97's suggestion.

Contributor

ajcasagrande commented Nov 11, 2025

@leigao97 It should look like this:

{"session_id": "abc", "timestamp": 1000, "input_length": 100, "output_length": 50}
{"session_id": "abc", "delay": 1000, "input_length": 60, "output_length": 40}

The second one will automatically append the context of the first request, as per how the code currently works; 60 is the "new" content length (the actual ISL should be computed against all of the data).

I think there may also be room for improving the --custom-dataset-type multi_turn format to support more mooncake-like features (right now you have to provide the actual text):

{
  "session_id": "abc",
  "turns": [
    {"timestamp": 1000, "text": "Turn 1"},
    {"delay": 5000, "text": "Turn 2"}
  ]
}

For example, if we supported a more mooncake-like format:

{
  "session_id": "abc",
  "turns": [
    { "timestamp": 1000, "input_length": 100, "output_length": 50, "hash_ids": [1, 2, 3]},
    {"delay": 1000, "input_length": 60, "output_length": 40, "hash_ids": [4, 5]}
  ]
}
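
A rough sketch of how that proposed format could be modeled; the class, fields, and parser below are purely illustrative and not part of the aiperf schema:

  # Hypothetical model for the proposed mooncake-like multi-turn format:
  # only the first turn carries an absolute timestamp, later turns carry a
  # delay relative to the previous response.
  import json
  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class TraceTurn:
      input_length: int
      output_length: int
      timestamp: Optional[int] = None  # absolute schedule time (first turn only)
      delay: Optional[int] = None      # ms after the previous response returns
      hash_ids: Optional[list] = None  # prefix-cache hint, as in mooncake traces

  def parse_session(line):
      record = json.loads(line)
      return record["session_id"], [TraceTurn(**turn) for turn in record["turns"]]

  session_id, turns = parse_session(
      '{"session_id": "abc", "turns": ['
      '{"timestamp": 1000, "input_length": 100, "output_length": 50, "hash_ids": [1, 2, 3]}, '
      '{"delay": 1000, "input_length": 60, "output_length": 40, "hash_ids": [4, 5]}]}'
  )
  print(session_id, turns)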

(ps: we moved the tests to tests/unit)

@leigao97
Author

@ajcasagrande Thanks for the explanations. I noticed that multi-turn doesn't support ISL, so I looked into the mooncake dataset. The proposed mooncake-like multi-turn format also makes sense, and it is probably the closest approach to replaying a realistic dataset (multi-turn, with each request associated with a timestamp).

The only difference would be that the replayed inter-turn gaps would end up longer than the original ones, since the replay delay only starts once the response returns, while the gaps in the original trace already include the response latency.
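
A small numeric illustration of that point, with made-up numbers:

  # Two turns recorded at t=1000 ms and t=2000 ms, so the trace gap is 1000 ms.
  trace_gap_ms = 2000 - 1000
  delay_ms = trace_gap_ms       # if the trace gap is replayed directly as a delay
  response_latency_ms = 300     # hypothetical latency of the first response

  # The delay only starts counting once the response has returned, so the
  # replayed gap is the latency plus the delay, longer than the trace gap.
  replayed_gap_ms = response_latency_ms + delay_ms
  print(trace_gap_ms, replayed_gap_ms)  # 1000 1300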
