
Conversation


@leigao97 leigao97 commented Nov 5, 2025

Problem:
The fixed schedule strategy was incorrectly creating a schedule entry for every turn in multi-turn conversations. This caused more requests to be sent than there were actual conversations in the dataset.

Example:

  • Dataset with 3 trace entries forming 2 conversations (conversation A: single turn, conversation B: 2 turns)
  • Expected: 3 requests (one per trace entry)
  • Actual: 5 requests (1 for conversation A; conversation B was scheduled once per turn and sent in full each time, giving 2 + 2)

Root Cause:
In DatasetManager._handle_dataset_timing_request(), the code was iterating through all turns in each conversation and adding each turn to the schedule:

  for conversation_id, conversation in self.dataset.items():
      for turn in conversation.turns:
          timing_dataset.append((turn.timestamp, conversation_id))

This meant multi-turn conversations were scheduled multiple times.

Solution:
Schedule each conversation only once using the first turn's timestamp. The worker is already designed to handle sending all turns in a conversation sequentially after retrieving it.

  for conversation_id, conversation in self.dataset.items():
      if conversation.turns:
          timing_dataset.append((conversation.turns[0].timestamp, conversation_id))
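
As a sanity check on the example above, here is a small standalone sketch (not aiperf code; the dict below is a stand-in for the dataset) that reproduces the before/after request counts, assuming the worker sends every turn of each scheduled conversation:

  # Standalone illustration of the request counts in the example above.
  conversations = {"A": 1, "B": 2}  # conversation id -> number of turns

  # Before the fix: one schedule entry per turn, and the worker sends the
  # whole conversation for every entry, so each conversation contributes
  # turns * turns requests.
  before = sum(n * n for n in conversations.values())

  # After the fix: one schedule entry per conversation, and the worker sends
  # each of its turns exactly once.
  after = sum(conversations.values())

  print(before, after)  # 5 3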

Summary by CodeRabbit

  • Refactor
    • Optimized dataset timing collection logic to consolidate timing entries per conversation, reducing redundant data while improving efficiency.

Fix: Prevent duplicate requests in fixed schedule for multi-turn conversations

Problem:
The fixed schedule strategy was incorrectly creating a schedule entry for
every turn in multi-turn conversations. This caused more requests to be
sent than there were actual conversations in the dataset.

Example:
- Dataset with 3 trace entries forming 2 conversations
  (conversation A: single turn, conversation B: 2 turns)
- Expected: 3 requests (one per trace entry)
- Actual: 5 requests (conversation B was scheduled once per turn and sent
  in full each time)

Root Cause:
In DatasetManager._handle_dataset_timing_request(), the code was iterating
through all turns in each conversation and adding each turn to the schedule:

  for conversation_id, conversation in self.dataset.items():
      for turn in conversation.turns:
          timing_dataset.append((turn.timestamp, conversation_id))

This meant multi-turn conversations were scheduled multiple times.

Solution:
Schedule each conversation only once using the first turn's timestamp.
The worker is already designed to handle sending all turns in a conversation
sequentially after retrieving it.

  for conversation_id, conversation in self.dataset.items():
      if conversation.turns:
          timing_dataset.append((conversation.turns[0].timestamp, conversation_id))

Impact:
- Fixed schedule now creates exactly one schedule entry per conversation
- Multi-turn conversations properly execute all turns in sequence
- Request count matches the total number of turns (trace entries) in the dataset
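
Below is a pytest-style sketch of the kind of unit test requested later in this thread. Turn, Conversation, and collect_timing_entries are illustrative stand-ins, not the real aiperf classes or the actual DatasetManager method:

  # Hypothetical test sketch; the types below only mimic the shape of the data.
  from dataclasses import dataclass, field

  @dataclass
  class Turn:
      timestamp: int

  @dataclass
  class Conversation:
      turns: list = field(default_factory=list)

  def collect_timing_entries(dataset):
      # Mirrors the fixed logic: one entry per conversation, keyed by the
      # first turn's timestamp; conversations with no turns are skipped.
      timing_dataset = []
      for conversation_id, conversation in dataset.items():
          if conversation.turns:
              timing_dataset.append((conversation.turns[0].timestamp, conversation_id))
      return timing_dataset

  def test_one_entry_per_conversation():
      dataset = {
          "conv-a": Conversation(turns=[Turn(timestamp=1000)]),
          "conv-b": Conversation(turns=[Turn(timestamp=2000), Turn(timestamp=3000)]),
          "conv-empty": Conversation(),
      }
      assert collect_timing_entries(dataset) == [(1000, "conv-a"), (2000, "conv-b")]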

github-actions bot commented Nov 5, 2025

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@fix-multi-turn-duplicate-requests

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@fix-multi-turn-duplicate-requests

@leigao97 leigao97 changed the title from "Fix: Prevent duplicate requests in fixed schedule for multi-turn conversations" to "fix: prevent duplicate requests in fixed schedule for multi-turn conversations" on Nov 5, 2025
@github-actions github-actions bot added the fix label Nov 5, 2025

codecov bot commented Nov 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!


coderabbitai bot commented Nov 5, 2025

Walkthrough

The dataset timing collection logic was refactored to aggregate timing data per conversation using only the first turn's timestamp (if available), rather than iterating over all turns. This reduces timing_data entries to one per conversation and eliminates entries for conversations with no turns.

Changes

Cohort / File(s): Timing data aggregation logic (src/aiperf/dataset/dataset_manager.py)
Summary: Changed timestamp collection from per-turn iteration to single first-turn selection, reducing entries per conversation to one and skipping conversations with no turns.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify the logic correctly selects the first turn's timestamp and handles cases where conversations have no turns
  • Confirm the change maintains intended timing semantics (why first turn vs. alternatives?)
  • Check for any edge cases around empty or single-turn conversations

Poem

🐰 A hop through timing's tales so grand,
First turn's timestamp takes the stand,
One per chat, no orphaned dates,
The logic aggregates and waits!
Conversations shine so bright,
When timing gets it right. ✨

Pre-merge checks

✅ Passed checks (2 passed)
Title check: ✅ Passed. The title accurately describes the main change: fixing duplicate requests in the fixed schedule for multi-turn conversations by changing timing data collection from per-turn to per-conversation.
Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

ajcasagrande commented Nov 5, 2025

Hi @leigao97, thanks for the fix!

It looks like you need to run the pre-commit hooks for formatting. You can use make format as well, I believe.

Would you mind adding a unit test for this?

Author

leigao97 commented Nov 6, 2025

Hi @ajcasagrande, I created a test case for multi-turn request generation. Thank you!

Contributor

@debermudez debermudez left a comment


Thanks for this catch @leigao97!

Contributor

@ajcasagrande ajcasagrande left a comment


thank you for the fix, and the tests!

@saturley-hall
Member

Hi @leigao97, thank you so much for your contribution! You are our first!

You hit us at an inflection point in our CI tooling. Just two hours before your first commit, we set our contribution guidelines to follow the Developer Certificate of Origin. To enforce that, we now have a required workflow using the DCO GitHub app that checks that each commit we take was made with the --signoff argument or has a manually added signoff trailer.

The latest failure of the bot's action suggests a fix of git rebase HEAD~7 --signoff followed by a force push (git push --force-with-lease origin fix-multi-turn-duplicate-requests). Since the only other person who has put a commit on your branch is @ajcasagrande (an NVIDIA employee), we are fine with you signing off on his commits, which merge main in.

If you have any questions or concerns please let us know and we will try to assist.

Thank you again for your PR!

@leigao97
Author

@saturley-hall Thank you for this note. I will fix the commit. However, I noticed that this PR's implementation may drop the timestamps of the second and later turns in a conversation, as @ajcasagrande commented. I will double-check the timestamp of each turn.

@ajcasagrande
Contributor

@leigao97 Right now we only support a delay-based approach for additional turns. I'm not too sure how specifying exact timestamps for additional turns would make sense: what if the response hasn't even come back yet? And you can't convert timestamps to delays, as delays are relative to response latency.

So I think the only sensible solution would be to reject the dataset if it contains them. Thoughts, @debermudez?
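
A minimal sketch of what such a rejection check could look like; this is not existing aiperf validation code, and the function and field names are assumptions based on the trace format discussed above:

  # Hypothetical check: reject fixed-schedule data where any turn after the
  # first carries an absolute timestamp instead of a delay.
  def validate_turn_timing(conversation_id, turns):
      for index, turn in enumerate(turns):
          if index > 0 and turn.get("timestamp") is not None:
              raise ValueError(
                  f"Conversation {conversation_id!r}: turn {index} has a timestamp, "
                  "but only the first turn may be scheduled by timestamp; "
                  "later turns must use a delay."
              )

  try:
      validate_turn_timing("abc", [
          {"timestamp": 1000, "input_length": 100, "output_length": 50},
          {"timestamp": 2000, "input_length": 60, "output_length": 40},
      ])
  except ValueError as error:
      print(error)  # the second turn is rejected for carrying a timestamp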

Author

leigao97 commented Nov 10, 2025

Your point makes sense to me. How about we allow users to append the conversation history to later turns themselves in the dataset? For example:

 '{"session_id": "session-1", "input_length": 100, "output_length": 50, "timestamp": 1000}',
 '{"session_id": "session-1", "input_length": 210 (=100+50+60), "output_length": 40, "timestamp": 2000}',

However, this approach implicitly assumes each request is independent and may not be able to reuse the prefix cache.
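
A small sketch of how those cumulative input lengths could be produced from a per-turn trace under this approach; the helper and field names are illustrative, not part of aiperf:

  # Illustrative only: flatten a multi-turn session into independent requests
  # whose input_length includes all previous inputs and outputs.
  def flatten_session(session_id, turns):
      requests, history_length = [], 0
      for turn in turns:
          requests.append({
              "session_id": session_id,
              "input_length": history_length + turn["input_length"],
              "output_length": turn["output_length"],
              "timestamp": turn["timestamp"],
          })
          history_length += turn["input_length"] + turn["output_length"]
      return requests

  flattened = flatten_session("session-1", [
      {"input_length": 100, "output_length": 50, "timestamp": 1000},
      {"input_length": 60, "output_length": 40, "timestamp": 2000},
  ])
  print(flattened)  # the second request has input_length 210 (= 100 + 50 + 60)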

@debermudez
Contributor

> @leigao97 Right now we only support a delay-based approach for additional turns. I'm not too sure how specifying exact timestamps for additional turns would make sense: what if the response hasn't even come back yet? And you can't convert timestamps to delays, as delays are relative to response latency.
>
> So I think the only sensible solution would be to reject the dataset if it contains them. Thoughts, @debermudez?

I agree with this.

I am still thinking through @leigao97's suggestion.

Contributor

ajcasagrande commented Nov 11, 2025

@leigao97 It should look like this:

{"session_id": "abc", "timestamp": 1000, "input_length": 100, "output_length": 50}
{"session_id": "abc", "delay": 1000, "input_length": 60, "output_length": 40}

The second one will automatically append the context of the first request, as per how the code currently works; 60 is the "new" content length (the actual ISL should be computed against all of the data).

I think there may also be room for improving the --custom-dataset-type multi_turn format to support more mooncake-like features (right now you have to provide the actual text):

{
  "session_id": "abc",
  "turns": [
    {"timestamp": 1000, "text": "Turn 1"},
    {"delay": 5000, "text": "Turn 2"}
  ]
}

For example, if we supported a more mooncake-like format:

{
  "session_id": "abc",
  "turns": [
    { "timestamp": 1000, "input_length": 100, "output_length": 50, "hash_ids": [1, 2, 3]},
    {"delay": 1000, "input_length": 60, "output_length": 40, "hash_ids": [4, 5]}
  ]
}
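
A rough sketch of how that proposed format could be modeled; the class, fields, and parser below are purely illustrative and not part of the aiperf schema:

  # Hypothetical model for the proposed mooncake-like multi-turn format:
  # only the first turn carries an absolute timestamp, later turns carry a
  # delay relative to the previous response.
  import json
  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class TraceTurn:
      input_length: int
      output_length: int
      timestamp: Optional[int] = None  # absolute schedule time (first turn only)
      delay: Optional[int] = None      # ms after the previous response returns
      hash_ids: Optional[list] = None  # prefix-cache hint, as in mooncake traces

  def parse_session(line):
      record = json.loads(line)
      return record["session_id"], [TraceTurn(**turn) for turn in record["turns"]]

  session_id, turns = parse_session(
      '{"session_id": "abc", "turns": ['
      '{"timestamp": 1000, "input_length": 100, "output_length": 50, "hash_ids": [1, 2, 3]}, '
      '{"delay": 1000, "input_length": 60, "output_length": 40, "hash_ids": [4, 5]}]}'
  )
  print(session_id, turns)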

(ps: we moved the tests to tests/unit)

@leigao97
Author

@ajcasagrande Thanks for the explanations. I noticed that multi-turn doesn't support ISL, so I looked into the mooncake dataset. The proposed mooncake-like multi-turn format also makes sense, and it is probably the closest approach to replaying a realistic dataset (multi-turn, with each request associated with a timestamp).

The only difference would be that the replayed inter-turn gaps would end up longer than the original ones, since the replay delay only starts once the response returns, while the gaps in the original trace already include the response latency.
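
A small numeric illustration of that point, with made-up numbers:

  # Two turns recorded at t=1000 ms and t=2000 ms, so the trace gap is 1000 ms.
  trace_gap_ms = 2000 - 1000
  delay_ms = trace_gap_ms       # if the trace gap is replayed directly as a delay
  response_latency_ms = 300     # hypothetical latency of the first response

  # The delay only starts counting once the response has returned, so the
  # replayed gap is the latency plus the delay, longer than the trace gap.
  replayed_gap_ms = response_latency_ms + delay_ms
  print(trace_gap_ms, replayed_gap_ms)  # 1000 1300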
