fix: prevent duplicate requests in fixed schedule for multi-turn conversations #444
Conversation
Problem:
The fixed schedule strategy was incorrectly creating a schedule entry for
every turn in multi-turn conversations. This caused more requests to be
sent than there were actual conversations in the dataset.
Example:
- Dataset with 3 trace entries forming 2 conversations
(conversation A: single turn, conversation B: 2 turns)
- Expected: 2 requests (one per conversation)
- Actual: 3 requests (one for each turn, treating turns as separate requests)
Root Cause:
In DatasetManager._handle_dataset_timing_request(), the code was iterating
through all turns in each conversation and adding each turn to the schedule:
```python
for conversation_id, conversation in self.dataset.items():
    for turn in conversation.turns:
        timing_dataset.append((turn.timestamp, conversation_id))
```
This meant multi-turn conversations were scheduled multiple times.
Solution:
Schedule each conversation only once using the first turn's timestamp.
The worker is already designed to handle sending all turns in a conversation
sequentially after retrieving it.
```python
for conversation_id, conversation in self.dataset.items():
    if conversation.turns:
        timing_dataset.append((conversation.turns[0].timestamp, conversation_id))
```
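For illustration, here is a minimal, self-contained sketch of the difference; the `Turn` and `Conversation` classes below are simplified stand-ins, not the actual aiperf models. It builds the two-conversation example from the description above and shows the old per-turn loop producing three schedule entries while the per-conversation loop produces two:

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    timestamp: int  # milliseconds


@dataclass
class Conversation:
    turns: list[Turn] = field(default_factory=list)


# Conversation A has a single turn, conversation B has two turns,
# matching the example in the PR description.
dataset = {
    "conv-a": Conversation(turns=[Turn(timestamp=0)]),
    "conv-b": Conversation(turns=[Turn(timestamp=100), Turn(timestamp=500)]),
}

# Old behavior: one schedule entry per turn -> 3 entries.
old_schedule = [
    (turn.timestamp, conversation_id)
    for conversation_id, conversation in dataset.items()
    for turn in conversation.turns
]

# New behavior: one schedule entry per conversation,
# keyed by the first turn's timestamp -> 2 entries.
new_schedule = [
    (conversation.turns[0].timestamp, conversation_id)
    for conversation_id, conversation in dataset.items()
    if conversation.turns
]

assert len(old_schedule) == 3
assert len(new_schedule) == 2
```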
Impact:
- Fixed schedule now correctly sends one request per conversation
- Multi-turn conversations properly execute all turns in sequence
- Request count matches the number of unique conversations in the dataset
Try out this PR

Quick install:

```bash
pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@fix-multi-turn-duplicate-requests
```

Recommended with a virtual environment (using uv):

```bash
uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@fix-multi-turn-duplicate-requests
```
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Walkthrough: The dataset timing collection logic was refactored to aggregate timing data per conversation using only the first turn's timestamp (if available), rather than iterating over all turns. This reduces timing_data entries to one per conversation and eliminates entries for conversations with no turns.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ Passed checks (2 passed)
Hi @leigao97, thanks for the fix! It looks like you need to run the pre-commit hooks for formatting. Would you mind adding a unit test for this?
Hi @ajcasagrande, I created a test case for multi-turn request generation. Thank you!
debermudez left a comment:
Thanks for this catch @leigao97!
ajcasagrande left a comment:
thank you for the fix, and the tests!
Hi @leigao97, thank you so much for your contribution! You are our first! You hit us at an inflection point in our CI tooling. Just two hours before your first commit, we set our contribution guidelines to follow the Developer Certificate of Origin. To enforce that, we now have a required workflow of the DCO GitHub app to make sure that each commit we take is signed off (e.g. committed with `git commit --signoff`). The latest failure of this bot's action suggests a fix: rebasing the branch and signing off the commits. If you have any questions or concerns, please let us know and we will try to assist. Thank you again for your PR!
@saturley-hall Thank you for this note. I will fix the commit. However, I noticed that this PR's implementation may drop the timestamp of the second/third/etc. turns in the conversation, as @ajcasagrande commented. I will double-check the timestamp of each turn.
@leigao97 Right now we only support a delay-based approach for additional turns. I'm not sure how specifying exact timestamps for additional turns would make sense: what if the response hasn't even come back yet? And you can't convert timestamps to delays, since delays are relative to response latency. So I think the only reasonable behavior would be to reject the dataset if it contains those. Thoughts, @debermudez?
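A rough sketch of the delay semantics described here, assuming a simplified, hypothetical worker loop (`send_request` and the turn dictionaries are illustrative stand-ins, not the actual aiperf implementation):

```python
import time


def replay_conversation(turns: list[dict], send_request) -> None:
    """Replay a multi-turn conversation where each follow-up turn carries a
    `delay` (ms) measured from completion of the previous response."""
    for i, turn in enumerate(turns):
        if i > 0:
            # The delay only starts counting once the previous response has
            # come back, so an absolute timestamp for this turn cannot be
            # honored: the response time is not known in advance.
            time.sleep(turn.get("delay", 0) / 1000.0)
        send_request(turn)  # assumed to block until the response is received
```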
Your point makes sense to me. How about we allow users to append turns to the history context themselves in the dataset? However, this approach implicitly treats each request as independent, so the prefix cache could not be reused.
I agree with this. I am still thinking through @leigao97's suggestion.
@leigao97 It should look like this:

```jsonl
{"session_id": "abc", "timestamp": 1000, "input_length": 100, "output_length": 50}
{"session_id": "abc", "delay": 1000, "input_length": 60, "output_length": 40}
```

The second one will automatically append the context of the first request, as per how the code currently works, and 60 is the "new" content length (the actual ISL should be computed against all of the data). I think there may also be room for improving the multi-turn format itself:

```json
{
  "session_id": "abc",
  "turns": [
    {"timestamp": 1000, "text": "Turn 1"},
    {"delay": 5000, "text": "Turn 2"}
  ]
}
```

For example, if we supported more mooncake-like entries:

```json
{
  "session_id": "abc",
  "turns": [
    {"timestamp": 1000, "input_length": 100, "output_length": 50, "hash_ids": [1, 2, 3]},
    {"delay": 1000, "input_length": 60, "output_length": 40, "hash_ids": [4, 5]}
  ]
}
```

(ps: we moved the tests to …)
@ajcasagrande Thanks for the explanations. I noticed that multi-turn doesn't support ISL, so I looked into the mooncake dataset. The proposed mooncake-like multi-turn format also makes sense, and it is probably the closest approach to replaying a realistic dataset (multi-turn, with each request associated with a timestamp). The only difference would be that the replay delays end up longer than the actual gaps: replay delays start counting after the previous response returns, while the actual gaps in the trace already include that response latency.
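A rough, hypothetical illustration of that difference (all numbers are made up):

```python
# In the original trace, turn 2 was issued 1000 ms after turn 1,
# and that gap already contained turn 1's response latency.
actual_gap_ms = 1000

# In a delay-based replay, the per-turn delay starts counting only after
# the previous response returns, so the effective gap grows by the
# response latency observed during the replay run.
replay_response_latency_ms = 300
replay_gap_ms = replay_response_latency_ms + actual_gap_ms  # 1300 ms > 1000 ms
```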