Fix KeyError crashes and division by zero in LLM inference benchmark #339
Prachi194agrawal wants to merge 1 commit into kubeedge:main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Prachi194agrawal. The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Welcome @Prachi194agrawal! It looks like this is your first PR to kubeedge/ianvs 🎉
Summary of Changes

Hello @Prachi194agrawal, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical stability issues in the LLM inference benchmark by making its response parsing and metric calculation logic more resilient. It prevents common runtime errors like `KeyError` and `ZeroDivisionError`.

Highlights
Activity
Code Review
This pull request effectively addresses `KeyError` and `ZeroDivisionError` crashes in the LLM inference benchmark by making response parsing more robust and adding a zero-division check. The changes are well-targeted and correct. I've added two suggestions: one for improving code readability in `result_parser.py`, and a critical one for `throughput.py` to handle an edge case where the input list is empty, which would still cause a `ZeroDivisionError`. I noticed that similar logic for calculating averages exists in other metric files (`internal_token_latency.py`, `time_to_first_token.py`) which are not part of this PR's changes; they could also benefit from a check for empty inputs to prevent potential crashes.
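For those sibling metric files, a minimal guarded-average helper could avoid the same crash. This is a hypothetical sketch, not part of this PR; the name `safe_mean` is illustrative:

```python
def safe_mean(values):
    """Average of an iterable, returning 0.0 for empty input
    instead of raising ZeroDivisionError."""
    values = list(values)
    if not values:
        return 0.0
    return sum(values) / len(values)
```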
```python
if average_itl == 0:
    return 0.0
```
This check correctly handles the case where `average_itl` is zero. However, the code is still vulnerable to a `ZeroDivisionError` on line 37 if `y_pred` is an empty list, as `len(infer_res)` would be zero. This would still crash the benchmark. To make this function fully robust, you should add a check for an empty `infer_res` list before the division.
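Putting both guards together, a fully hardened metric might look like the sketch below. The function shape and the way `infer_res` is extracted from `y_pred` are assumptions based on this review discussion, not the exact ianvs code:

```python
def throughput(y_true, y_pred):
    # Per-sample internal token latencies parsed from the predictions
    # (assumed structure: a list of dicts with perf fields).
    infer_res = [r.get("internal_token_latency", 0) for r in y_pred]
    if not infer_res:
        # Empty input: len(infer_res) == 0 would otherwise divide by zero.
        return 0.0
    average_itl = sum(infer_res) / len(infer_res)
    if average_itl == 0:
        # Zero average latency: 1 / average_itl would otherwise divide by zero.
        return 0.0
    return 1 / average_itl
```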
For example, you could add this after line 35:

```python
if not infer_res:
    return 0.0
```

```python
response.get("usage", {}).get("prompt_tokens", 0),
response.get("usage", {}).get("completion_tokens", 0),
response.get("usage", {}).get("total_tokens", 0),
response.get("perf", {}).get("time_to_first_token", 0),
response.get("perf", {}).get("internal_token_latency", 0),
response.get("perf", {}).get("throughput", 0)
```
To improve readability and avoid repetitive calls like response.get('usage', {}), you could extract the nested dictionaries into local variables. This would make the constructor call cleaner.
For example:
```python
if response:
    usage = response.get("usage", {})
    perf = response.get("perf", {})
    return cls(
        response.get("completion", ""),
        usage.get("prompt_tokens", 0),
        usage.get("completion_tokens", 0),
        usage.get("total_tokens", 0),
        perf.get("time_to_first_token", 0),
        perf.get("internal_token_latency", 0),
        perf.get("throughput", 0)
    )
```

(force-pushed from 0510ff0 to 8177ef3)
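A self-contained sketch of the hardened parser follows. The dataclass shape and field list for `Response` are assumptions inferred from the snippet above; the real class in `result_parser.py` may differ:

```python
from dataclasses import dataclass


@dataclass
class Response:
    completion: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    time_to_first_token: float
    internal_token_latency: float
    throughput: float

    @classmethod
    def from_dict(cls, response):
        # Empty or None response: fall back to safe defaults rather
        # than raising KeyError on missing fields.
        if not response:
            return cls("", 0, 0, 0, 0, 0, 0)
        usage = response.get("usage", {})
        perf = response.get("perf", {})
        return cls(
            response.get("completion", ""),
            usage.get("prompt_tokens", 0),
            usage.get("completion_tokens", 0),
            usage.get("total_tokens", 0),
            perf.get("time_to_first_token", 0),
            perf.get("internal_token_latency", 0),
            perf.get("throughput", 0),
        )
```

With this shape, a partially filled response (say, a `usage` block but no `perf` block) parses without crashing and missing metrics default to zero.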
… inference metrics

- Make `Response.from_dict()` robust with `.get()` methods and default values
- Extract `usage` and `perf` dicts to avoid nested `KeyError`
- Add empty list check in throughput calculation
- Prevent `ZeroDivisionError` when `internal_token_latency` is 0
- Fixes runtime crashes in cloud-edge-collaborative-inference-for-llm benchmark

Signed-off-by: Prachi194agrawal <agrawalprachi7718@gmail.com>
(force-pushed from 8177ef3 to c92593d)
Problem
The cloud-edge-collaborative-inference-for-llm benchmark crashes with KeyError and ZeroDivisionError during execution, preventing successful completion of benchmarking runs.
Solution
This PR makes the response parsing and metric calculations more robust:

- Replace direct key access with `.get()` methods and default values
- Guard metric calculations against zero denominators

Changes Made

1. `testenv/result_parser.py`
   - Changed `response["key"]` to `response.get("key", default)`
   - Used chained `.get()` for the "usage" and "perf" dictionaries
2. `testenv/throughput.py`
   - Added the guard `if average_itl == 0: return 0.0`

Testing
✅ Benchmark now runs successfully without crashes


✅ Handles incomplete response structures gracefully
✅ Metrics compute correctly even with zero values
✅ No regression in functionality
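The `.get()` pattern described under Changes Made can be illustrated with a minimal before/after comparison (the sample response dict below is made up for illustration):

```python
# An incomplete response with no "usage" section, as can happen when
# the benchmark receives a malformed or truncated reply.
response = {"completion": "hello"}

# Old pattern: direct indexing raises KeyError and crashes the run.
try:
    tokens = response["usage"]["prompt_tokens"]
except KeyError:
    tokens = None  # this is where the benchmark previously crashed

# New pattern: chained .get() falls back to safe defaults instead.
safe_tokens = response.get("usage", {}).get("prompt_tokens", 0)
print(tokens, safe_tokens)  # → None 0
```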