Fix KeyError crashes and division by zero in LLM inference benchmark#339

Open
Prachi194agrawal wants to merge 1 commit into kubeedge:main from Prachi194agrawal:fix/llm-inference-response-format

Conversation

@Prachi194agrawal

Problem

The cloud-edge-collaborative-inference-for-llm benchmark crashes with KeyError and ZeroDivisionError during execution, preventing successful completion of benchmarking runs.

Solution

This PR makes the response parsing and metric calculations more robust:

  1. result_parser.py: Replace direct dictionary access with .get() methods and default values
  2. throughput.py: Add zero-division check before calculating throughput

Changes Made

1. testenv/result_parser.py

  • Changed from response["key"] to response.get("key", default)
  • Added nested .get() for "usage" and "perf" dictionaries
  • Provides sensible defaults (empty strings for text, 0 for numeric values)
  • Ensures graceful handling of incomplete response structures
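The parsing pattern described above can be sketched as follows. This is a minimal illustration, not the actual code; the real result_parser.py wraps these values in a Response object with its own constructor, and the field names below simply mirror the PR description:

```python
# Minimal sketch of the defensive-parsing pattern described above.
# Field names mirror the PR description; the real result_parser.py
# returns a Response object rather than a tuple.

def parse_response(response: dict) -> tuple:
    usage = response.get("usage", {})   # nested dict may be missing entirely
    perf = response.get("perf", {})
    return (
        response.get("completion", ""),          # text defaults to ""
        usage.get("prompt_tokens", 0),           # numeric defaults to 0
        usage.get("completion_tokens", 0),
        usage.get("total_tokens", 0),
        perf.get("time_to_first_token", 0),
        perf.get("internal_token_latency", 0),
        perf.get("throughput", 0),
    )
```

With this pattern, an empty or partially populated response dict yields defaults instead of raising KeyError.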

2. testenv/throughput.py

  • Added check: if average_itl == 0: return 0.0
  • Prevents ZeroDivisionError when internal_token_latency is 0
  • Returns 0.0 throughput when metrics are unavailable
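The guard can be sketched like this, assuming throughput is computed as the reciprocal of the average internal token latency (seconds per token, giving tokens per second); the actual formula in throughput.py may differ:

```python
# Sketch of the zero-division guard described above. Assumes
# throughput = 1 / average internal token latency; the real
# throughput.py may compute this differently.

def throughput_from_latency(average_itl: float) -> float:
    if average_itl == 0:
        return 0.0   # metrics unavailable; avoid ZeroDivisionError
    return 1.0 / average_itl
```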

Testing

✅ Benchmark now runs successfully without crashes
✅ Handles incomplete response structures gracefully
✅ Metrics compute correctly even with zero values
✅ No regression in functionality
(Screenshots from 2026-02-10 showing the benchmark completing without errors.)

@kubeedge-bot kubeedge-bot requested a review from hsj576 February 10, 2026 16:30
@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Prachi194agrawal
To complete the pull request process, please assign moorezheng after the PR has been reviewed.
You can assign the PR to them by writing /assign @moorezheng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot
Collaborator

Welcome @Prachi194agrawal! It looks like this is your first PR to kubeedge/ianvs 🎉

@kubeedge-bot kubeedge-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 10, 2026
@gemini-code-assist

Summary of Changes

Hello @Prachi194agrawal, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical stability issues in the LLM inference benchmark by making its response parsing and metric calculation logic more resilient. It prevents common runtime errors like KeyError and ZeroDivisionError, which previously caused the benchmark to crash, thereby ensuring more reliable and complete benchmarking runs.

Highlights

  • Robust Response Parsing: Replaced direct dictionary access with .get() methods and default values in result_parser.py to prevent KeyError crashes when parsing LLM inference responses.
  • Zero-Division Prevention: Implemented a zero-division check in throughput.py to prevent ZeroDivisionError when calculating throughput, ensuring the benchmark completes successfully even with zero internal token latency.


Changelog
  • examples/cloud-edge-collaborative-inference-for-llm/testenv/result_parser.py
    • Replaced direct dictionary key access (e.g., response["key"]) with safe .get() calls.
    • Introduced default values (empty strings for text, 0 for numeric) for potentially missing keys.
    • Implemented nested .get() calls for 'usage' and 'perf' dictionaries to handle incomplete structures gracefully.
  • examples/cloud-edge-collaborative-inference-for-llm/testenv/throughput.py
    • Added a conditional check (if average_itl == 0: return 0.0) before calculating average_throughput.
    • Ensured that throughput returns 0.0 when internal_token_latency is zero, preventing division by zero.
Activity
  • The pull request was created by Prachi194agrawal to fix KeyError and ZeroDivisionError in the LLM inference benchmark.
  • The author provided a detailed problem description, solution, and testing results, including screenshots, demonstrating the fixes.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses KeyError and ZeroDivisionError crashes in the LLM inference benchmark by making response parsing more robust and adding a zero-division check. The changes are well-targeted and correct.

I've added two suggestions: one for improving code readability in result_parser.py, and a critical one for throughput.py to handle an edge case where the input list is empty, which would still cause a ZeroDivisionError.

I noticed that similar logic for calculating averages exists in other metric files (internal_token_latency.py, time_to_first_token.py) which are not part of this PR's changes; they could also benefit from a check for empty inputs to prevent potential crashes.

Comment on lines +40 to +41:

    if average_itl == 0:
        return 0.0


critical

This check correctly handles the case where average_itl is zero. However, the code is still vulnerable to a ZeroDivisionError on line 37 if y_pred is an empty list, as len(infer_res) would be zero. This would still crash the benchmark. To make this function fully robust, you should add a check for an empty infer_res list before the division.

For example, you could add this after line 35:

if not infer_res:
    return 0.0
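Combining the reviewer's empty-input check with the guard already in the PR, a fully defensive version might look like the sketch below. The names infer_res and average_itl follow the review comment; the actual throughput.py differs in its surrounding details:

```python
# Hypothetical sketch combining both guards from the review:
# an empty-input check and a zero-latency check. Assumes
# throughput = 1 / average internal token latency.

def average_throughput(infer_res: list) -> float:
    if not infer_res:          # empty input: len(infer_res) would be 0
        return 0.0
    average_itl = sum(infer_res) / len(infer_res)
    if average_itl == 0:       # all-zero latencies: metrics unavailable
        return 0.0
    return 1.0 / average_itl
```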

Comment on lines +48 to +53:

    response.get("usage", {}).get("prompt_tokens", 0),
    response.get("usage", {}).get("completion_tokens", 0),
    response.get("usage", {}).get("total_tokens", 0),
    response.get("perf", {}).get("time_to_first_token", 0),
    response.get("perf", {}).get("internal_token_latency", 0),
    response.get("perf", {}).get("throughput", 0)


medium

To improve readability and avoid repetitive calls like response.get('usage', {}), you could extract the nested dictionaries into local variables. This would make the constructor call cleaner.

For example:

        if response:
            usage = response.get("usage", {})
            perf = response.get("perf", {})
            return cls(
                response.get("completion", ""),
                usage.get("prompt_tokens", 0),
                usage.get("completion_tokens", 0),
                usage.get("total_tokens", 0),
                perf.get("time_to_first_token", 0),
                perf.get("internal_token_latency", 0),
                perf.get("throughput", 0)
            )

@Prachi194agrawal Prachi194agrawal force-pushed the fix/llm-inference-response-format branch from 0510ff0 to 8177ef3 Compare February 10, 2026 16:44
@kubeedge-bot kubeedge-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 10, 2026
… inference metrics

- Make Response.from_dict() robust with .get() methods and default values
- Extract usage and perf dicts to avoid nested KeyError
- Add empty list check in throughput calculation
- Prevent ZeroDivisionError when internal_token_latency is 0
- Fixes runtime crashes in cloud-edge-collaborative-inference-for-llm benchmark

Signed-off-by: Prachi194agrawal <agrawalprachi7718@gmail.com>
@Prachi194agrawal Prachi194agrawal force-pushed the fix/llm-inference-response-format branch from 8177ef3 to c92593d Compare February 10, 2026 17:38
@kubeedge-bot kubeedge-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 10, 2026