NO-JIRA: job-run-aggregator: remove relaxing in disruption aggregation #4899
Our current aggregation logic is too sensitive, leading to the rejection of payloads for non-regressions. Analysis shows that a significant portion of rejected payloads are failing due to infrastructure noise or existing flakes rather than genuine code regressions. We have component readiness as a backstop to identify regressions with greater sample sizes.
I do not _entirely_ understand why the relaxation for disruption tests exists here, but the existing TODO suggests it may be worth tightening. The `pityFactor` already relaxes the threshold in some cases (high attempts, high pass rate), so the additional relaxation would only apply in the remaining cases.
@petr-muller: This pull request explicitly references no jira issue.
Walkthrough
The changes introduce a "pity factor" mechanism that relaxes pass requirements for small sample sizes in job run aggregation analysis. A new helper function computes adjusted pass thresholds that allow up to 2 failures for smaller samples, reducing false failures caused by infrastructure noise. This affects the percentile disruption checks and the failure determination logic across multiple functions.
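The walkthrough above can be made concrete with a minimal sketch of a threshold helper. The PR's actual `pityFactor` implementation is not shown in this conversation, so the function name `requiredPassesWithPity`, its signature, and the rule "never require more than attempts minus 2 passes" are assumptions made for illustration only.

```go
package main

import "fmt"

// maxPityFailures is the number of failures tolerated by the sketch.
// The value 2 comes from the review summary; where it lives in the real
// code is an assumption.
const maxPityFailures = 2

// requiredPassesWithPity is a hypothetical helper. Given the number of
// attempts and the strict number of required passes, it returns a
// requirement that never demands more than attempts-maxPityFailures passes.
func requiredPassesWithPity(attempts, strictRequiredPasses int) int {
	relaxed := attempts - maxPityFailures
	if relaxed < 0 {
		relaxed = 0
	}
	if strictRequiredPasses < relaxed {
		// The strict requirement is already looser than the pity cap.
		return strictRequiredPasses
	}
	return relaxed
}

func main() {
	fmt.Println(requiredPassesWithPity(10, 10)) // prints 8
	fmt.Println(requiredPassesWithPity(10, 6))  // prints 6
}
```

Under this assumption the relaxation only takes effect when the strict requirement is within 2 of the attempt count, which matches the "high pass rate" case mentioned in the PR description.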
/hold Needs #4894
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: petr-muller
/uncc @danilo-gemoli @droslean
Pull request overview
This pull request removes the simple -1 relaxation for disruption tests and replaces it with a more sophisticated pityFactor function that allows up to 2 failures regardless of the strict pass requirement. The change affects both disruption percentile checks and regular test failure checks.
Changes:
- Replaced the `-1` adjustment with a `pityFactor` function that allows up to 2 failures
- Updated test cases to reflect new expected pass requirements (7 instead of 6, 5 instead of 4)
- Added comprehensive test coverage for the pity factor behavior
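The shift in test expectations listed above (7 instead of 6, 5 instead of 4) is what one would expect when a flat `-1` relaxation is dropped. The sketch below compares the two rules side by side. Both function names, the attempt count of 10, and the exact form of the new rule are hypothetical; the real code in `pass_fail.go` may differ.

```go
package main

import "fmt"

// oldRequiredPasses models the removed behaviour: the strict requirement
// minus one, floored at zero.
func oldRequiredPasses(strict int) int {
	if strict < 1 {
		return 0
	}
	return strict - 1
}

// newRequiredPasses models an assumed pity-factor rule: keep the strict
// requirement unless it would allow fewer than 2 failures.
func newRequiredPasses(attempts, strict int) int {
	relaxed := attempts - 2
	if relaxed < 0 {
		relaxed = 0
	}
	if strict < relaxed {
		return strict
	}
	return relaxed
}

func main() {
	const attempts = 10 // hypothetical sample size
	for _, strict := range []int{5, 7, 10} {
		fmt.Printf("strict=%d old=%d new=%d\n",
			strict, oldRequiredPasses(strict), newRequiredPasses(attempts, strict))
	}
	// Output:
	// strict=5 old=4 new=5
	// strict=7 old=6 new=7
	// strict=10 old=9 new=8
}
```

With these assumptions, mid-range requirements become stricter by one pass, while a requirement of 10 out of 10 is relaxed further than before.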
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go | Removed -1 relaxation, added pityFactor function, integrated pity factor into disruption and regular checks, updated summary messages |
| pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go | Updated existing test expectations, added comprehensive tests for CheckFailedWithPityFactor and innerCheckPercentileDisruptionWithPityFactor |
/test all
@petr-muller: The following tests failed, say