Enhance Claude failure analysis with Velero source and must-gather feedback #2051
base: oadp-dev
…edback
- Clone openshift/velero (oadp-dev branch) in ci-Dockerfile for source code investigation during failure analysis
- Add Velero source code investigation prompts to analyze_failures.sh, enabling Claude to trace errors back to Velero implementation
- Add must-gather improvement suggestions section to analysis output, creating a feedback loop for improving diagnostics collection
- Add data mover volume restore limitation to error ignore patterns (claim Selector not supported per vmware-tanzu/velero#7946)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Review skipped: auto reviews are limited based on label configuration (only excluded labels are configured).
Walkthrough
Adds a Velero repo checkout to the CI Dockerfile, extends an E2E error-ignore pattern for a Data Mover restore message, and enriches the failure-analysis script with Velero/OADP source-code references and step-by-step investigation guidance.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pull request overview
This PR enhances the Claude-based failure analysis system by integrating Velero source code access and adding a feedback mechanism for must-gather improvements. The changes enable deeper root cause analysis by allowing Claude to investigate Velero implementation details when analyzing test failures, and create a feedback loop for improving diagnostic data collection.
Key Changes:
- Added Velero source code cloning in the CI Docker image for runtime investigation during failure analysis
- Extended failure analysis prompts to guide Claude through Velero source code investigation when errors originate from Velero packages
- Added must-gather improvement suggestions section to capture gaps in diagnostic data collection
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| build/ci-Dockerfile | Clones openshift/velero (oadp-dev branch) to provide source code access for failure analysis |
| tests/e2e/scripts/analyze_failures.sh | Adds Velero source investigation prompts and must-gather feedback section to guide Claude's analysis workflow |
| tests/e2e/lib/flakes.go | Adds known data mover limitation to error ignore patterns based on upstream Velero issue |
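As a rough illustration of what an error-ignore pattern does, the sketch below drops log lines that contain a known, expected message before the remaining lines are treated as real failures. It is written in shell for brevity; the project's actual patterns live in Go in tests/e2e/lib/flakes.go. The pattern string is the phrase associated with vmware-tanzu/velero#7946 and may not match the exact string used in the PR, and `filter_known_errors` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Assumption: this phrase stands in for the real ignore pattern in flakes.go.
IGNORE_PATTERN="claim Selector is not supported"

# Hypothetical helper: read log lines on stdin, print only those that do NOT
# contain the ignore pattern. -F treats the pattern as a fixed string.
filter_known_errors() {
  # grep exits 1 when nothing is printed; that is not an error for a filter.
  grep -F -v -- "$IGNORE_PATTERN" || true
}

printf '%s\n' \
  "error: backup failed" \
  "error: failed to restore volume with StorageClass, claim Selector is not supported" \
  | filter_known_errors
```

Only the first sample line survives the filter, which mirrors the intent of the change: the known data mover limitation should not be reported as a new failure.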
5. **Correlation**: Group related errors together - if multiple errors reference the same resource (backup name, PVC, pod), keep them together with their context.
6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/), note the file:line references for later source code investigation.
Copilot AI commented on Dec 18, 2025:
Missing space after 'pkg/nodeagent/' before the closing parenthesis for consistency with other package references.
Suggested change:
- 6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/), note the file:line references for later source code investigation.
+ 6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/ ), note the file:line references for later source code investigation.
  Note: Prow's build-log.txt is NOT available during this analysis (it's written after tests complete).
- Focus on JUnit results, preprocessed log summaries, must-gather diagnostics, and per-test pod logs.
+ Focus on JUnit results, preprocessed log summaries, must-gather diagnostics, per-test pod logs, and Velero source investigation.
Copilot AI commented on Dec 18, 2025:
The line references 'Velero source investigation' but this capability depends on the Docker image containing the cloned source code. Consider adding a note about this dependency or verifying the source code is available before attempting investigation.
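One way to act on this suggestion is to check for the checkout before telling Claude it can use it. The sketch below is a minimal illustration, not the script's actual code: `velero_source_note` and the `VELERO_SRC` override are hypothetical names, the default path is the clone destination used in this PR's Dockerfile, and testing for a `pkg/` subdirectory is an assumption about what counts as a usable checkout.

```shell
#!/usr/bin/env bash
# Default matches the clone destination in build/ci-Dockerfile; the variable
# itself is a hypothetical override for local runs.
VELERO_SRC="${VELERO_SRC:-/go/src/github.com/openshift/velero}"

# Hypothetical helper: print a one-line note that could be appended to the
# analysis prompt, depending on whether the source tree is present.
velero_source_note() {
  if [ -d "$VELERO_SRC/pkg" ]; then
    echo "Velero source is available at $VELERO_SRC"
  else
    echo "Velero source is NOT available; skip source code investigation"
  fi
}

velero_source_note
```

Because the note is computed at analysis time, the prompt stays accurate even when the image was built without the clone.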
Enable Claude to investigate OADP operator source at /go/src/github.com/openshift/oadp-operator/ during failure analysis:
- Add OADP operator source to Available Artifacts section
- Rename "Velero Source Code Investigation" to "Source Code Investigation" with subsections for both Velero and OADP packages
- Update Claude invocation prompt to reference OADP source
- List key OADP packages: internal/controller/, pkg/velero/, pkg/credentials/, api/v1alpha1/, tests/e2e/lib/
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/retest
/retest
/retest
weshayutin left a comment:
/LGTM
@mpryc WDYT?
@kaovilai: The following test failed, say
# Clone openshift/velero source code for failure analysis
# Uses oadp-dev branch to match OADP operator development
RUN git clone --depth 1 --branch oadp-dev \
If the Velero clone fails (network issue, branch rename), the Docker build will continue but Claude's analysis will reference non-existent files. Consider adding error handling:
RUN git clone --depth 1 --branch oadp-dev \
https://github.com/openshift/velero.git \
/go/src/github.com/openshift/velero || \
echo "Warning: Velero source clone failed, source investigation will be unavailable"
The build log for this container is not seen by Claude so I'm not sure echo here does anything.
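As background for this thread, the sketch below shows how `|| echo` changes the exit status of a failing command. In a Dockerfile, a `RUN` instruction whose command exits non-zero stops the build, so the fallback form is what would let the build carry on after a failed clone. `failing_clone` is a hypothetical stand-in for a `git clone` that fails, and the status 128 is only an example value.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a git clone that fails.
failing_clone() { return 128; }

clone_status_demo() {
  local status=0

  # Without a fallback, the caller sees the command's non-zero status.
  failing_clone || status=$?
  echo "without fallback: $status"

  # With '|| echo', the echo runs and its status (0) becomes the status of
  # the whole command list, so the failure is hidden from the caller.
  failing_clone || echo "Warning: Velero source clone failed, source investigation will be unavailable"
  echo "with fallback: $?"
}

clone_status_demo
```

The warning is only printed to the build output, which is consistent with the reply above: if nothing reads that log, a runtime check in the analysis script is the more visible signal.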
# Clone openshift/velero source code for failure analysis
# Uses oadp-dev branch to match OADP operator development
RUN git clone --depth 1 --branch oadp-dev \
The hardcoded oadp-dev branch works for current development, but may need updating if Velero's branch naming changes or if release branches need different Velero versions. A future enhancement could make this configurable via ARG VELERO_BRANCH=oadp-dev.
Make it so you can specify a branch from Makefile right? So when this is cherry picked only change the Makefile right?
sure works
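The idea agreed on in this thread could look roughly like the following. This is a sketch under assumptions: it presumes the Dockerfile declares `ARG VELERO_BRANCH=oadp-dev` as suggested above, that the image is built with `docker build` (the project may use a different tool or invoke it from a Makefile target), and `ci_image_build_cmd` is a hypothetical helper that only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Default to the branch used in this PR; a cherry-pick to a release branch
# would only need to change this one value (or set it in the environment).
VELERO_BRANCH="${VELERO_BRANCH:-oadp-dev}"

# Hypothetical helper: print the build command so the branch selection can
# be inspected without building anything.
ci_image_build_cmd() {
  printf 'docker build --build-arg VELERO_BRANCH=%s -f build/ci-Dockerfile .\n' \
    "$VELERO_BRANCH"
}

ci_image_build_cmd
```

Keeping the default in one place means the Dockerfile itself does not need to change when the branch does.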
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kaovilai, shubham-pampattiwar, weshayutin
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Referenced upstream issue: vmware-tanzu/velero#7946 (failed to restore volume with StorageClass, claim Selector is not supported)
Why the changes were made
How to test the changes made