@kaovilai
Member

  • Clone openshift/velero (oadp-dev branch) in ci-Dockerfile for source code
    investigation during failure analysis
  • Add Velero source code investigation prompts to analyze_failures.sh,
    enabling Claude to trace errors back to Velero implementation
  • Add must-gather improvement suggestions section to analysis output,
    creating a feedback loop for improving diagnostics collection
  • Add data mover volume restore limitation to error ignore patterns
    (claim Selector not supported, per vmware-tanzu/velero#7946: "failed to restore volume with StorageClass, claim Selector is not supported")

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com

Why the changes were made

How to test the changes made

Copilot AI review requested due to automatic review settings December 18, 2025 16:19
@coderabbitai
Contributor

coderabbitai bot commented Dec 18, 2025

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • do-not-merge/work-in-progress

Walkthrough

Adds a Velero repo checkout to the CI Dockerfile, extends an E2E error-ignore pattern for a Data Mover restore message, and enriches the failure-analysis script with Velero/OADP source-code references and step-by-step investigation guidance.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CI/Build Infrastructure: build/ci-Dockerfile | Inserts a git clone of github.com/openshift/velero (branch oadp-dev, depth 1) into /go/src/github.com/openshift/velero prior to running go mod download. |
| Test Failure Patterns: tests/e2e/lib/flakes.go | Adds an error-ignore pattern: "failed to restore volume with StorageClass, claim Selector is not supported". |
| Failure Analysis Script: tests/e2e/scripts/analyze_failures.sh | Adds Velero and OADP source-code references, a "Velero Source Code Investigation" section, expanded analysis prompts (including Claude preprocessing), updated artifact listings, and must-gather improvement suggestions (documentation and guidance only). |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Review focus areas:
    • build/ci-Dockerfile: verify clone step ordering, network/security implications, and path correctness.
    • tests/e2e/scripts/analyze_failures.sh: confirm prompts and references are accurate and do not alter runtime behavior.
    • tests/e2e/lib/flakes.go: ensure the added ignore pattern is precise and intended.
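
The precision concern for the new ignore pattern can be illustrated with a fixed-string match, which suppresses only log lines that contain the whole message. This is a minimal shell sketch, not the Go code in tests/e2e/lib/flakes.go (its matching logic is not shown in this thread); the function name `is_ignored_error` is hypothetical.

```shell
#!/usr/bin/env bash
# Pattern text copied from the walkthrough above.
IGNORE_PATTERN='failed to restore volume with StorageClass, claim Selector is not supported'

# Hypothetical helper: succeeds when the given log line contains the
# known, ignorable Data Mover restore message as a literal substring.
is_ignored_error() {
  # -F: treat the pattern as a fixed string, not a regex
  # -q: quiet, only the exit status matters
  printf '%s\n' "$1" | grep -Fq -- "$IGNORE_PATTERN"
}

# Usage: the full message matches, a merely similar one does not.
is_ignored_error 'error: failed to restore volume with StorageClass, claim Selector is not supported' \
  && echo "ignored"
is_ignored_error 'error: failed to restore volume: timeout' \
  || echo "not ignored"
```

Matching the full sentence rather than a short prefix such as "failed to restore volume" keeps unrelated restore failures visible.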

@openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Dec 18, 2025
Contributor

Copilot AI left a comment

Pull request overview

This PR enhances the Claude-based failure analysis system by integrating Velero source code access and adding a feedback mechanism for must-gather improvements. The changes enable deeper root cause analysis by allowing Claude to investigate Velero implementation details when analyzing test failures, and create a feedback loop for improving diagnostic data collection.

Key Changes:

  • Added Velero source code cloning in the CI Docker image for runtime investigation during failure analysis
  • Extended failure analysis prompts to guide Claude through Velero source code investigation when errors originate from Velero packages
  • Added must-gather improvement suggestions section to capture gaps in diagnostic data collection

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| build/ci-Dockerfile | Clones openshift/velero (oadp-dev branch) to provide source code access for failure analysis |
| tests/e2e/scripts/analyze_failures.sh | Adds Velero source investigation prompts and must-gather feedback section to guide Claude's analysis workflow |
| tests/e2e/lib/flakes.go | Adds known data mover limitation to error ignore patterns based on upstream Velero issue |

5. **Correlation**: Group related errors together - if multiple errors reference the same resource (backup name, PVC, pod), keep them together with their context.
6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/), note the file:line references for later source code investigation.

Copilot AI Dec 18, 2025

Missing space after 'pkg/nodeagent/' before the closing parenthesis for consistency with other package references.

Suggested change
6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/), note the file:line references for later source code investigation.
6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/ ), note the file:line references for later source code investigation.

Note: Prow's build-log.txt is NOT available during this analysis (it's written after tests complete).
Focus on JUnit results, preprocessed log summaries, must-gather diagnostics, and per-test pod logs.
Focus on JUnit results, preprocessed log summaries, must-gather diagnostics, per-test pod logs, and Velero source investigation.

Copilot AI Dec 18, 2025

The line references 'Velero source investigation' but this capability depends on the Docker image containing the cloned source code. Consider adding a note about this dependency or verifying the source code is available before attempting investigation.
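
A guard along the following lines could cover the dependency described in this comment. It is only a sketch: `source_available` is a hypothetical helper that does not appear in analyze_failures.sh, and checking for a `go.mod` file is an assumed marker of a usable Go checkout. The default path is the clone destination used in build/ci-Dockerfile.

```shell
#!/usr/bin/env bash
# Clone destination used by build/ci-Dockerfile in this PR.
VELERO_SRC="${VELERO_SRC:-/go/src/github.com/openshift/velero}"

# Hypothetical helper: succeeds only if the directory exists and looks
# like a Go module checkout (assumption: go.mod at the repository root).
source_available() {
  [ -d "$1" ] && [ -f "$1/go.mod" ]
}

if source_available "$VELERO_SRC"; then
  echo "Velero source found at $VELERO_SRC"
else
  echo "Velero source not found at $VELERO_SRC; skipping source investigation"
fi
```

The result of such a check could then decide whether the source-investigation instructions are included in the prompt at all.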

Enable Claude to investigate OADP operator source at
/go/src/github.com/openshift/oadp-operator/ during failure analysis:

- Add OADP operator source to Available Artifacts section
- Rename "Velero Source Code Investigation" to "Source Code Investigation"
  with subsections for both Velero and OADP packages
- Update Claude invocation prompt to reference OADP source
- List key OADP packages: internal/controller/, pkg/velero/,
  pkg/credentials/, api/v1alpha1/, tests/e2e/lib/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@weshayutin
Contributor

/retest

@openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Jan 5, 2026
@weshayutin
Contributor

/retest

@openshift-merge-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Jan 8, 2026
@weshayutin
Contributor

/retest

Contributor

@weshayutin left a comment

/LGTM
@mpryc WDYT?

@openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Jan 12, 2026
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD f6abbb4 and 2 for PR HEAD b8828b2 in total

@openshift-ci

openshift-ci bot commented Jan 14, 2026

@kaovilai: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/4.20-e2e-test-cli-aws | b8828b2 | link | false | /test 4.20-e2e-test-cli-aws |

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD f509f50 and 1 for PR HEAD b8828b2 in total


# Clone openshift/velero source code for failure analysis
# Uses oadp-dev branch to match OADP operator development
RUN git clone --depth 1 --branch oadp-dev \

If the Velero clone fails (network issue, branch rename), the Docker build will continue but Claude's analysis will reference non-existent files. Consider adding error handling:

  RUN git clone --depth 1 --branch oadp-dev \
      https://github.com/openshift/velero.git \
      /go/src/github.com/openshift/velero || \
      echo "Warning: Velero source clone failed, source investigation will be unavailable"

Member Author

The build log for this container is not seen by Claude so I'm not sure echo here does anything.


# Clone openshift/velero source code for failure analysis
# Uses oadp-dev branch to match OADP operator development
RUN git clone --depth 1 --branch oadp-dev \

The hardcoded oadp-dev branch works for current development, but may need updating if Velero's branch naming changes or if release branches need different Velero versions. A future enhancement could make this configurable via ARG VELERO_BRANCH=oadp-dev.

Member Author

So the branch should be configurable from the Makefile, right? Then when this is cherry-picked, only the Makefile needs to change?

sure works
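
The configurable-branch idea agreed on in this thread might look roughly like the sketch below. It is written as shell rather than Dockerfile or Makefile syntax, and it only prints the clone command instead of running it. `VELERO_BRANCH` is the name proposed in the review comment; the function name `velero_clone_cmd` and the example branch name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the clone command for the selected branch,
# defaulting to oadp-dev when VELERO_BRANCH is unset or empty.
velero_clone_cmd() {
  local branch="${VELERO_BRANCH:-oadp-dev}"
  printf 'git clone --depth 1 --branch %s %s %s\n' \
    "$branch" \
    "https://github.com/openshift/velero.git" \
    "/go/src/github.com/openshift/velero"
}

# Default branch, as on the development branch of the operator.
velero_clone_cmd

# Overridden branch, as a cherry-pick to a release branch might need
# (the branch name here is a placeholder, not a real branch).
(VELERO_BRANCH=example-release-branch; velero_clone_cmd)
```

In the Dockerfile this would presumably correspond to an `ARG VELERO_BRANCH=oadp-dev` whose value the Makefile supplies at build time, so that a cherry-pick only has to change the Makefile.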

@openshift-ci

openshift-ci bot commented Jan 17, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai, shubham-pampattiwar, weshayutin

Needs approval from an approver in each of these files:
  • OWNERS [kaovilai,shubham-pampattiwar]
