grokspawn
Contributor

Description of the change:
opm validate now fails when a channel entry's skips list contains the same bundle that the entry's replaces edge names.

Motivation for the change:
Due to OLMv0 graph mechanics, the skips edge will cause OLMv0 to ignore the bundle version when considering upgrades (since v0 discards graph contribution from skipped bundle versions).
Since the purpose of a replaces edge is to enable upgrade mobility across a graph, allowing the bundle version to be ignored (due to the skips entry) is an error, and potentially results in stranding.

For example, take input olm.channel:

{
    "schema": "olm.channel",
    "name": "stable-v1",
    "package": "test-operator",
    "entries": [
        {
            "name": "test-operator-v1.0.0"
        },
        {
            "name": "test-operator-v1.1.0"
        },
        {
            "name": "test-operator-v1.1.2"
        },
        {
            "name": "test-operator-v1.1.4",
            "replaces": "test-operator-v1.0.0",
            "skips": [
                "test-operator-v1.0.0",
                "test-operator-v1.1.0",
                "test-operator-v1.1.2"
            ]
        },
        {
            "name": "test-operator-v1.2.0"
        },
        {
            "name": "test-operator-v1.2.1",
            "replaces": "test-operator-v1.1.4",
            "skips": [
                "test-operator-v1.1.4",
                "test-operator-v1.2.0"
            ]
        },
        {
            "name": "test-operator-v1.3.0",
            "replaces": "test-operator-v1.2.1",
            "skips": [
                "test-operator-v1.2.1"
            ]
        },
        {
            "name": "test-operator-v1.4.0",
            "replaces": "test-operator-v1.3.0",
            "skips": [
                "test-operator-v1.3.0"
            ]
        }
    ]
}

Using a new version of opm that can optionally display OLMv0 graph semantics, you can see that the edges with duplicate replaces/skips are ignored in the graph (skipped bundles are outlined in red; ignored edges are red dashed arrows).
[Figure: Mermaid diagram of the OLMv0 upgrade graph for the channel above (mermaid-diagram-2025-08-19-102936)]
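
For readers who want to try the rule from this description on the example channel, here is a minimal, self-contained sketch. The `Entry` type and the `duplicateEdges` helper are hypothetical stand-ins written for illustration, not opm's own types; only the `slices.Contains` comparison is taken from the diff in this PR.

```go
package main

import (
	"fmt"
	"slices"
)

// Entry is a hypothetical stand-in for one olm.channel entry.
type Entry struct {
	Name     string
	Replaces string
	Skips    []string
}

// duplicateEdges returns the names of entries whose skips list repeats
// the bundle named by their replaces edge. The empty-replaces guard is
// an assumption added for this sketch.
func duplicateEdges(entries []Entry) []string {
	var flagged []string
	for _, e := range entries {
		if e.Replaces != "" && slices.Contains(e.Skips, e.Replaces) {
			flagged = append(flagged, e.Name)
		}
	}
	return flagged
}

func main() {
	// The stable-v1 channel from the description above.
	entries := []Entry{
		{Name: "test-operator-v1.0.0"},
		{Name: "test-operator-v1.1.0"},
		{Name: "test-operator-v1.1.2"},
		{Name: "test-operator-v1.1.4", Replaces: "test-operator-v1.0.0",
			Skips: []string{"test-operator-v1.0.0", "test-operator-v1.1.0", "test-operator-v1.1.2"}},
		{Name: "test-operator-v1.2.0"},
		{Name: "test-operator-v1.2.1", Replaces: "test-operator-v1.1.4",
			Skips: []string{"test-operator-v1.1.4", "test-operator-v1.2.0"}},
		{Name: "test-operator-v1.3.0", Replaces: "test-operator-v1.2.1",
			Skips: []string{"test-operator-v1.2.1"}},
		{Name: "test-operator-v1.4.0", Replaces: "test-operator-v1.3.0",
			Skips: []string{"test-operator-v1.3.0"}},
	}
	// Prints every entry that both replaces and skips the same bundle.
	fmt.Println(duplicateEdges(entries))
}
```

Unlike the validation in the diff, which returns on the first offending entry, this sketch collects all of them so the whole channel can be inspected at once.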

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /docs
  • Commit messages sensible and descriptive


codecov bot commented Aug 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.27%. Comparing base (bf8476b) to head (5bd337c).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1750   +/-   ##
=======================================
  Coverage   55.26%   55.27%           
=======================================
  Files         136      136           
  Lines       15974    15976    +2     
=======================================
+ Hits         8828     8830    +2     
  Misses       5991     5991           
  Partials     1155     1155           


@grokspawn
Contributor Author

/approve

Contributor

openshift-ci bot commented Aug 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: grokspawn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 20, 2025
	if slices.Contains(entry.Skips, entry.Replaces) {
		return nil, fmt.Errorf("invalid package %q, channel %q: entry %q has identical replaces and skips: %q", c.Package, c.Name, entry.Name, entry.Replaces)
	}
}
Contributor

@camilamacedo86 camilamacedo86 Aug 20, 2025

👍 Makes sense to me. My only concern is:
Did we check how many existing cases fail in this scenario?
We might need to create a script to validate them. What do we do if we have FBC catalogs that fail?

But maybe that needs to be looked at outside of this PR.

/lgtm

Contributor Author

We do see one instance in the operatorhubio catalog:

./operatorhubio/latest
FATA[0002] invalid package "grafana-operator", channel "v5": entry "grafana-operator.v5.10.0" has identical replaces and skips: "grafana-operator.v5.9.2"

let's
/hold
this until we can talk to some impacted folks and determine if this is a big enough problem to have to solve NOW.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 20, 2025
@joelanford
Member

This validation check seems to be very narrowly tailored to "can't both skip and replace the same thing in one entry", which is good!

However, I think it very slightly misses the point and the broader problem.

  1. It is actually okay to both skip and replace a bundle that is already a leaf node in the graph.
  2. When a node is skip-ed and causes other entries to no longer have a path to the channel head, that is the real problem that we need to check for.

@grokspawn
Contributor Author

> This validation check seems to be very narrowly tailored to "can't both skip and replace the same thing in one entry", which is good!
>
> However, I think it very slightly misses the point and the broader problem.
>
> 1. It is actually okay to both `skip` and `replace` a bundle that is already a leaf node in the graph.

This is totally fine in any OLMv1 context, but I'd argue that because it comes with migration side effects for OLMv0, it's never OK. In general, we should not have these kinds of surprises, and I think it's reasonable to enforce the most restrictive case here (because it's easier to grow more permissive later than more restrictive).

> 2. When a node is `skip`-ed and causes other entries to no longer have a path to the channel head, _that_ is the real problem that we need to check for.

That's a specific flavor of this more general issue. But I'd argue that it is also resolved by preventing the more general issue.

@@ -402,7 +402,7 @@ var validFS = fstest.MapFS{
"name": "clusterwide-alpha",
"entries": [
{"name": "etcdoperator.v0.9.0"},
{"name": "etcdoperator.v0.9.2-clusterwide", "replaces": "etcdoperator.v0.9.0", "skips": ["etcdoperator.v0.6.1","etcdoperator.v0.9.0"], "skipRange": ">=0.9.0 <=0.9.1"},
Member

Is this change related to the model validation change somehow? It seems unrelated to me at first glance.

Contributor Author

This removes a skips from the slice where it duplicates the replaces edge.
It was needed for the previous commit, and I haven't yet checked to see if the existing catalogs impact is different with the new commit.

Contributor Author

Checked the catalogs impact and it looks the same as before. HOWEVER, we no longer need this test change, because somehow it's OK for 0.9.2-clusterwide to skip AND replace v0.9.0 ...?

@grokspawn grokspawn force-pushed the channel-edge-no-dupe-skip-replace branch from 9acb503 to 7607718 Compare August 28, 2025 20:41
@bandrade
Contributor

bandrade commented Sep 3, 2025

I've tested this PR using a synthetic catalog with the following structure to explicitly trigger the replaces + skips conflict detection introduced by this PR.


📦 Catalog Structure

All files were placed under a directory named minimal-test and used the .yaml extension to ensure opm validate would parse them.

test.package.yaml

schema: olm.package
name: test-operator
defaultChannel: stable-v1

stable-v1.channel.yaml

schema: olm.channel
name: stable-v1
package: test-operator
entries:
  - name: test-operator-v1.0.0
  - name: test-operator-v1.1.0
  - name: test-operator-v1.1.2
  - name: test-operator-v1.1.4
    replaces: test-operator-v1.0.0
    skips:
      - test-operator-v1.0.0
      - test-operator-v1.1.0
      - test-operator-v1.1.2
  - name: test-operator-v1.2.0
  - name: test-operator-v1.2.1
    replaces: test-operator-v1.1.4
    skips:
      - test-operator-v1.1.4     # <- conflict with replaces
      - test-operator-v1.2.0
  - name: test-operator-v1.3.0
    replaces: test-operator-v1.2.1
    skips:
      - test-operator-v1.2.1     # <- conflict with replaces
  - name: test-operator-v1.4.0
    replaces: test-operator-v1.3.0
    skips:
      - test-operator-v1.3.0     # <- conflict with replaces

test-operator-<version>.bundle.yaml (one file for each version)

Each file includes the required olm.package property, e.g.:

schema: olm.bundle
name: test-operator-v1.2.1
package: test-operator
image: quay.io/example/test-operator:v1.2.1
properties:
  - type: olm.package
    value:
      packageName: test-operator
      version: 1.2.1

opm validate completed without any errors.

I was expecting results like this:

invalid package "test-operator", channel "stable-v1": entry "test-operator-v1.2.1" has identical replaces and skips: "test-operator-v1.1.4"
invalid package "test-operator", channel "stable-v1": entry "test-operator-v1.3.0" has identical replaces and skips: "test-operator-v1.2.1"
invalid package "test-operator", channel "stable-v1": entry "test-operator-v1.4.0" has identical replaces and skips: "test-operator-v1.3.0"

Could you check whether there is something wrong with my test? Thanks

Signed-off-by: grokspawn <jordan@nimblewidget.com>
@grokspawn grokspawn force-pushed the channel-edge-no-dupe-skip-replace branch from 7607718 to 5bd337c Compare September 5, 2025 14:04
@grokspawn
Contributor Author

> I've tested this PR using a synthetic catalog with the following structure to explicitly trigger the replaces + skips conflict detection introduced by this PR.
>
> Could you check whether there is something wrong with my test? Thanks

Hey @bandrade I'll need to update the PR description, because the new commit changed the functionality to not merely refuse skipped-replaces, but to really consider if a skipped-replace strands bundles across the replaces chain.

The original example essentially ignores ALL lower bundle versions, so the new check does not identify it as a failure.
In order for it to be identified, there have to be non-skipped bundles earlier in the replaces chain which are stranded because intermediary links are ignored (belong to a skipped edge).

For example, this modification to your channel.yaml results in a failure:

schema: olm.channel
name: stable-v1
package: test-operator
entries:
  - name: test-operator-v1.0.0
  - name: test-operator-v1.1.0
    replaces: test-operator-v1.0.0
  - name: test-operator-v1.1.2
    replaces: test-operator-v1.1.0
  - name: test-operator-v1.1.4
    replaces: test-operator-v1.1.2
    skips:
      - test-operator-v1.1.2
  - name: test-operator-v1.2.0
  - name: test-operator-v1.2.1
    replaces: test-operator-v1.1.4
    skips:
      - test-operator-v1.1.4     # <- conflict with replaces
      - test-operator-v1.2.0
  - name: test-operator-v1.3.0
    replaces: test-operator-v1.2.1
    skips:
      - test-operator-v1.2.1     # <- conflict with replaces
  - name: test-operator-v1.4.0
    replaces: test-operator-v1.3.0
    skips:
      - test-operator-v1.3.0     # <- conflict with replaces

results in the message

FATA[0000] invalid index:
└── invalid package "test-operator":
    └── invalid channel "stable-v1":
        └── channel contains one or more stranded bundles: test-operator-v1.0.0, test-operator-v1.1.0
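
The stranding rule described in this comment can be sketched in a few lines. This is an illustrative model under stated assumptions, not the code in this PR: the `Entry` type and `strandedBundles` helper are hypothetical, a skipped entry is assumed to contribute no edges, and any entry that nothing replaces or skips is treated as a channel head.

```go
package main

import "fmt"

// Entry is a hypothetical stand-in for one olm.channel entry.
type Entry struct {
	Name     string
	Replaces string
	Skips    []string
}

// strandedBundles walks the replaces chain down from each head and stops
// at the first skipped entry, because a skipped entry's own replaces edge
// is assumed to be ignored. Non-skipped entries that the walk never
// reaches are returned as stranded, in channel order.
func strandedBundles(entries []Entry) []string {
	byName := make(map[string]Entry, len(entries))
	skipped := map[string]bool{}
	referenced := map[string]bool{} // named by some replaces or skips edge
	for _, e := range entries {
		byName[e.Name] = e
		if e.Replaces != "" {
			referenced[e.Replaces] = true
		}
		for _, s := range e.Skips {
			skipped[s] = true
			referenced[s] = true
		}
	}

	reached := map[string]bool{}
	for _, e := range entries {
		if referenced[e.Name] {
			continue // not a head: another entry replaces or skips it
		}
		cur, ok := e, true
		for ok && !skipped[cur.Name] && !reached[cur.Name] {
			reached[cur.Name] = true
			cur, ok = byName[cur.Replaces]
		}
	}

	var stranded []string
	for _, e := range entries {
		if !skipped[e.Name] && !reached[e.Name] {
			stranded = append(stranded, e.Name)
		}
	}
	return stranded
}

func main() {
	// The modified stable-v1 channel from this comment.
	entries := []Entry{
		{Name: "test-operator-v1.0.0"},
		{Name: "test-operator-v1.1.0", Replaces: "test-operator-v1.0.0"},
		{Name: "test-operator-v1.1.2", Replaces: "test-operator-v1.1.0"},
		{Name: "test-operator-v1.1.4", Replaces: "test-operator-v1.1.2",
			Skips: []string{"test-operator-v1.1.2"}},
		{Name: "test-operator-v1.2.0"},
		{Name: "test-operator-v1.2.1", Replaces: "test-operator-v1.1.4",
			Skips: []string{"test-operator-v1.1.4", "test-operator-v1.2.0"}},
		{Name: "test-operator-v1.3.0", Replaces: "test-operator-v1.2.1",
			Skips: []string{"test-operator-v1.2.1"}},
		{Name: "test-operator-v1.4.0", Replaces: "test-operator-v1.3.0",
			Skips: []string{"test-operator-v1.3.0"}},
	}
	fmt.Println(strandedBundles(entries))
	// Prints: [test-operator-v1.0.0 test-operator-v1.1.0]
}
```

Under this model the channel from the original description yields no stranded bundles, because every entry below the head is itself skipped, which matches the observation that opm validate passed on it.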

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
4 participants