OCPBUGS-33013: certsyncpod+installerpod: Swap secret/cm directories atomically #2009
b3eca57 to a389cbd
8dec327 to 60f05a8
@tchap: This pull request references Jira Issue OCPBUGS-33013, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
Requesting review from QA contact. The bug has been updated to refer to the pull request using the external bug tracker.
I actually have to make sure this can be merged, as `renameat2` with `RENAME_EXCHANGE` is only supported on Linux 3.15 or later. /hold
This patch should be OK for RHEL 8 or later, based on https://access.redhat.com/articles/3078; the latest CI for OCP 4.21 actually uses RHEL 9.6.
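A minimal Go sketch of a runtime guard for the 3.15 requirement, assuming the kernel release string is available (e.g. from `uname -r` or `/proc/sys/kernel/osrelease`). `kernelAtLeast` is a hypothetical helper, not something this PR adds, and the release strings are only examples. Distribution kernels may backport features, so a version comparison is only a heuristic.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether a Linux release string such as
// "5.14.0-570.el9.x86_64" is at least major.minor.
// Hypothetical helper for illustration only.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	// Drop the distribution suffix after the first '-', then split on '.'.
	base, _, _ := strings.Cut(release, "-")
	parts := strings.SplitN(base, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	gotMinor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	if gotMajor != major {
		return gotMajor > major, nil
	}
	return gotMinor >= minor, nil
}

func main() {
	// Example release strings; on a live host, read /proc/sys/kernel/osrelease.
	for _, r := range []string{"3.10.0-1160.el7.x86_64", "4.18.0-553.el8.x86_64", "5.14.0-570.el9.x86_64"} {
		ok, err := kernelAtLeast(r, 3, 15)
		fmt.Printf("%s: new enough for RENAME_EXCHANGE=%v err=%v\n", r, ok, err)
	}
}
```

An alternative design is to attempt the exchange and treat `ENOSYS` (syscall missing) or `EINVAL` (flag unsupported by the filesystem) as "not supported", which avoids guessing from version numbers.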
60f05a8 to f6df27a
The PR using this change in cluster-kube-apiserver-operator seems to be passing CI, so I deem this ready. /unhold
@tchap, is there a must-gather from an incident I could take a look at?
@p0lyn0mial I think I consulted this one: https://access.redhat.com/support/cases/#/case/03849958/discussion?attachmentId=a096R00003JpGgMQAV
@vrutkovs, do you have time to take a look at this issue? I think the issue might be real. I think it occurs when a two-file cert (certificate and key in separate files) is replaced: the server can pick up the update after only one of the files has changed, notice the public/private key mismatch, and crash. Is there a way to reproduce this issue?
	filePerms := os.FileMode(0600)
	if strings.HasSuffix(fullFilename, ".sh") {
		filePerms = 0755
	}
Hmm, I didn't notice this check for setting custom permissions...
6b52148 to 36195b3
f390998 to c11264b
		strings.HasSuffix(path, "/staging/cert-sync/secrets") ||
		strings.HasSuffix(path, "/staging/cert-sync/configmaps") ||
		path == filepath.Join(controller.destinationDir, "configmaps") ||
		path == filepath.Join(controller.destinationDir, "secrets") {
This is not too pretty, but meh.
deedcd7 to 542e674
542e674 to ba412ca
p0lyn0mial left a comment
Added a few more comments. Overall LGTM.
Please also test this PR with some operator, e.g. kas-o (cluster-kube-apiserver-operator).
439e1f3 to 46d0eac
Use atomicdir.Sync to write the target secret/configmap directories so that they stay synchronized with the relevant objects. Added unit tests, but coverage is not complete. In particular, failing filesystem operations are not tested.
46d0eac to f2df141
lgtm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: p0lyn0mial, tchap, vrutkovs.
/unhold Let's merge this.
@tchap: all tests passed!
@tchap: Jira Issue OCPBUGS-33013: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-33013 has been moved to the MODIFIED state.
Fix included in accepted release 4.21.0-0.nightly-2025-11-13-042845.
Currently, cert-syncer can replace some of the secret/configmap files successfully and then fail, leaving the directory in an inconsistent state. This is a problem when the files are, e.g., a TLS certificate and its key. The inconsistency may seem transient, but if cert-syncer is terminated midway, it can later fail to start because the whole kube-apiserver gets into a crash loop.
This introduces a new `staticpod.SwapDirectoriesAtomic`, which uses `unix.Renameat2` with the `RENAME_EXCHANGE` flag set.