feat: add resource migration job and troubleshooting section for upgrade failures #289
…ade failures Introduced a new resource migration job to handle OpenSearch StatefulSets during upgrades, along with a troubleshooting section in the documentation addressing potential upgrade failures due to pre-deploy migration hooks. This includes detailed descriptions, stack traces, solutions, and recommendations for successful upgrades.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7bd3c87855
    if [ "$NODE_MASTER" -gt 0 ]; then
      echo "[resource-migrator] StatefulSet $STS_NAME contains 'node.master' env (OpenSearch 1.x)"
      echo "[resource-migrator] Deleting StatefulSet $STS_NAME with --cascade=orphan"
      if $KUBECTL -n "$NS" delete statefulset "$STS_NAME" --cascade=orphan --ignore-not-found=true; then
Avoid orphaning StatefulSets before remaining hooks run
This hook deletes each matching StatefulSet (--cascade=orphan) during pre-upgrade, but another pre-upgrade hook (migration-job.yaml, weight 10) can still fail with BackoffLimitExceeded (as documented in docs/public/troubleshooting.md), which aborts the upgrade before the chart reapplies StatefulSets. In the upgrade path where node.master is present and the later migration hook fails, the cluster is left running orphaned pods without StatefulSet controllers until a subsequent successful upgrade, which is a high-risk operational state.
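One way to make the exposure described above visible before anything is removed is a preview mode around the check-and-delete step. The following is a minimal sketch, not the PR's implementation: `KUBECTL`, `NS`, `STS_NAME` and `NODE_MASTER` mirror the excerpt, while `DRY_RUN`, `count_node_master`, `migrate_statefulset` and the grep-based detection are assumptions introduced for illustration.

```shell
#!/bin/sh
# Sketch only. DRY_RUN and both function names are hypothetical; how the real
# hook computes NODE_MASTER is not shown in the excerpt, so grep is assumed.
KUBECTL="${KUBECTL:-kubectl}"
NS="${NS:-default}"
DRY_RUN="${DRY_RUN:-true}"

# Print the number of manifest lines that mention node.master.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count_node_master() {
  $KUBECTL -n "$NS" get statefulset "$1" -o yaml 2>/dev/null \
    | grep -c 'node\.master' || true
}

# Delete a StatefulSet that still carries the 1.x setting, leaving its pods
# running. With DRY_RUN=true, only report what would be deleted.
migrate_statefulset() {
  STS_NAME="$1"
  NODE_MASTER=$(count_node_master "$STS_NAME")
  if [ "$NODE_MASTER" -gt 0 ]; then
    echo "[resource-migrator] StatefulSet $STS_NAME contains 'node.master' env (OpenSearch 1.x)"
    if [ "$DRY_RUN" = "true" ]; then
      echo "[resource-migrator] DRY_RUN: would delete StatefulSet $STS_NAME with --cascade=orphan"
      return 0
    fi
    echo "[resource-migrator] Deleting StatefulSet $STS_NAME with --cascade=orphan"
    $KUBECTL -n "$NS" delete statefulset "$STS_NAME" --cascade=orphan --ignore-not-found=true
  else
    echo "[resource-migrator] StatefulSet $STS_NAME has no 'node.master' env; skipping"
  fi
}
```

Running `migrate_statefulset <name>` with the default `DRY_RUN=true` only prints which StatefulSets would be orphaned; setting `DRY_RUN=false` performs the delete. This does not remove the ordering problem with the later hook; it only lets an operator see what is at stake before the upgrade runs.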
…arch upgrades Introduced a new section detailing the resource migration job that automatically handles the removal of incompatible OpenSearch 1.x StatefulSets during upgrades to 2.x. This section explains the job's functionality, parameters, and how it integrates with ArgoCD to ensure a smooth upgrade process without manual intervention.
What type of PR is this? (check all applicable)
Description
TBD
Related Tickets & Documents
QA Instructions, Screenshots, Recordings
Breaking Change checklist
If your PR includes any deployment or processing changes, please utilize this checklist:
Added/updated tests?
have not been included