feat: CID-70 - Operator install preflight checks #123
Open
oskarwojciski wants to merge 4 commits into main
Summary
This PR introduces a preflight check that validates the installation environment before the operator is installed.
Preflight Checks
The preflight install check validates:
- `castai-agent` namespace
- `castware-operator` release name (temporary rule, until the phase2 onboarding script no longer hardcodes the release name)
- `castai-agent` chart
The check runs as a Helm pre-install hook and prevents installation if any validation fails, providing clear error messages to guide troubleshooting.
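As a rough illustration of the first two rules, the sketch below rejects an install whose namespace or release name differs from the expected values. It is not the code from this PR: the function name `preflight_check` and the error wording are hypothetical, and it assumes the namespace rule means the release must be installed into `castai-agent`. The chart check is left out because its exact semantics are not spelled out here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the namespace and release-name rules; not the PR's implementation.
# Assumption: the release must be installed into the castai-agent namespace.

preflight_check() {
  local namespace="$1" release="$2" failed=0

  if [ "$namespace" != "castai-agent" ]; then
    echo "preflight: namespace must be castai-agent, got '${namespace}'" >&2
    failed=1
  fi

  # Temporary rule, per the description above.
  if [ "$release" != "castware-operator" ]; then
    echo "preflight: release name must be castware-operator, got '${release}'" >&2
    failed=1
  fi

  # Non-zero status signals that at least one rule failed.
  return "$failed"
}
```

Reporting every failed rule before returning, rather than stopping at the first, keeps the logs useful when more than one setting is wrong.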
Related Changes
Cleanup Job Service Account
The cleanup job now uses a dedicated service account (`castware-operator-cleanup`) instead of reusing the controller manager's service account. This is necessary due to Kubernetes lifecycle constraints:
- `castware-operator-controller-manager` service account doesn't exist yet
E2E Tests
- `ginkgo.flake-attempts=1` flag to allow one automatic retry for flaky specs
Current Behavior
When the preflight install check fails, users running `helm install` will only see a generic error message:
    Error: release castware-operator failed, and has been uninstalled due to atomic being set: failed pre-install: 1 error occurred:
        * job castware-operator-preflight-install-check failed: BackoffLimitExceeded
This message does not show the detailed preflight check logs, which contain diagnostic information about why the installation failed (e.g., API connectivity issues, wrong namespace, Helm repository access problems).
Recommended Solution
To improve user experience, consider providing an installation script that automatically displays the preflight job logs on failure.
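A minimal sketch of such a wrapper is shown below. The function name `install_with_preflight_logs`, the `NAMESPACE` and `RELEASE` environment variables, and the caller-supplied chart reference are illustrative assumptions, not part of this PR. It also assumes the failed hook job and its pods are still present after Helm rolls back, which depends on the hook's delete policy.

```shell
#!/usr/bin/env bash
# Sketch only: wrap `helm install` so the preflight job logs are printed on failure.
# Assumptions: chart reference is passed as the first argument; the failed hook
# job is not deleted by its hook-delete-policy, so its pod logs are still readable.

install_with_preflight_logs() {
  local chart="$1"; shift
  local namespace="${NAMESPACE:-castai-agent}"
  local release="${RELEASE:-castware-operator}"
  local job="castware-operator-preflight-install-check"

  # Extra arguments (e.g. --set flags) are forwarded to helm unchanged.
  if helm install "$release" "$chart" --namespace "$namespace" --atomic "$@"; then
    return 0
  fi

  echo "Installation failed. Preflight check logs:" >&2
  kubectl logs -n "$namespace" -l "job-name=${job}" --tail=-1 >&2 \
    || echo "Could not retrieve logs for job ${job}." >&2
  return 1
}
```

It could be called as `install_with_preflight_logs <chart-reference> --set key=value`, where the chart reference and any values are whatever the installation normally uses.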
Benefits of Installation Script
- Namespace validation failures
- API connectivity issues with possible causes
- Helm repository access problems
Alternative Workaround
Until an installation script is provided, users experiencing installation failures can manually retrieve the logs:
kubectl logs -n $(kubectl get pods -A -l job-name=castware-operator-preflight-install-check -o jsonpath='{.items[0].metadata.namespace}') -l job-name=castware-operator-preflight-install-check --tail=-1