-
Notifications
You must be signed in to change notification settings - Fork 65
test: Istio e2e test suite #570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: notebooks-v2
Are you sure you want to change the base?
test: Istio e2e test suite #570
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
7837c35
to
0a79c95
Compare
- Add InstallIstioctl() function to download and install istioctl binary - Add InstallIstioMinimalWithIngress() to set up Istio with minimal profile - Add IsIstioInstalled() and WaitIstioctlAvailable() helper functions - Use positional formatting in URL template for istioctl downloads - Support configurable Istio namespace for installation - Add error handling and proper command output redirection This enables e2e tests to automatically set up Istio service mesh components required for workspace HTTP proxy functionality. Signed-off-by: Yash Pal <yashpal2104@gmail.com>
0a79c95
to
95f18da
Compare
…gement - Add InstallIstioctl() with OS/arch detection for Linux, macOS, Windows - Add comprehensive Istio gateway functions (install, check, wait) - Enhance WaitIstioAvailable() to verify all Istio components - Add UninstallIstioctl() for complete cleanup - Use platform-specific download methods with fallbacks Signed-off-by: Yash Pal <yashpal2104@gmail.com>
…tests - Add istioctl binary download and installation for multiple platforms - Implement Istio minimal profile deployment with ingress gateway - Add comprehensive Istio availability checks and wait functions - Improve error handling and cleanup for test components - Add platform-specific binary management for Windows/Linux/macOS Signed-off-by: Yash Pal <yashpal2104@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start - lets get these issues explained and/or resolved first and I'll continue with review afterwards.
Send me any error logs you may have w.r.t the problems you were having with CertManager (once you undo those changes in this PR) and I can help debug.
|
||
if !skipIstioctlInstall { | ||
By("checking if istioctl is installed already") | ||
isIstioctlAlreadyInstalled = utils.IsIstioInstalled() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to be really careful with naming here...
checking if istioctl
is "installed" (i.e. binary available) is much different than checking if istio
is installed (which presumably would be an istioctl --version
check)
I could/would imagine each of those use cases requiring their own helper function...
osName := runtime.GOOS | ||
var rmCmd *exec.Cmd | ||
|
||
if osName == "windows" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already numerous places in our Makefile
where running on Windows would not work ...
As such - I see no need (for now) to add the complexity of supporting windows here.. since it would presumably fail before this code could/would ever get invoked. So we should simplify this function to remove all Windows support (and anywhere else outside this function for that matter 😎 )
|
||
// use LTS version of istioctl | ||
istioctlVersion = "1.27.0" | ||
istioctlURL = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect this variable to have an actual value?
Maybe call it istioctlURLTemplate
and give it the template value defined here?
return cmd.Run() | ||
} | ||
|
||
// TODO: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO:
relevant/valid?
} | ||
|
||
// InstallIstioMinimalWithIngress installs Istio with minimal profile and ingressgateway enabled. | ||
func InstallIstioMinimalWithIngress(namespace string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything calling this function.. which would explain the error I am getting trying to run the code:
[FAILED] in [BeforeSuite] - /Users/astonebe/Development/Code/GitHub/kubeflow-notebooks/workspaces/controller/test/e2e/e2e_suite_test.go:106 @ 09/12/25 14:56:46.656
[BeforeSuite] [FAILED] [3.213 seconds]
[BeforeSuite]
/Users/astonebe/Development/Code/GitHub/kubeflow-notebooks/workspaces/controller/test/e2e/e2e_suite_test.go:58
[FAILED] istioctl is not available
Expected success, but got an error:
<*fmt.wrapError | 0x14000238c80>:
istio-system namespace not found: kubectl get namespace istio-system failed with error: (exit status 1) Error from server (NotFound): namespaces "istio-system" not found
{
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should remove this function as nothing is calling it.
|
||
// First check if cert-manager namespace exists, if not install cert-manager | ||
// cmd := exec.Command("kubectl", "get", "namespace", "cert-manager") | ||
// if _, err := Run(cmd); err != nil { | ||
// // Namespace doesn't exist, install cert-manager | ||
// fmt.Println("cert-manager namespace not found, installing cert-manager...") | ||
// if err := InstallCertManager(); err != nil { | ||
// return fmt.Errorf("failed to install cert-manager: %w", err) | ||
// } | ||
// } | ||
|
||
// // Wait for the cert-manager namespace to be ready | ||
// cmd = exec.Command("kubectl", "wait", "--for=condition=Ready", "namespace/cert-manager", "--timeout=300s") | ||
// if _, err := Run(cmd); err != nil { | ||
// return fmt.Errorf("cert-manager namespace not ready: %w", err) | ||
// } | ||
|
||
// // Wait for each CertManager deployment individually by name (most reliable) | ||
// deployments := []string{"cert-manager", "cert-manager-cainjector", "cert-manager-webhook"} | ||
|
||
// for _, deployment := range deployments { | ||
// cmd := exec.Command("kubectl", "wait", "deployment", deployment, | ||
// "-n", "cert-manager", | ||
// "--for", "condition=Available", | ||
// "--timeout", "300s") | ||
|
||
// if _, err := Run(cmd); err != nil { | ||
// return fmt.Errorf("deployment %s not ready: %w", deployment, err) | ||
// } | ||
// } | ||
|
||
// // Wait for the cert-manager webhook to be ready (critical for functionality) | ||
// cmd = exec.Command("kubectl", "wait", "pods", | ||
// "-n", "cert-manager", | ||
// "-l", "app=webhook", | ||
// "--for", "condition=Ready", | ||
// "--timeout", "300s") | ||
|
||
// if _, err := Run(cmd); err != nil { | ||
// return fmt.Errorf("cert-manager webhook pods not ready: %w", err) | ||
// } | ||
|
||
// return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely do NOT want to comment this code out.
workspaces/controller/Makefile
Outdated
@echo "Installing CertManager..." | ||
@if [ "$(CERT_MANAGER_INSTALL_SKIP)" != "true" ]; then \ | ||
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.18.2/cert-manager.yaml; \ | ||
echo "Waiting for CertManager deployments to be ready..."; \ | ||
kubectl wait --for=condition=Available --timeout=300s deployment/cert-manager -n cert-manager; \ | ||
kubectl wait --for=condition=Available --timeout=300s deployment/cert-manager-cainjector -n cert-manager; \ | ||
kubectl wait --for=condition=Available --timeout=300s deployment/cert-manager-webhook -n cert-manager; \ | ||
echo "CertManager is ready!"; \ | ||
fi | ||
@echo "Installing Istio..." | ||
@if [ "$(ISTIO_INSTALL_SKIP)" != "true" ]; then \ | ||
echo "Istio installation will be handled by the test suite"; \ | ||
fi | ||
@if [ "$(PROMETHEUS_INSTALL_SKIP)" != "true" ]; then \ | ||
echo "Installing Prometheus (if needed)..."; \ | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove this code and rely on logic within go
testing infrastructure to perform this setup.
- Add InstallIstio() function with auto-detection - Modify WaitIstioAvailable() to install if istio-system namespace missing - Improve Istio/istioctl installation checks - Fix cert-manager "Stdout already set" error Resolves "istio-system namespace not found" test failures. Signed-off-by: Yash Pal <yashpal2104@gmail.com>
6275070
to
633c52a
Compare
Signed-off-by: Yash Pal <yashpal2104@gmail.com>
Signed-off-by: Yash Pal <yashpal2104@gmail.com>
07f1529
to
a7ded8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice progress - I am now able to observe the test-e2e
command completing successfully 💯
Ran 1 of 1 Specs in 132.105 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped
However, I think we still want to continue to iterate on this PR to keep it more focused in scope and consistent in its implementation as compared to CertManager to aid in understandability and maintainability of future contributors.
} | ||
|
||
// InstallIstioMinimalWithIngress installs Istio with minimal profile and ingressgateway enabled. | ||
func InstallIstioMinimalWithIngress(namespace string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should remove this function as nothing is calling it.
// EnsureIstioIngressGateway checks, installs, and waits for istio-ingressgateway | ||
func EnsureIstioIngressGateway() error { | ||
// Check if already installed | ||
if IsIstioIngressGatewayInstalled() { | ||
fmt.Println("Istio ingress gateway is already installed") | ||
} else { | ||
fmt.Println("Installing Istio ingress gateway...") | ||
if err := InstallIstioIngressGateway(); err != nil { | ||
return fmt.Errorf("failed to install istio-ingressgateway: %w", err) | ||
} | ||
} | ||
|
||
// Wait for it to be ready | ||
fmt.Println("Waiting for Istio ingress gateway to be ready...") | ||
if err := WaitIstioIngressGatewayReady(); err != nil { | ||
return fmt.Errorf("istio-ingressgateway failed to become ready: %w", err) | ||
} | ||
|
||
fmt.Println("Istio ingress gateway is ready!") | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should remove this function as nothing is calling it.
// IsIstioIngressGatewayInstalled checks if istio-ingressgateway is installed | ||
func IsIstioIngressGatewayInstalled() bool { | ||
cmd := exec.Command("kubectl", "get", "deployment", | ||
"-n", "istio-system", | ||
"istio-ingressgateway", | ||
"--ignore-not-found") | ||
output, err := Run(cmd) | ||
if err != nil { | ||
return false | ||
} | ||
return len(strings.TrimSpace(output)) > 0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should remove this function as it does not get invoked presently.
- While
EnsureIstioIngressGateway
does call this function.. nothing callsEnsureIstioIngressGateway
😎
if _, err := Run(cmd); err != nil { | ||
return fmt.Errorf("cert-manager endpoints not ready: %w", err) | ||
} | ||
// First check if cert-manager namespace exists, if not install cert-manager | ||
cmd = exec.Command("kubectl", "get", "namespace", "cert-manager") | ||
if _, err := Run(cmd); err != nil { | ||
// Namespace doesn't exist, install cert-manager | ||
fmt.Println("cert-manager namespace not found, installing cert-manager...") | ||
if err := InstallCertManager(); err != nil { | ||
return fmt.Errorf("failed to install cert-manager: %w", err) | ||
} | ||
} | ||
|
||
// Wait for each CertManager deployment individually by name (most reliable) | ||
deployments := []string{"cert-manager", "cert-manager-cainjector", "cert-manager-webhook"} | ||
|
||
for _, deployment := range deployments { | ||
cmd := exec.Command("kubectl", "wait", "deployment", deployment, | ||
"-n", "cert-manager", | ||
"--for", "condition=Available", | ||
"--timeout", "300s") | ||
|
||
if _, err := Run(cmd); err != nil { | ||
return fmt.Errorf("deployment %s not ready: %w", deployment, err) | ||
} | ||
} | ||
|
||
// Wait for the cert-manager webhook to be ready (critical for functionality) | ||
cmd = exec.Command("kubectl", "wait", "pods", | ||
"-n", "cert-manager", | ||
"-l", "app=webhook", | ||
"--for", "condition=Ready", | ||
"--timeout", "300s") | ||
|
||
if _, err := Run(cmd); err != nil { | ||
return fmt.Errorf("cert-manager webhook pods not ready: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not comfortable introducing changes to cert-manager
in this PR for following reasons:
- I still don't have a clear understanding of why this is even necessary. If I run
test-e2e
Makefile
target on thenotebooks-v2
branch - I can observecert-manager
being successfully installed - This is "scope creep" in the context of this PR... as the PR title nor PR description mention any changes to cert-manager logic
- please note I do not want the title/description to change here - just highlighting how "shoe-horning" in changes like this can be dangerous/confusing/misleading
- some of this added logic appears redundant to existing logic...
- this existing block of code means its not necessary to iterate through the deployments to individually perform a
wait
- this existing block of code means its not necessary to iterate through the deployments to individually perform a
- this function now itself calls
InstallCertManager()
which is a pretty significant change it how it was working prior
In general - if your cert-manager installation is not working - please reach out to me and lets discuss that failure to resolve it (in a separate PR if necessary). But please don't change the existing logic as part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer the "flow" of the Istio installs follows more closely with the process Cert-Manager follows. While I am aware that there are additional complexities present in Istio as compared to Cert-Manager - I still believe the sequence in which functions are invoked should be similar - and they are significantly different in the current implementation present in this PR.
Consider this Mermaid diagram I generated for Cert-Manager
flowchart TD
A[Test Suite Start] --> B{Check CERT_MANAGER_INSTALL_SKIP env var}
B -->|true| C[Skip Cert Manager Installation]
B -->|false| D[Call utils.IsCertManagerCRDsInstalled]
D --> E{Cert Manager CRDs Already Installed?}
E -->|Yes| F[Set isCertManagerAlreadyInstalled = true]
E -->|No| G[Call utils.InstallCertManager]
G --> H[Remove existing cert-manager leases]
H --> I[Call utils.Run with kubectl delete leases command]
I --> J[Install cert-manager bundle]
J --> K[Call utils.Run with kubectl apply command]
F --> L[Call utils.WaitCertManagerRunning]
K --> L
L --> M[Wait for cert-manager Deployments to be Available]
M --> N[Call utils.Run with kubectl wait for deployments]
N --> O[Wait for cert-manager Endpoints to be ready]
O --> P[Call utils.Run with kubectl wait for endpoints]
P --> Q[Cert Manager Setup Complete]
Q --> R[Test Execution]
R --> S[Test Suite End]
S --> T{Check skipCertManagerInstall && !isCertManagerAlreadyInstalled}
T -->|true| U[Call utils.UninstallCertManager]
T -->|false| V[Skip Teardown]
U --> W[Call utils.Run with kubectl delete command]
W --> X[Cert Manager Teardown Complete]
V --> Y[End]
X --> Y
There is a very linear flow here:
- check for env var
IsCertmanagerCRDInstalled
InstallCertManager
WaitCertManagerRunning
And its important to note that the functions listed above are isolated.. one function does not call the other - rather they are just orchestrated sequentially.
Now, comparing that to what has been implemented for Istio - we see:
flowchart TD
A[Test Suite Start] --> B{ISTIO_INSTALL_SKIP == true?}
B -->|No| C[Check if istioctl is installed]
B -->|Yes| Z[Skip Istio Setup]
C --> D{IsIstioctlInstalled}
D -->|No| E[Install istioctl]
D -->|Yes| F[WARNING: istioctl already installed]
E --> G[InstallIstioctl]
G --> H[Download istioctl binary]
H --> I[Extract and install binary]
I --> J[Move to ~/.local/bin/istioctl]
J --> K[Verify installation]
F --> L[Check istioctl availability]
K --> L
L --> M[WaitIstioAvailable]
M --> N{istio-system namespace exists?}
N -->|No| O[InstallIstio]
N -->|Yes| P[Wait for istiod pods]
O --> Q[Check if istioctl is available]
Q --> R[Check if Istio is already installed]
R --> S[Install Istio with default config]
S --> T[Wait for istiod pods to be ready]
P --> U[Wait for istio-ingressgateway pods]
T --> U
U --> V[Wait for istio-egressgateway pods]
V --> W[Run istioctl analyze]
W --> X[Istio Setup Complete]
X --> Y[Test Execution]
Y --> AA[Test Suite End]
AA --> BB[AfterSuite - Teardown]
BB --> CC[Only CertManager teardown]
CC --> DD[UninstallCertManager]
DD --> EE[End - No Istio teardown]
Z --> L
style A fill:#e1f5fe
style X fill:#c8e6c9
style EE fill:#ffcdd2
style G fill:#fff3e0
style M fill:#f3e5f5
style O fill:#f3e5f5
The biggest difference here is the structure of WaitIstioAvailable
- which as part of its logic actually invokes Installistio
... I would prefer this is structured more in line with how Cert-Manager operates to make it easier for folks to understand how it works...
So i'd expect to see a more clean linear flow like:
- check for env var
IsIstioCtlInstalled
InstallIstioCtl
WaitIstioCtl
<-- this may not be necessary since we are doing a download of a binary - nothing really "async"IsIstioInstalled
InstallIstio
WaitIstioRunning
IsIstioIngressGatewayInstalled
InstallIstioIngressGateway
WaitIstioIngressGatewayRunning
And these functions shouldn't call each other but rather be invoked sequentially in an "orchestrating function"
|
||
}) | ||
|
||
var _ = AfterSuite(func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are missing the logic here in AfterSuite
to actually undo any of the changes the test automation might have take.
Of particular importance is the manner in which these checks work:
if !skipCertManagerInstall && !isCertManagerAlreadyInstalled {
The isXXXAlreadyInstalled
is an important state variable to track for each of the "things" we may have installed:
isiotctl
isiot
istio-ingressgateway
If (and only if) automation installed it - should it be cleaned up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a significant amount of hard-coding of the string istio-system
in this PR... we should pull that out into a const(...)
block so its defined once/centrally
} | ||
|
||
// IsIstioInstalled checks if Istio is installed in the cluster | ||
func IsIstioInstalled() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When checking if istio
is installed - we need to be mindful of cases where a cluster could have it installed - but not in the istio-system namespace
The approach currently employed here only checks istio-system
- which I don't think is robust enough.
The following approach (note: for my convenience its a shell script which would need adapted for go code) can determine which namespace istio is installed in (if its installed):
if kubectl get crd gateways.networking.istio.io &>/dev/null; then
echo "Istio CRDs present"
kubectl get deploy -A -l app=istiod
else
echo "Istio not detected"
fi
We could parse that output if kubectl get deploy -A -l app=istiod
returns a resource to know the actual namespace we are interested in - and then feed that namespace value into the istioctl version
command.
Key Changes