Skip to content

Conversation

yashpal2104
Copy link

@yashpal2104 yashpal2104 commented Sep 5, 2025

Key Changes

  • Adds a comprehensive e2e test suite targeting Istio-enabled scenarios.
  • Tests common installation/uninstallation/wait and verifies istio resources and we will also need "isInstalled" + "install" + "wait" style checks to handle this istio-ingressgateway resource.
  • Adds Cross-platform istioctl installation for testing locally
  • Provides a foundation for future Istio-related test expansion and more robust CI coverage.

@github-project-automation github-project-automation bot moved this to Needs Triage in Kubeflow Notebooks Sep 5, 2025
@google-oss-prow google-oss-prow bot added the area/ci area - related to ci label Sep 5, 2025
@yashpal2104 yashpal2104 marked this pull request as draft September 5, 2025 17:11
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kimwnasptd for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot added the area/controller area - related to controller components label Sep 5, 2025
@google-oss-prow google-oss-prow bot added area/v2 area - version - kubeflow notebooks v2 do-not-merge/work-in-progress size/XL labels Sep 5, 2025
@yashpal2104 yashpal2104 changed the title Istio e2e test suite test: Istio e2e test suite Sep 5, 2025
- Add InstallIstioctl() function to download and install istioctl binary
- Add InstallIstioMinimalWithIngress() to set up Istio with minimal profile
- Add IsIstioInstalled() and WaitIstioctlAvailable() helper functions
- Use positional formatting in URL template for istioctl downloads
- Support configurable Istio namespace for installation
- Add error handling and proper command output redirection

This enables e2e tests to automatically set up Istio service mesh
components required for workspace HTTP proxy functionality.

Signed-off-by: Yash Pal <yashpal2104@gmail.com>
…gement

- Add InstallIstioctl() with OS/arch detection for Linux, macOS, Windows
- Add comprehensive Istio gateway functions (install, check, wait)
- Enhance WaitIstioAvailable() to verify all Istio components
- Add UninstallIstioctl() for complete cleanup
- Use platform-specific download methods with fallbacks

Signed-off-by: Yash Pal <yashpal2104@gmail.com>
@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Sep 7, 2025
…tests

- Add istioctl binary download and installation for multiple platforms
- Implement Istio minimal profile deployment with ingress gateway
- Add comprehensive Istio availability checks and wait functions
- Improve error handling and cleanup for test components
- Add platform-specific binary management for Windows/Linux/macOS

Signed-off-by: Yash Pal <yashpal2104@gmail.com>
Copy link
Contributor

@andyatmiami andyatmiami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start - lets get these issues explained and/or resolved first and I'll continue with review afterwards.

Send me any error logs you may have w.r.t the problems you were having with CertManager (once you undo those changes in this PR) and I can help debug.


if !skipIstioctlInstall {
By("checking if istioctl is installed already")
isIstioctlAlreadyInstalled = utils.IsIstioInstalled()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to be really careful with naming here...

checking if istioctl is "installed" (i.e. binary available) is much different than checking if istio is installed (which presumably would be an istioctl --version check)

I could/would imagine each of those use cases requiring their own helper function...

osName := runtime.GOOS
var rmCmd *exec.Cmd

if osName == "windows" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already numerous places in our Makefile where running on Windows would not work ...

As such - I see no need (for now) to add the complexity of supporting windows here.. since it would presumably fail before this code could/would ever get invoked. So we should simplify this function to remove all Windows support (and anywhere else outside this function for that matter 😎 )


// use LTS version of istioctl
istioctlVersion = "1.27.0"
istioctlURL = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this variable to have an actual value?

Maybe call it istioctlURLTemplate and give it the template value defined here?

return cmd.Run()
}

// TODO:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO: relevant/valid?

}

// InstallIstioMinimalWithIngress installs Istio with minimal profile and ingressgateway enabled.
func InstallIstioMinimalWithIngress(namespace string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything calling this function.. which would explain the error I am getting trying to run the code:

  [FAILED] in [BeforeSuite] - /Users/astonebe/Development/Code/GitHub/kubeflow-notebooks/workspaces/controller/test/e2e/e2e_suite_test.go:106 @ 09/12/25 14:56:46.656
[BeforeSuite] [FAILED] [3.213 seconds]
[BeforeSuite] 
/Users/astonebe/Development/Code/GitHub/kubeflow-notebooks/workspaces/controller/test/e2e/e2e_suite_test.go:58

  [FAILED] istioctl is not available
  Expected success, but got an error:
      <*fmt.wrapError | 0x14000238c80>: 
      istio-system namespace not found: kubectl get namespace istio-system failed with error: (exit status 1) Error from server (NotFound): namespaces "istio-system" not found
      
      {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we should remove this function as nothing is calling it.

Comment on lines 585 to 627

// First check if cert-manager namespace exists, if not install cert-manager
// cmd := exec.Command("kubectl", "get", "namespace", "cert-manager")
// if _, err := Run(cmd); err != nil {
// // Namespace doesn't exist, install cert-manager
// fmt.Println("cert-manager namespace not found, installing cert-manager...")
// if err := InstallCertManager(); err != nil {
// return fmt.Errorf("failed to install cert-manager: %w", err)
// }
// }

// // Wait for the cert-manager namespace to be ready
// cmd = exec.Command("kubectl", "wait", "--for=condition=Ready", "namespace/cert-manager", "--timeout=300s")
// if _, err := Run(cmd); err != nil {
// return fmt.Errorf("cert-manager namespace not ready: %w", err)
// }

// // Wait for each CertManager deployment individually by name (most reliable)
// deployments := []string{"cert-manager", "cert-manager-cainjector", "cert-manager-webhook"}

// for _, deployment := range deployments {
// cmd := exec.Command("kubectl", "wait", "deployment", deployment,
// "-n", "cert-manager",
// "--for", "condition=Available",
// "--timeout", "300s")

// if _, err := Run(cmd); err != nil {
// return fmt.Errorf("deployment %s not ready: %w", deployment, err)
// }
// }

// // Wait for the cert-manager webhook to be ready (critical for functionality)
// cmd = exec.Command("kubectl", "wait", "pods",
// "-n", "cert-manager",
// "-l", "app=webhook",
// "--for", "condition=Ready",
// "--timeout", "300s")

// if _, err := Run(cmd); err != nil {
// return fmt.Errorf("cert-manager webhook pods not ready: %w", err)
// }

// return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely do NOT want to comment this code out.

Comment on lines 81 to 96
@echo "Installing CertManager..."
@if [ "$(CERT_MANAGER_INSTALL_SKIP)" != "true" ]; then \
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.18.2/cert-manager.yaml; \
echo "Waiting for CertManager deployments to be ready..."; \
kubectl wait --for=condition=Available --timeout=300s deployment/cert-manager -n cert-manager; \
kubectl wait --for=condition=Available --timeout=300s deployment/cert-manager-cainjector -n cert-manager; \
kubectl wait --for=condition=Available --timeout=300s deployment/cert-manager-webhook -n cert-manager; \
echo "CertManager is ready!"; \
fi
@echo "Installing Istio..."
@if [ "$(ISTIO_INSTALL_SKIP)" != "true" ]; then \
echo "Istio installation will be handled by the test suite"; \
fi
@if [ "$(PROMETHEUS_INSTALL_SKIP)" != "true" ]; then \
echo "Installing Prometheus (if needed)..."; \
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove this code and rely on logic within go testing infrastructure to perform this setup.

- Add InstallIstio() function with auto-detection
- Modify WaitIstioAvailable() to install if istio-system namespace missing
- Improve Istio/istioctl installation checks
- Fix cert-manager "Stdout already set" error

Resolves "istio-system namespace not found" test failures.

Signed-off-by: Yash Pal <yashpal2104@gmail.com>
Signed-off-by: Yash Pal <yashpal2104@gmail.com>
@google-oss-prow google-oss-prow bot removed the size/XL label Sep 16, 2025
Signed-off-by: Yash Pal <yashpal2104@gmail.com>
@yashpal2104 yashpal2104 marked this pull request as ready for review September 16, 2025 18:58
Copy link
Contributor

@andyatmiami andyatmiami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice progress - I am now able to observe the test-e2e command completing successfully 💯

Ran 1 of 1 Specs in 132.105 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped

However, I think we still want to continue to iterate on this PR to keep it more focused in scope and consistent in its implementation as compared to CertManager to aid in understandability and maintainability of future contributors.

}

// InstallIstioMinimalWithIngress installs Istio with minimal profile and ingressgateway enabled.
func InstallIstioMinimalWithIngress(namespace string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we should remove this function as nothing is calling it.

Comment on lines +455 to +475
// EnsureIstioIngressGateway checks, installs, and waits for istio-ingressgateway
func EnsureIstioIngressGateway() error {
// Check if already installed
if IsIstioIngressGatewayInstalled() {
fmt.Println("Istio ingress gateway is already installed")
} else {
fmt.Println("Installing Istio ingress gateway...")
if err := InstallIstioIngressGateway(); err != nil {
return fmt.Errorf("failed to install istio-ingressgateway: %w", err)
}
}

// Wait for it to be ready
fmt.Println("Waiting for Istio ingress gateway to be ready...")
if err := WaitIstioIngressGatewayReady(); err != nil {
return fmt.Errorf("istio-ingressgateway failed to become ready: %w", err)
}

fmt.Println("Istio ingress gateway is ready!")
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should remove this function as nothing is calling it.

Comment on lines +387 to +398
// IsIstioIngressGatewayInstalled checks if istio-ingressgateway is installed
func IsIstioIngressGatewayInstalled() bool {
cmd := exec.Command("kubectl", "get", "deployment",
"-n", "istio-system",
"istio-ingressgateway",
"--ignore-not-found")
output, err := Run(cmd)
if err != nil {
return false
}
return len(strings.TrimSpace(output)) > 0
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should remove this function as it does not get invoked presently.

  • While EnsureIstioIngressGateway does call this function.. nothing calls EnsureIstioIngressGateway 😎

Comment on lines +591 to +627
if _, err := Run(cmd); err != nil {
return fmt.Errorf("cert-manager endpoints not ready: %w", err)
}
// First check if cert-manager namespace exists, if not install cert-manager
cmd = exec.Command("kubectl", "get", "namespace", "cert-manager")
if _, err := Run(cmd); err != nil {
// Namespace doesn't exist, install cert-manager
fmt.Println("cert-manager namespace not found, installing cert-manager...")
if err := InstallCertManager(); err != nil {
return fmt.Errorf("failed to install cert-manager: %w", err)
}
}

// Wait for each CertManager deployment individually by name (most reliable)
deployments := []string{"cert-manager", "cert-manager-cainjector", "cert-manager-webhook"}

for _, deployment := range deployments {
cmd := exec.Command("kubectl", "wait", "deployment", deployment,
"-n", "cert-manager",
"--for", "condition=Available",
"--timeout", "300s")

if _, err := Run(cmd); err != nil {
return fmt.Errorf("deployment %s not ready: %w", deployment, err)
}
}

// Wait for the cert-manager webhook to be ready (critical for functionality)
cmd = exec.Command("kubectl", "wait", "pods",
"-n", "cert-manager",
"-l", "app=webhook",
"--for", "condition=Ready",
"--timeout", "300s")

if _, err := Run(cmd); err != nil {
return fmt.Errorf("cert-manager webhook pods not ready: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not comfortable introducing changes to cert-manager in this PR for following reasons:

  • I still don't have a clear understanding of why this is even necessary. If I run test-e2e Makefile target on the notebooks-v2 branch - I can observe cert-manager being successfully installed
  • This is "scope creep" in the context of this PR... as the PR title nor PR description mention any changes to cert-manager logic
    • please note I do not want the title/description to change here - just highlighting how "shoe-horning" in changes like this can be dangerous/confusing/misleading
  • some of this added logic appears redundant to existing logic...
    • this existing block of code means its not necessary to iterate through the deployments to individually perform a wait
  • this function now itself calls InstallCertManager() which is a pretty significant change it how it was working prior

In general - if your cert-manager installation is not working - please reach out to me and lets discuss that failure to resolve it (in a separate PR if necessary). But please don't change the existing logic as part of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer the "flow" of the Istio installs follows more closely with the process Cert-Manager follows. While I am aware that there are additional complexities present in Istio as compared to Cert-Manager - I still believe the sequence in which functions are invoked should be similar - and they are significantly different in the current implementation present in this PR.

Consider this Mermaid diagram I generated for Cert-Manager

flowchart TD
    A[Test Suite Start] --> B{Check CERT_MANAGER_INSTALL_SKIP env var}
    B -->|true| C[Skip Cert Manager Installation]
    B -->|false| D[Call utils.IsCertManagerCRDsInstalled]
    
    D --> E{Cert Manager CRDs Already Installed?}
    E -->|Yes| F[Set isCertManagerAlreadyInstalled = true]
    E -->|No| G[Call utils.InstallCertManager]
    
    G --> H[Remove existing cert-manager leases]
    H --> I[Call utils.Run with kubectl delete leases command]
    I --> J[Install cert-manager bundle]
    J --> K[Call utils.Run with kubectl apply command]
    
    F --> L[Call utils.WaitCertManagerRunning]
    K --> L
    
    L --> M[Wait for cert-manager Deployments to be Available]
    M --> N[Call utils.Run with kubectl wait for deployments]
    N --> O[Wait for cert-manager Endpoints to be ready]
    O --> P[Call utils.Run with kubectl wait for endpoints]
    P --> Q[Cert Manager Setup Complete]
    
    Q --> R[Test Execution]
    R --> S[Test Suite End]
    
    S --> T{Check skipCertManagerInstall && !isCertManagerAlreadyInstalled}
    T -->|true| U[Call utils.UninstallCertManager]
    T -->|false| V[Skip Teardown]
    
    U --> W[Call utils.Run with kubectl delete command]
    W --> X[Cert Manager Teardown Complete]
    
    V --> Y[End]
    X --> Y
Loading

There is a very linear flow here:

  • check for env var
  • IsCertmanagerCRDInstalled
  • InstallCertManager
  • WaitCertManagerRunning

And its important to note that the functions listed above are isolated.. one function does not call the other - rather they are just orchestrated sequentially.

Now, comparing that to what has been implemented for Istio - we see:

flowchart TD
    A[Test Suite Start] --> B{ISTIO_INSTALL_SKIP == true?}
    B -->|No| C[Check if istioctl is installed]
    B -->|Yes| Z[Skip Istio Setup]
    
    C --> D{IsIstioctlInstalled}
    D -->|No| E[Install istioctl]
    D -->|Yes| F[WARNING: istioctl already installed]
    
    E --> G[InstallIstioctl]
    G --> H[Download istioctl binary]
    H --> I[Extract and install binary]
    I --> J[Move to ~/.local/bin/istioctl]
    J --> K[Verify installation]
    
    F --> L[Check istioctl availability]
    K --> L
    
    L --> M[WaitIstioAvailable]
    M --> N{istio-system namespace exists?}
    N -->|No| O[InstallIstio]
    N -->|Yes| P[Wait for istiod pods]
    
    O --> Q[Check if istioctl is available]
    Q --> R[Check if Istio is already installed]
    R --> S[Install Istio with default config]
    S --> T[Wait for istiod pods to be ready]
    
    P --> U[Wait for istio-ingressgateway pods]
    T --> U
    
    U --> V[Wait for istio-egressgateway pods]
    V --> W[Run istioctl analyze]
    W --> X[Istio Setup Complete]
    
    X --> Y[Test Execution]
    Y --> AA[Test Suite End]
    
    AA --> BB[AfterSuite - Teardown]
    BB --> CC[Only CertManager teardown]
    CC --> DD[UninstallCertManager]
    DD --> EE[End - No Istio teardown]
    
    Z --> L
    
    style A fill:#e1f5fe
    style X fill:#c8e6c9
    style EE fill:#ffcdd2
    style G fill:#fff3e0
    style M fill:#f3e5f5
    style O fill:#f3e5f5
Loading

The biggest difference here is the structure of WaitIstioAvailable - which as part of its logic actually invokes Installistio... I would prefer this is structured more in line with how Cert-Manager operates to make it easier for folks to understand how it works...

So i'd expect to see a more clean linear flow like:

  • check for env var
  • IsIstioCtlInstalled
  • InstallIstioCtl
  • WaitIstioCtl <-- this may not be necessary since we are doing a download of a binary - nothing really "async"
  • IsIstioInstalled
  • InstallIstio
  • WaitIstioRunning
  • IsIstioIngressGatewayInstalled
  • InstallIstioIngressGateway
  • WaitIstioIngressGatewayRunning

And these functions shouldn't call each other but rather be invoked sequentially in an "orchestrating function"


})

var _ = AfterSuite(func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are missing the logic here in AfterSuite to actually undo any of the changes the test automation might have take.

Of particular importance is the manner in which these checks work:

if !skipCertManagerInstall && !isCertManagerAlreadyInstalled {

The isXXXAlreadyInstalled is an important state variable to track for each of the "things" we may have installed:

  • isiotctl
  • isiot
  • istio-ingressgateway

If (and only if) automation installed it - should it be cleaned up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a significant amount of hard-coding of the string istio-system in this PR... we should pull that out into a const(...) block so its defined once/centrally

}

// IsIstioInstalled checks if Istio is installed in the cluster
func IsIstioInstalled() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When checking if istio is installed - we need to be mindful of cases where a cluster could have it installed - but not in the istio-system namespace

The approach currently employed here only checks istio-system - which I don't think is robust enough.

The following approach (note: for my convenience its a shell script which would need adapted for go code) can determine which namespace istio is installed in (if its installed):

if kubectl get crd gateways.networking.istio.io &>/dev/null; then
  echo "Istio CRDs present"
  kubectl get deploy -A -l app=istiod
else
  echo "Istio not detected"
fi

We could parse that output if kubectl get deploy -A -l app=istiod returns a resource to know the actual namespace we are interested in - and then feed that namespace value into the istioctl version command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ci area - related to ci area/controller area - related to controller components area/v2 area - version - kubeflow notebooks v2 size/L
Projects
Status: Needs Triage
Development

Successfully merging this pull request may close these issues.

2 participants