Conversation
📝 Walkthrough

This pull request integrates Rapid7 storage backend support into the integration test suite. Changes include workflow conditionals to choose between local and storage backends, new helper functions to fetch application profiles from either source, and test updates to support dual-path data retrieval based on backend availability.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Integration Test
    participant Helm as Helm
    participant WF as Workflow
    participant Cluster as Kubernetes Cluster
    participant Storage as Storage Backend
    WF->>WF: Check use_storage_backend flag
    alt Storage Backend Enabled
        WF->>Helm: Install Rapid7 release
        Helm->>Cluster: Deploy Rapid7 components
        WF->>Test: Set BACKEND_ACCOUNT_ID, BACKEND_ACCESS_KEY
        Test->>Test: SetupSuite: isRapid7=true
        Test->>Storage: NewStorageClient(credentials)
        Test->>Storage: Fetch ApplicationProfile via accountID/accessKey
        Storage-->>Test: ApplicationProfile
    else Cluster Storage
        WF->>Helm: Install Kubescape (standard)
        Helm->>Cluster: Deploy Kubescape
        Test->>Test: SetupSuite: isRapid7=false
        Test->>Cluster: Fetch ApplicationProfile via ksObjectConnection
        Cluster-->>Test: ApplicationProfile
    end
    Test->>Test: verifyApplicationProfileCompleted(accountID, accessKey, isRapid7)
    Test->>Test: Execute test assertions
```
```mermaid
sequenceDiagram
    participant Helper as Test Helper
    participant isRapid7 as isRapid7 Flag
    participant Backend as Storage Backend
    participant Cluster as Kubernetes Cluster
    Helper->>isRapid7: Check backend type
    alt Rapid7 Enabled
        Helper->>Backend: fetchApplicationProfileFromStorageBackend(accountID, accessKey)
        Helper->>Backend: List ApplicationProfiles by workload-name/kind
        Backend-->>Helper: Matching profile
        Helper->>Backend: Retrieve specific profile
        Backend-->>Helper: ApplicationProfile
    else Cluster-based
        Helper->>Cluster: fetchApplicationProfileFromCluster(ksObjectConnection)
        Cluster-->>Helper: ApplicationProfile from API
    end
    Helper-->>Helper: Return profile to test
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 8
🧹 Nitpick comments (11)
tests/integration-test-suite/run-non-failover-tests.sh (1)
25-44: `$GOTESTSUM_PATH` is detected but never used in the invocation command.

The detection block builds and validates `$GOTESTSUM_PATH`, but line 65 invokes `gotestsum` by name rather than `"$GOTESTSUM_PATH"`. It works because PATH is updated in branches 2 and 3, but the variable is effectively dead after the existence check. Use `"$GOTESTSUM_PATH"` in the invocation for explicitness and to avoid a subtle breakage if PATH manipulation ever changes.

♻️ Suggested fix

```diff
-gotestsum --format standard-verbose --junitfile "junit-$test.xml" -- -count=1 -timeout 30m -v -run "IntegrationTestSuite/$test$" ./... -- $EXTRA_TEST_ARGS 2>&1 | tee "log-$test.txt" &
+"$GOTESTSUM_PATH" --format standard-verbose --junitfile "junit-$test.xml" -- -count=1 -timeout 30m -v -run "IntegrationTestSuite/$test$" ./... -- $EXTRA_TEST_ARGS 2>&1 | tee "log-$test.txt" &
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration-test-suite/run-non-failover-tests.sh` around lines 25-44: the script detects and validates GOTESTSUM_PATH but later invokes gotestsum by name; update the invocation to use the validated "$GOTESTSUM_PATH" (quoted) wherever the script runs gotestsum so the resolved binary is used explicitly; keep the existing PATH exports in the branches that set GOPATH/bin or $HOME/go/bin, but change calls like `gotestsum ...` to `"$GOTESTSUM_PATH" ...` to avoid relying on PATH and to make the variable non-dead.

tests/integration-test-suite/run-failover-tests.sh (1)
25-44: Same `$GOTESTSUM_PATH` detection-vs-invocation inconsistency as in `run-non-failover-tests.sh`.

Line 55 invokes `gotestsum` by name instead of `"$GOTESTSUM_PATH"`. The same fix applies here.

♻️ Suggested fix

```diff
-gotestsum --format standard-verbose --junitfile "junit-$test.xml" -- -count=1 -timeout 30m -v -run "IntegrationTestSuite/$test$" ./... 2>&1 | tee "log-$test.txt"
+"$GOTESTSUM_PATH" --format standard-verbose --junitfile "junit-$test.xml" -- -count=1 -timeout 30m -v -run "IntegrationTestSuite/$test$" ./... 2>&1 | tee "log-$test.txt"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration-test-suite/run-failover-tests.sh` around lines 25-44: the script detects gotestsum and stores its path in GOTESTSUM_PATH but later invokes gotestsum by name; change any plain invocations of gotestsum to use the discovered "$GOTESTSUM_PATH" variable instead (preserve quoting) so the script always runs the detected binary; update places that call gotestsum (search for bare "gotestsum" invocations) to use "$GOTESTSUM_PATH" and ensure PATH exports remain unchanged.

tests/integration-test-suite/go.mod (2)
3-3: Consider pinning to the latest `go 1.25.x` patch for security fixes.

`go1.25.6` was released 2026-01-15 with security fixes, while the module declares `go 1.25.0`. Upgrading to `go 1.25.6` (or whatever the latest patch is) picks up all accumulated fixes.

♻️ Suggested fix

```diff
-go 1.25.0
+go 1.25.6
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration-test-suite/go.mod` at line 3, The module's Go version is pinned to "go 1.25.0" in the go.mod go directive; update that directive to the latest 1.25.x patch (e.g., "go 1.25.6") so the build picks up recent security fixes — locate the go directive in the go.mod file and replace "go 1.25.0" with the current 1.25.patch version.
3-3: Heads-up: Go 1.25 changes default `GOMAXPROCS` in Kubernetes pods.

Go 1.25 introduces cgroup-aware GOMAXPROCS on Linux: if the CPU bandwidth limit of the cgroup is lower than the number of logical CPUs, `GOMAXPROCS` defaults to the lower limit — in container runtimes like Kubernetes this corresponds to the pod's CPU limit. This new behavior only takes effect if your program uses Go version 1.25 or higher in your `go.mod`.

Integration tests running in constrained pods (e.g., with low CPU limits) may now use fewer goroutines than before, potentially slowing parallel test execution. Verify that the CI runner pods have sufficient CPU limits, or set `GODEBUG=containermaxprocs=0` to opt out.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration-test-suite/go.mod` at line 3: the go.mod currently specifies the Go version "go 1.25.0", which enables cgroup-aware GOMAXPROCS in CI pods; to avoid reduced parallelism in integration tests, either change the go directive back to a pre-1.25 version (remove or replace the "go 1.25.0" line) or keep "go 1.25.0" and update CI to set GODEBUG=containermaxprocs=0 (export GODEBUG in your test job environment) or increase the pod CPU limits; locate the Go version directive ("go 1.25.0") in the go.mod and apply one of these fixes.

.github/workflows/manual-integration-tests.yml (3)
112-122: Quote command substitutions to prevent word splitting (SC2046).

Static analysis flags unquoted `$(echo ...)` substitutions in the Helm `--set` arguments. While unlikely to cause issues with well-formed image references, quoting them is defensive and consistent with shell best practices.

Proposed fix

```diff
 helm upgrade --install rapid7 charts/kubescape-operator \
   -n kubescape --create-namespace \
-  --set nodeAgent.image.repository=$(echo "${{ github.event.inputs.node_agent_image }}" | cut -d: -f1) \
-  --set nodeAgent.image.tag=$(echo "${{ github.event.inputs.node_agent_image }}" | cut -d: -f2-) \
-  --set storage.image.repository=$(echo "${{ github.event.inputs.storage_image }}" | cut -d: -f1) \
-  --set storage.image.tag=$(echo "${{ github.event.inputs.storage_image }}" | cut -d: -f2-) \
+  --set nodeAgent.image.repository="$(echo "${{ github.event.inputs.node_agent_image }}" | cut -d: -f1)" \
+  --set nodeAgent.image.tag="$(echo "${{ github.event.inputs.node_agent_image }}" | cut -d: -f2-)" \
+  --set storage.image.repository="$(echo "${{ github.event.inputs.storage_image }}" | cut -d: -f1)" \
+  --set storage.image.tag="$(echo "${{ github.event.inputs.storage_image }}" | cut -d: -f2-)" \
   --set nodeAgent.config.learningPeriod=30s \
   --set nodeAgent.config.updatePeriod=1m \
   --set storage.cleanupInterval=1m \
-  $EXTRA_SET_ARGS \
+  "$EXTRA_SET_ARGS" \
   --wait
```

Apply the same fix to the failover job block (lines 275–285).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/manual-integration-tests.yml around lines 112 - 122, The helm upgrade command uses unquoted command substitutions for image repo/tag values (the $(echo "${{ github.event.inputs.node_agent_image }}" | cut -d: -f1), $(echo ... -f2-), and the equivalent storage substitutions); quote those substitutions so the --set arguments become --set nodeAgent.image.repository="$(...)", --set nodeAgent.image.tag="$(...)", --set storage.image.repository="$(...)", and --set storage.image.tag="$(...)" to prevent word-splitting, and apply the same quoting changes to the failover job block where the same substitutions are used.
133-137: Redundant environment variable setup.

Lines 133–137 write `BACKEND_ACCOUNT_ID` and `BACKEND_ACCESS_KEY` to `$GITHUB_ENV` (persisting them for all subsequent steps), and then lines 140–142 also set the same variables via the step-level `env:` block. One or the other is sufficient. Since the `env:` block on lines 140–142 covers the test step directly, the `$GITHUB_ENV` step can be removed to avoid confusion.

The same duplication exists in the failover job (lines 296–305).
Also applies to: 140-142
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/manual-integration-tests.yml around lines 133 - 137, Remove the redundant step that echoes BACKEND_ACCOUNT_ID and BACKEND_ACCESS_KEY into $GITHUB_ENV (the step named "Set storage backend environment variables") and any duplicate echo steps in the failover job; instead rely on the existing step-level env: block that already sets BACKEND_ACCOUNT_ID and BACKEND_ACCESS_KEY for the test step. Locate and delete the echo "BACKEND_ACCOUNT_ID=..." and echo "BACKEND_ACCESS_KEY=..." runs (and their enclosing if: condition) in both the main job and the failover job, leaving the env: entries (the step-level environment variables used by the test step) intact. Ensure no other steps rely on the removed $GITHUB_ENV export—if any do, move those steps to use the step-level env or re-add only the necessary persistence.
95-123: Extract duplicated Helm install logic into a reusable composite action or shared script.

The workflow contains 4 near-identical Helm install blocks: one Kubescape and one Rapid7 block in each job (`integration-tests` and `failover-tests`). All four blocks perform identical git operations, dependency building, and Helm set argument parsing. Only the release name differs (`kubescape` vs `rapid7`).

Additionally, both Rapid7 and Kubescape installations use identical `--set` values for nodeAgent and storage configuration. The backend credentials (`customer_guid`, `access_key`) are available as workflow inputs and set as environment variables (`BACKEND_ACCOUNT_ID`, `BACKEND_ACCESS_KEY`), but they are not passed to the Helm install command itself — verify whether this is intentional or if Helm requires these as explicit `--set` arguments for proper Rapid7 backend configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/manual-integration-tests.yml around lines 95 - 123, The Helm install logic is duplicated across the integration-tests and failover-tests jobs (the block that builds EXTRA_SET_ARGS, clones helm-charts, runs helm dependency build, and executes helm upgrade --install with the --set nodeAgent.* and storage.* flags); extract this into a reusable composite action or shared script that accepts parameters for release name (kubescape vs rapid7), branch_helm_chart, node_agent_image, storage_image, extra_helm_set_args, and any other per-job toggles, and replace each duplicated block with a single invocation of that action/script that constructs EXTRA_SET_ARGS and runs helm upgrade --install; also verify and, if required, add the BACKEND_ACCOUNT_ID and BACKEND_ACCESS_KEY inputs as explicit --set arguments (e.g., --set backend.customer_guid=... and --set backend.access_key=...) so the Helm chart receives backend credentials when deploying rapid7.tests/integration-test-suite/helpers.go (4)
316-329: Parameter lists are growing unwieldy — consider a struct or receiver method.

`verifyApplicationProfileCompleted` now takes 9 parameters; `verifyNetworkNeighborProfileCompleted` takes 11. This harms readability at every call site and makes future extensions painful. Since most of these values (`ksObjectConnection`, `testNamespace`, `accountID`, `accessKey`, `isRapid7`) are already fields on `IntegrationTestSuite`, these could be methods on the suite instead of free functions:

```go
func (s *IntegrationTestSuite) verifyApplicationProfileCompleted(expectedCompleteness, relatedKind, relatedName string) {
	// access s.ksObjectConnection, s.isRapid7, etc. directly
}
```

Also applies to: 361-374
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration-test-suite/helpers.go` around lines 316 - 329, The free function verifyApplicationProfileCompleted (and similarly verifyNetworkNeighborProfileCompleted) takes too many parameters; convert them into receiver methods on IntegrationTestSuite so they use existing suite fields (e.g. s.ksObjectConnection, s.testNamespace, s.accountID, s.accessKey, s.isRapid7) instead of passing those values; update the signature to func (s *IntegrationTestSuite) verifyApplicationProfileCompleted(expectedCompleteness, relatedKind, relatedName string) and adjust the body to reference s.* fields (including calls to fetchApplicationProfileFromStorageBackend/fetchApplicationProfileFromCluster) and then update all call sites to use the method form, doing the same refactor for verifyNetworkNeighborProfileCompleted.
151-151: Hardcoded page size of 100 may miss profiles.

`ListApplicationProfiles` and `ListNetworkNeighborhoods` pass `100` as the limit with an empty continuation token. If the namespace accumulates > 100 profiles (unlikely in a clean test namespace, but possible if cleanup fails), the target profile won't be found and the test will report a misleading "not found" error.

Consider using a label selector or field selector in the list call to filter by workload-name/kind directly, avoiding pagination issues entirely.
Also applies to: 207-207
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration-test-suite/helpers.go` at line 151, The tests call ListApplicationProfiles and ListNetworkNeighborhoods with a hardcoded limit of 100 and an empty continuation token, which can miss entries; update the calls to either (A) pass a selector/filter (e.g., workload-name/kind label or field selector) into the list call so it only returns matching profiles, or (B) implement proper pagination by looping on the continuation token until nil/empty or the target profile is found; adjust the calls that currently look like ListApplicationProfiles(ctx, testNamespace, 100, "") and ListNetworkNeighborhoods(...) to use the selector/filtered-params or a continuation-token loop so tests never falsely report "not found."
43-54: `getClusterName` shells out to `kubectl` — fragile in CI.

This function executes `kubectl config current-context` as a subprocess. If `kubectl` isn't on `PATH` or the kubeconfig doesn't have a current-context set, it will `Fatal` and kill the process (same concern as `NewStorageClient`). Consider using the Go `clientcmd` API already imported in this file to read the current context programmatically:

Suggested fix

```diff
 func getClusterName() string {
-	cmd := exec.Command("kubectl", "config", "current-context")
-	clusterNameBytes, err := cmd.Output()
+	kubeconfig := os.Getenv("KUBECONFIG")
+	if kubeconfig == "" {
+		kubeconfig = os.ExpandEnv("$HOME/.kube/config")
+	}
+	config, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
+		&clientcmd.ClientConfigLoadingRules{ExplicitPath: kubeconfig},
+		&clientcmd.ConfigOverrides{},
+	).RawConfig()
 	if err != nil {
 		logger.L().Fatal("failed to get current cluster name", helpers.Error(err))
 	}
-	clusterName := string(clusterNameBytes)
-	// Trim newline (matching helm.go behavior)
-	clusterName = strings.TrimSpace(clusterName)
+	clusterName := config.CurrentContext
 	logger.L().Info("Current cluster name", helpers.String("clusterName", clusterName))
 	return clusterName
 }
```

This also removes the `os/exec` import dependency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration-test-suite/helpers.go` around lines 43 - 54, Replace the shelling out in getClusterName with the Kubernetes clientcmd API: load the kubeconfig with clientcmd (e.g., via clientcmd.NewDefaultClientConfig or NewNonInteractiveDeferredLoadingClientConfig and RawConfig()), read the CurrentContext value from the returned config, handle errors from loading the config and the case of an empty CurrentContext using the same logger calls, trim whitespace as before, and remove the now-unused os/exec import; refer to getClusterName and clientcmd.RawConfig()/CurrentContext to locate where to change the code.
142-173: Cache `StorageClient` in suite setup to eliminate redundant service-discovery lookups and connections.

Both `fetchApplicationProfileFromStorageBackend` and `fetchNetworkNeighborProfileFromStorageBackend` independently call `getStorageUrl()` and `NewStorageClient()` on every invocation. `getStorageUrl()` creates a new service discovery client and makes an API call each time, while `NewStorageClient()` initializes a new client and establishes a connection.

This redundancy is evident in the test suite, where `verifyApplicationProfileCompleted` and `verifyNetworkNeighborProfileCompleted` are called back-to-back in 9 tests (e.g., lines 98–99 in case_sidecar_profile_create_test.go, lines 57–58 in case_simple_profile_create_test.go). Each pair results in duplicate service discovery lookups and client connections for identical credentials.

Add a `storageClient` field to `IntegrationTestSuite` and initialize it once in `SetupSuite()` when `isRapid7` is true, similar to how `ksObjectConnection` is managed. Then pass it to these functions instead of recreating it on each call.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration-test-suite/helpers.go` around lines 142 - 173, Add a storageClient field to IntegrationTestSuite and initialize it once in SetupSuite() when isRapid7 is true (same pattern as ksObjectConnection) so tests reuse a single StorageClient instead of recreating it; update fetchApplicationProfileFromStorageBackend and fetchNetworkNeighborProfileFromStorageBackend to accept a StorageClient parameter (or use s.storageClient) and remove calls to getStorageUrl() and NewStorageClient() inside those functions so they use the cached client; ensure SetupSuite() calls getStorageUrl() and NewStorageClient() only once and assigns the result to the new storageClient field for subsequent test helper use.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/manual-integration-tests.yml:
- Around line 26-38: The workflow currently exposes sensitive values via
workflow_dispatch inputs (customer_guid and access_key); remove these plaintext
inputs and instead read credentials from GitHub Secrets (e.g.,
secrets.BACKEND_CUSTOMER_GUID and secrets.BACKEND_ACCESS_KEY) in the workflow,
keep only non-sensitive flags like use_storage_backend, and update any steps
that referenced customer_guid or access_key to use the secrets via the secrets
context (e.g., replace uses of the customer_guid/access_key inputs with
secrets.BACKEND_CUSTOMER_GUID and secrets.BACKEND_ACCESS_KEY).
In
`@tests/integration-test-suite/case_scale_deployment_partial_completion_test.go`:
- Line 69: The intermediate test fetches call fetchApplicationProfileFromCluster
unconditionally but in Rapid7 mode profiles live in the backend; wrap those
intermediate cluster fetches in a conditional so they only run when s.isRapid7
is false (e.g. if !s.isRapid7 { applicationProfile, err :=
fetchApplicationProfileFromCluster(...) } ), or skip the intermediate check when
s.isRapid7 is true, keeping the final calls to verifyApplicationProfileCompleted
and verifyNetworkNeighborProfileCompleted (which already accept isRapid7)
unchanged.
In `@tests/integration-test-suite/helpers.go`:
- Around line 79-89: The helper NewStorageClient currently calls
logger.L().Fatal on errors (logger.L().Fatal in the NewStorageClient function),
which aborts the process and prevents test cleanup; change NewStorageClient to
return (*backendv1.StorageClient, error) instead of aborting: on
backendv1.NewStorageClient or client.Connect errors, return nil and the wrapped
error (with context) rather than calling logger.L().Fatal; update all callers
(tests and helpers that call NewStorageClient) to handle the returned error and
fail the test via t.Fatalf or propagate the error so deferred cleanup/TearDown
logic can run.
- Around line 64-77: The code in getStorageUrl hardcodes the service discovery
host when calling
v3.NewServiceDiscoveryClientV3("api.stage-us-east-1.r7.armo-cadr.com"); change
this to read the host from an environment variable (e.g.,
SERVICE_DISCOVERY_HOST) with the current staging host as the default, and pass
that variable value into v3.NewServiceDiscoveryClientV3; ensure any errors
remain handled the same way and that tests use the env var to target non-staging
environments.
In `@tests/integration-test-suite/integration_test.go`:
- Line 27: Fix the typo in the inline comment that currently reads "These are
only used when storage backend is enbled (Rapid7)" by changing "enbled" to
"enabled" so the comment reads "These are only used when storage backend is
enabled (Rapid7)"; locate the comment string in integration_test.go and update
it accordingly.
In `@tests/integration-test-suite/run-failover-tests.sh`:
- Line 47: The failover test list duplicates a test defined in the non-failover
script; update one of the lists to remove the duplicate: decide whether
TestSimpleProfileNodeAgentFailover belongs in FAILOVER_TESTS (in
run-failover-tests.sh) or NON_FAILOVER_TESTS (in run-non-failover-tests.sh),
then remove it from the other script so each test appears only once; modify the
string assigned to FAILOVER_TESTS or NON_FAILOVER_TESTS accordingly (e.g.,
delete or replace "TestSimpleProfileNodeAgentFailover" from FAILOVER_TESTS) and
ensure the remaining test names reflect the intended grouping.
In `@tests/integration-test-suite/run-non-failover-tests.sh`:
- Line 47: The NON_FAILOVER_TESTS variable currently lists a failover test
(TestSimpleProfileNodeAgentFailover) likely due to copy/paste; update the value
of NON_FAILOVER_TESTS to include the actual non-failover test names (for example
TestJobProfileCreate, TestCronJobProfileCreate,
TestCrashLoopAndStableProfileIncomplete) and remove
TestSimpleProfileNodeAgentFailover so the non-failover script runs the correct
suite; locate the assignment to NON_FAILOVER_TESTS in run-non-failover-tests.sh
and replace the test list accordingly.
---
Nitpick comments:
In @.github/workflows/manual-integration-tests.yml:
- Around line 112-122: The helm upgrade command uses unquoted command
substitutions for image repo/tag values (the $(echo "${{
github.event.inputs.node_agent_image }}" | cut -d: -f1), $(echo ... -f2-), and
the equivalent storage substitutions); quote those substitutions so the --set
arguments become --set nodeAgent.image.repository="$(...)", --set
nodeAgent.image.tag="$(...)", --set storage.image.repository="$(...)", and --set
storage.image.tag="$(...)" to prevent word-splitting, and apply the same quoting
changes to the failover job block where the same substitutions are used.
- Around line 133-137: Remove the redundant step that echoes BACKEND_ACCOUNT_ID
and BACKEND_ACCESS_KEY into $GITHUB_ENV (the step named "Set storage backend
environment variables") and any duplicate echo steps in the failover job;
instead rely on the existing step-level env: block that already sets
BACKEND_ACCOUNT_ID and BACKEND_ACCESS_KEY for the test step. Locate and delete
the echo "BACKEND_ACCOUNT_ID=..." and echo "BACKEND_ACCESS_KEY=..." runs (and
their enclosing if: condition) in both the main job and the failover job,
leaving the env: entries (the step-level environment variables used by the test
step) intact. Ensure no other steps rely on the removed $GITHUB_ENV export—if
any do, move those steps to use the step-level env or re-add only the necessary
persistence.
- Around line 95-123: The Helm install logic is duplicated across the
integration-tests and failover-tests jobs (the block that builds EXTRA_SET_ARGS,
clones helm-charts, runs helm dependency build, and executes helm upgrade
--install with the --set nodeAgent.* and storage.* flags); extract this into a
reusable composite action or shared script that accepts parameters for release
name (kubescape vs rapid7), branch_helm_chart, node_agent_image, storage_image,
extra_helm_set_args, and any other per-job toggles, and replace each duplicated
block with a single invocation of that action/script that constructs
EXTRA_SET_ARGS and runs helm upgrade --install; also verify and, if required,
add the BACKEND_ACCOUNT_ID and BACKEND_ACCESS_KEY inputs as explicit --set
arguments (e.g., --set backend.customer_guid=... and --set
backend.access_key=...) so the Helm chart receives backend credentials when
deploying rapid7.
In `@tests/integration-test-suite/go.mod`:
- Line 3: The module's Go version is pinned to "go 1.25.0" in the go.mod go
directive; update that directive to the latest 1.25.x patch (e.g., "go 1.25.6")
so the build picks up recent security fixes — locate the go directive in the
go.mod file and replace "go 1.25.0" with the current 1.25.patch version.
- Line 3: The go.mod currently specifies the Go version "go 1.25.0", which
enables cgroup-aware GOMAXPROCS in CI pods; to avoid reduced parallelism in
integration tests, either change the go directive back to a pre-1.25 version
(remove or replace the "go 1.25.0" line) or keep "go 1.25.0" and update CI to
set GODEBUG=containermaxprocs=0 (export GODEBUG in your test job environment) or
increase the pod CPU limits; locate the Go version directive ("go 1.25.0") in
the go.mod and apply one of these fixes.
In `@tests/integration-test-suite/helpers.go`:
- Around line 316-329: The free function verifyApplicationProfileCompleted (and
similarly verifyNetworkNeighborProfileCompleted) takes too many parameters;
convert them into receiver methods on IntegrationTestSuite so they use existing
suite fields (e.g. s.ksObjectConnection, s.testNamespace, s.accountID,
s.accessKey, s.isRapid7) instead of passing those values; update the signature
to func (s *IntegrationTestSuite)
verifyApplicationProfileCompleted(expectedCompleteness, relatedKind, relatedName
string) and adjust the body to reference s.* fields (including calls to
fetchApplicationProfileFromStorageBackend/fetchApplicationProfileFromCluster)
and then update all call sites to use the method form, doing the same refactor
for verifyNetworkNeighborProfileCompleted.
- Line 151: The tests call ListApplicationProfiles and ListNetworkNeighborhoods
with a hardcoded limit of 100 and an empty continuation token, which can miss
entries; update the calls to either (A) pass a selector/filter (e.g.,
workload-name/kind label or field selector) into the list call so it only
returns matching profiles, or (B) implement proper pagination by looping on the
continuation token until nil/empty or the target profile is found; adjust the
calls that currently look like ListApplicationProfiles(ctx, testNamespace, 100,
"") and ListNetworkNeighborhoods(...) to use the selector/filtered-params or a
continuation-token loop so tests never falsely report "not found."
- Around line 43-54: Replace the shelling out in getClusterName with the
Kubernetes clientcmd API: load the kubeconfig with clientcmd (e.g., via
clientcmd.NewDefaultClientConfig or NewNonInteractiveDeferredLoadingClientConfig
and RawConfig()), read the CurrentContext value from the returned config, handle
errors from loading the config and the case of an empty CurrentContext using the
same logger calls, trim whitespace as before, and remove the now-unused os/exec
import; refer to getClusterName and clientcmd.RawConfig()/CurrentContext to
locate where to change the code.
- Around line 142-173: Add a storageClient field to IntegrationTestSuite and
initialize it once in SetupSuite() when isRapid7 is true (same pattern as
ksObjectConnection) so tests reuse a single StorageClient instead of recreating
it; update fetchApplicationProfileFromStorageBackend and
fetchNetworkNeighborProfileFromStorageBackend to accept a StorageClient
parameter (or use s.storageClient) and remove calls to getStorageUrl() and
NewStorageClient() inside those functions so they use the cached client; ensure
SetupSuite() calls getStorageUrl() and NewStorageClient() only once and assigns
the result to the new storageClient field for subsequent test helper use.
In `@tests/integration-test-suite/run-failover-tests.sh`:
- Around line 25-44: The script detects gotestsum and stores its path in
GOTESTSUM_PATH but later invokes gotestsum by name; change any plain invocations
of gotestsum to use the discovered "$GOTESTSUM_PATH" variable instead (preserve
quoting) so the script always runs the detected binary; update places that call
gotestsum (search for bare "gotestsum" invocations) to use "$GOTESTSUM_PATH" and
ensure PATH exports remain unchanged.
In `@tests/integration-test-suite/run-non-failover-tests.sh`:
- Around line 25-44: The script detects and validates GOTESTSUM_PATH but later
invokes gotestsum by name; update the invocation to use the validated
"$GOTESTSUM_PATH" (quoted) wherever the script runs gotestsum so the resolved
binary is used explicitly; keep the existing PATH exports in the branches that
set GOPATH/bin or $HOME/go/bin, but change calls like gotestsum ... to
"$GOTESTSUM_PATH" ... to avoid relying on PATH and to make the variable
non-dead.
```yaml
use_storage_backend:
  description: 'Use storage backend (Rapid7) instead of local storage'
  required: false
  type: boolean
  default: false
customer_guid:
  description: 'Customer GUID (Account ID) for storage backend'
  required: false
  default: ''
access_key:
  description: 'Access key for storage backend'
  required: false
  default: ''
```
Sensitive credentials exposed as workflow dispatch inputs.
`customer_guid` and `access_key` are defined as plaintext `workflow_dispatch` inputs. These values will be visible in the GitHub Actions UI (workflow run parameters) and potentially in logs. Credentials should be stored as GitHub repository or environment secrets and referenced via `${{ secrets.BACKEND_ACCESS_KEY }}`, not passed through the dispatch form.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/manual-integration-tests.yml around lines 26 - 38, The
workflow currently exposes sensitive values via workflow_dispatch inputs
(customer_guid and access_key); remove these plaintext inputs and instead read
credentials from GitHub Secrets (e.g., secrets.BACKEND_CUSTOMER_GUID and
secrets.BACKEND_ACCESS_KEY) in the workflow, keep only non-sensitive flags like
use_storage_backend, and update any steps that referenced customer_guid or
access_key to use the secrets via the secrets context (e.g., replace uses of the
customer_guid/access_key inputs with secrets.BACKEND_CUSTOMER_GUID and
secrets.BACKEND_ACCESS_KEY).
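A minimal sketch of the suggested shape, assuming repository secrets named BACKEND_CUSTOMER_GUID and BACKEND_ACCESS_KEY exist (the secret names and step layout here are illustrative, not taken from the actual workflow):

```yaml
# Hypothetical fragment: only the non-sensitive flag remains a dispatch input
on:
  workflow_dispatch:
    inputs:
      use_storage_backend:
        description: 'Use storage backend (Rapid7) instead of local storage'
        required: false
        type: boolean
        default: false

jobs:
  integration-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Run tests against storage backend
        if: ${{ inputs.use_storage_backend }}
        env:
          # Credentials come from secrets, never from dispatch inputs
          BACKEND_ACCOUNT_ID: ${{ secrets.BACKEND_CUSTOMER_GUID }}
          BACKEND_ACCESS_KEY: ${{ secrets.BACKEND_ACCESS_KEY }}
        run: ./tests/integration-test-suite/run-non-failover-tests.sh
```

Because secrets are masked by the Actions runner, this also keeps the values out of run logs, unlike dispatch inputs which are echoed in the run summary.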
// Verify that profile exists and is partial
-applicationProfile, err := fetchApplicationProfile(s.ksObjectConnection, s.testNamespace, "deployment", "scale-partial-test-deployment")
+applicationProfile, err := fetchApplicationProfileFromCluster(s.ksObjectConnection, s.testNamespace, "deployment", "scale-partial-test-deployment")
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's look at the actual file and the lines mentioned
echo "=== File structure and content around lines 69, 92, 103, 107 ==="
cd tests/integration-test-suite || exit 1
wc -l case_scale_deployment_partial_completion_test.go
head -110 case_scale_deployment_partial_completion_test.go | tail -50

Repository: kubescape/storage
Length of output: 3054
🏁 Script executed:
#!/bin/bash
# Check how other test files handle fetchApplicationProfileFromCluster
echo "=== Pattern of fetchApplicationProfileFromCluster across test files ==="
rg -n 'fetchApplicationProfileFromCluster' --type=go -B2 -A2 tests/integration-test-suite/ | head -100

Repository: kubescape/storage
Length of output: 6740
🏁 Script executed:
#!/bin/bash
# Check the integration test setup to understand isRapid7
echo "=== Check integration_test.go for isRapid7 context ==="
rg -n 'isRapid7' --type=go tests/integration-test-suite/ | head -30

Repository: kubescape/storage
Length of output: 6308
Lines 69 and 92 unconditionally fetch from cluster while final verifications properly respect isRapid7 mode.
The intermediate profile fetches at lines 69 and 92 always call fetchApplicationProfileFromCluster regardless of the isRapid7 setting. In contrast, the final verifications at lines 103 and 107 correctly pass isRapid7 to verifyApplicationProfileCompleted and verifyNetworkNeighborProfileCompleted, which internally handle storage backend routing.
Other test files follow the correct pattern:
- `case_statefulset_profile_cleanup_test.go` wraps `fetchApplicationProfileFromCluster` with `if !s.isRapid7`
- `case_init_sidecar_profile_create_test.go`, `case_crashloop_profile_incomplete_test.go`, and similar tests conditionally call the storage backend when `isRapid7` is true
When running in Rapid7 mode where profiles are stored in the backend, these unconditional cluster-based fetches will fail. Add conditional branching at lines 69 and 92 to check isRapid7 before fetching from the cluster, or skip these intermediate checks in Rapid7 mode.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@tests/integration-test-suite/case_scale_deployment_partial_completion_test.go`
at line 69, The intermediate test fetches call
fetchApplicationProfileFromCluster unconditionally but in Rapid7 mode profiles
live in the backend; wrap those intermediate cluster fetches in a conditional so
they only run when s.isRapid7 is false (e.g. if !s.isRapid7 {
applicationProfile, err := fetchApplicationProfileFromCluster(...) } ), or skip
the intermediate check when s.isRapid7 is true, keeping the final calls to
verifyApplicationProfileCompleted and verifyNetworkNeighborProfileCompleted
(which already accept isRapid7) unchanged.
s.LogWithTimestamp("Killing node agent pod on same node as test pod for failover test")
-DeleteNodeAgentPodOnSameNode(s.T(), s.clientset, s.testNamespace, "app=nodeagent-failover-test-deployment")
+time.Sleep(10 * time.Second)
+//DeleteNodeAgentPodOnSameNode(s.T(), s.clientset, s.testNamespace, "app=nodeagent-failover-test-deployment")

s.LogWithTimestamp("Waiting 4 more minutes for learning period to complete after failover")
time.Sleep(4 * time.Minute)

s.LogWithTimestamp("Verifying profiles are complete after failover")
-verifyApplicationProfileCompleted(s.T(), s.ksObjectConnection, "partial", s.testNamespace, "deployment", "nodeagent-failover-test-deployment")
-verifyNetworkNeighborProfileCompleted(s.T(), s.ksObjectConnection, false, false, "partial", s.testNamespace, "deployment", "nodeagent-failover-test-deployment")
+verifyApplicationProfileCompleted(s.T(), s.ksObjectConnection, "partial", s.testNamespace, "deployment", "nodeagent-failover-test-deployment", s.accountID, s.accessKey, s.isRapid7)
+verifyNetworkNeighborProfileCompleted(s.T(), s.ksObjectConnection, false, false, "partial", s.testNamespace, "deployment", "nodeagent-failover-test-deployment", s.accountID, s.accessKey, s.isRapid7)
Commented-out pod deletion breaks the test's core scenario and invalidates the status assertion.
Four cascading issues all stem from Line 58 being commented out:
- The failover scenario is never exercised. `DeleteNodeAgentPodOnSameNode` is never called, so the test no longer validates recovery from a node-agent disruption. A 10-second sleep is not a meaningful substitute.
- Log message (Line 56) is misleading. "Killing node agent pod on same node as test pod for failover test" is emitted even though nothing is killed, making test logs untrustworthy for failure diagnosis.
- Test description (Lines 13-14) is stale. It still describes the killing step ("kills the node-agent pod on the same node as the test pod after 5 minutes") which no longer happens.
- Status assertion "partial" (Lines 64-65) is likely incorrect without a disruption. With an uninterrupted node-agent running the full learning period, profiles are expected to complete fully, not remain "partial". This assertion could produce a spurious pass (if the function accepts either state) or a spurious failure.
If this is intentionally skipped for a specific internal environment, the sleep at Line 57 should be removed and the test should at minimum be marked with a t.Skip(...) or a conditional guard, and the description/log should be updated. Otherwise, the deletion should be restored.
🛠️ Suggested minimal fix if intent is to conditionally skip the kill step
s.LogWithTimestamp("Killing node agent pod on same node as test pod for failover test")
-time.Sleep(10 * time.Second)
-//DeleteNodeAgentPodOnSameNode(s.T(), s.clientset, s.testNamespace, "app=nodeagent-failover-test-deployment")
+if !s.isRapid7 {
+ DeleteNodeAgentPodOnSameNode(s.T(), s.clientset, s.testNamespace, "app=nodeagent-failover-test-deployment")
+} else {
+ s.T().Log("Skipping node-agent pod deletion: not supported in Rapid7 environment")
+}

And update the expected status on Lines 64-65 to account for whether a disruption actually occurred:
-verifyApplicationProfileCompleted(s.T(), s.ksObjectConnection, "partial", ...)
+expectedStatus := "partial"
+if s.isRapid7 {
+ expectedStatus = "complete"
+}
+verifyApplicationProfileCompleted(s.T(), s.ksObjectConnection, expectedStatus, ...)| func getStorageUrl() (string, error) { | ||
| serviceDiscoveryClient, err := v3.NewServiceDiscoveryClientV3("api.stage-us-east-1.r7.armo-cadr.com") | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to create service discovery client: %v", err) | ||
| } | ||
|
|
||
| services, err := servicediscovery.GetServices(serviceDiscoveryClient) | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to get services: %v", err) | ||
| } | ||
|
|
||
| storageUrl := services.GetStorageUrl() | ||
| return storageUrl, nil | ||
| } |
Hardcoded staging URL will break in non-staging environments.
The service discovery URL api.stage-us-east-1.r7.armo-cadr.com on Line 65 is hardcoded. If these tests are ever run against production or a different staging region, they'll silently connect to the wrong backend. Consider making this configurable via an environment variable with this as a default.
Suggested fix
func getStorageUrl() (string, error) {
- serviceDiscoveryClient, err := v3.NewServiceDiscoveryClientV3("api.stage-us-east-1.r7.armo-cadr.com")
+ sdHost := os.Getenv("SERVICE_DISCOVERY_HOST")
+ if sdHost == "" {
+ sdHost = "api.stage-us-east-1.r7.armo-cadr.com"
+ }
+ serviceDiscoveryClient, err := v3.NewServiceDiscoveryClientV3(sdHost)
if err != nil {
return "", fmt.Errorf("failed to create service discovery client: %v", err)
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration-test-suite/helpers.go` around lines 64 - 77, The code in
getStorageUrl hardcodes the service discovery host when calling
v3.NewServiceDiscoveryClientV3("api.stage-us-east-1.r7.armo-cadr.com"); change
this to read the host from an environment variable (e.g.,
SERVICE_DISCOVERY_HOST) with the current staging host as the default, and pass
that variable value into v3.NewServiceDiscoveryClientV3; ensure any errors
remain handled the same way and that tests use the env var to target non-staging
environments.
func NewStorageClient(clusterName, accountID, accessKey, url string) *backendv1.StorageClient {
	client, err := backendv1.NewStorageClient(url, accountID, accessKey, clusterName)
	if err != nil {
		logger.L().Fatal("failed to create storage client", helpers.Error(err))
	}
	err = client.Connect()
	if err != nil {
		logger.L().Fatal("failed to connect to storage backend", helpers.Error(err))
	}
	return client
}
logger.L().Fatal aborts the process without test cleanup.
Fatal-level log calls in most Go logging frameworks (including go-logger) call os.Exit(1), which bypasses t.Cleanup, deferred teardown, and testify suite TearDownTest/TearDownSuite. This means namespaces, deployments, and Helm releases created by the test won't be cleaned up on failure.
Return an error instead and let callers (or the verification functions) decide how to fail.
Suggested fix
-func NewStorageClient(clusterName, accountID, accessKey, url string) *backendv1.StorageClient {
+func NewStorageClient(clusterName, accountID, accessKey, url string) (*backendv1.StorageClient, error) {
client, err := backendv1.NewStorageClient(url, accountID, accessKey, clusterName)
if err != nil {
- logger.L().Fatal("failed to create storage client", helpers.Error(err))
+ return nil, fmt.Errorf("failed to create storage client: %w", err)
}
err = client.Connect()
if err != nil {
- logger.L().Fatal("failed to connect to storage backend", helpers.Error(err))
+ return nil, fmt.Errorf("failed to connect to storage backend: %w", err)
}
- return client
+ return client, nil
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-func NewStorageClient(clusterName, accountID, accessKey, url string) *backendv1.StorageClient {
-	client, err := backendv1.NewStorageClient(url, accountID, accessKey, clusterName)
-	if err != nil {
-		logger.L().Fatal("failed to create storage client", helpers.Error(err))
-	}
-	err = client.Connect()
-	if err != nil {
-		logger.L().Fatal("failed to connect to storage backend", helpers.Error(err))
-	}
-	return client
-}
+func NewStorageClient(clusterName, accountID, accessKey, url string) (*backendv1.StorageClient, error) {
+	client, err := backendv1.NewStorageClient(url, accountID, accessKey, clusterName)
+	if err != nil {
+		return nil, fmt.Errorf("failed to create storage client: %w", err)
+	}
+	err = client.Connect()
+	if err != nil {
+		return nil, fmt.Errorf("failed to connect to storage backend: %w", err)
+	}
+	return client, nil
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration-test-suite/helpers.go` around lines 79 - 89, The helper
NewStorageClient currently calls logger.L().Fatal on errors (logger.L().Fatal in
the NewStorageClient function), which aborts the process and prevents test
cleanup; change NewStorageClient to return (*backendv1.StorageClient, error)
instead of aborting: on backendv1.NewStorageClient or client.Connect errors,
return nil and the wrapped error (with context) rather than calling
logger.L().Fatal; update all callers (tests and helpers that call
NewStorageClient) to handle the returned error and fail the test via t.Fatalf or
propagate the error so deferred cleanup/TearDown logic can run.
clientset          *kubernetes.Clientset
ksObjectConnection spdxv1beta1.SpdxV1beta1Interface
isRapid7           bool
// These are only used when storage backend is enbled (Rapid7)
Typo in comment: "enbled" → "enabled".
- // These are only used when storage backend is enbled (Rapid7)
+ // These are only used when storage backend is enabled (Rapid7)

📝 Committable suggestion
-// These are only used when storage backend is enbled (Rapid7)
+// These are only used when storage backend is enabled (Rapid7)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration-test-suite/integration_test.go` at line 27, Fix the typo in
the inline comment that currently reads "These are only used when storage
backend is enbled (Rapid7)" by changing "enbled" to "enabled" so the comment
reads "These are only used when storage backend is enabled (Rapid7)"; locate the
comment string in integration_test.go and update it accordingly.
# List of failover tests to run sequentially
-FAILOVER_TESTS="TestSimpleProfileStorageFailover TestSimpleProfileNodeAgentFailover TestLongStorageFailover TestScaleDeploymentPartialCompletion"
+FAILOVER_TESTS="TestSimpleProfileNodeAgentFailover"
Same test name as run-non-failover-tests.sh — duplicate test execution in CI.
FAILOVER_TESTS="TestSimpleProfileNodeAgentFailover" matches NON_FAILOVER_TESTS="TestSimpleProfileNodeAgentFailover" in the companion script. Whichever of these two is the intentional owner of this test, the other should be updated to contain its respective tests. See the related comment on run-non-failover-tests.sh.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration-test-suite/run-failover-tests.sh` at line 47, The failover
test list duplicates a test defined in the non-failover script; update one of
the lists to remove the duplicate: decide whether
TestSimpleProfileNodeAgentFailover belongs in FAILOVER_TESTS (in
run-failover-tests.sh) or NON_FAILOVER_TESTS (in run-non-failover-tests.sh),
then remove it from the other script so each test appears only once; modify the
string assigned to FAILOVER_TESTS or NON_FAILOVER_TESTS accordingly (e.g.,
delete or replace "TestSimpleProfileNodeAgentFailover" from FAILOVER_TESTS) and
ensure the remaining test names reflect the intended grouping.
# List of non-failover tests to run in parallel
-NON_FAILOVER_TESTS="TestSimpleProfileCreate TestJobProfileCreate TestCronJobProfileCreate TestInitContainerProfileCreate TestSidecarProfileCreate TestInitSidecarProfileCreate TestCrashLoopProfileIncomplete TestCrashLoopAndStableProfileIncomplete TestStatefulSetProfileCleanup"
+NON_FAILOVER_TESTS="TestSimpleProfileNodeAgentFailover"
NON_FAILOVER_TESTS contains a failover test — likely a copy-paste error.
TestSimpleProfileNodeAgentFailover is identical to what run-failover-tests.sh runs (line 47 of that file). Running both scripts in CI will execute the same test twice, and the NON_FAILOVER_TESTS variable name contradicts its contents. The non-failover script should contain the non-failover tests (e.g., TestJobProfileCreate, TestCronJobProfileCreate, TestCrashLoopAndStableProfileIncomplete).
🐛 Suggested fix
-NON_FAILOVER_TESTS="TestSimpleProfileNodeAgentFailover"
+NON_FAILOVER_TESTS="TestJobProfileCreate TestCronJobProfileCreate TestCrashLoopAndStableProfileIncomplete"📝 Committable suggestion
-NON_FAILOVER_TESTS="TestSimpleProfileNodeAgentFailover"
+NON_FAILOVER_TESTS="TestJobProfileCreate TestCronJobProfileCreate TestCrashLoopAndStableProfileIncomplete"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration-test-suite/run-non-failover-tests.sh` at line 47, The
NON_FAILOVER_TESTS variable currently lists a failover test
(TestSimpleProfileNodeAgentFailover) likely due to copy/paste; update the value
of NON_FAILOVER_TESTS to include the actual non-failover test names (for example
TestJobProfileCreate, TestCronJobProfileCreate,
TestCrashLoopAndStableProfileIncomplete) and remove
TestSimpleProfileNodeAgentFailover so the non-failover script runs the correct
suite; locate the assignment to NON_FAILOVER_TESTS in run-non-failover-tests.sh
and replace the test list accordingly.