
feat: add SBOM scanner sidecar for memory-isolated SBOM generation#753

Open
slashben wants to merge 4 commits into main from feature/sbom-scanner-sidecar

Conversation

slashben (Contributor) commented Mar 18, 2026

Summary

  • SBOM scanner sidecar: Moves Syft-based SBOM generation into a separate gRPC sidecar container within the node-agent pod, preventing OOM kills from crashing the main node-agent process
  • Shared helpers refactor: Extracts NodeSource, NodeResolver, and toSyftDocument into a syftutil subpackage to avoid import cycles between the main agent and the sidecar binary
  • Full fallback support: When the sidecar is not configured, the existing in-process Syft scanning continues to work; when the sidecar is configured but not ready, scans are deferred

Key changes

Area — Files
Protobuf service pkg/sbomscanner/v1/proto/scanner.proto — CreateSBOM and Health RPCs
gRPC server pkg/sbomscanner/v1/server.go — sequential processing, timeout, image-too-large handling
gRPC client pkg/sbomscanner/v1/client.go — reconnection backoff, health checks, crash detection
Scanner binary cmd/sbom-scanner/main.go — Unix socket listener, graceful shutdown
SbomManager pkg/sbommanager/v1/sbom_manager.go — sidecar/not-ready/fallback branching, retry tracking, TooLarge-with-memory-limit annotations
Observability pkg/sbommanager/v1/metrics.go — sbom_scan_total, sbom_scan_duration_seconds, sbom_scanner_restarts_total, sbom_scanner_ready
Shared helpers pkg/sbommanager/v1/syftutil/ — extracted source.go, resolver.go, document.go, helpers.go
Dockerfile build/Dockerfile — builds and copies both node-agent and sbom-scanner binaries
Wiring cmd/main.go — creates scanner client from SBOM_SCANNER_SOCKET env var

Test plan

  • Unit tests for gRPC server (health, invalid input, image-too-large, cancelled context)
  • Integration tests (full scan lifecycle, simulated OOM/crash, image-too-large via sidecar, ready check)
  • Existing sbommanager tests pass (source_test, sbom_manager_test)
  • Syftutil package tests pass (toLayers test moved from sbommanager)
  • Full go build ./... passes
  • CI pipeline
  • Staging deployment with Helm chart changes (companion PR)

Made with Cursor

Summary by CodeRabbit

  • New Features

    • SBOM generation can be offloaded to a dedicated scanner sidecar with readiness/health checks, timeouts, and in-process fallback.
    • Runtime image now includes the sbom-scanner executable.
    • Prometheus metrics for scan counts, durations, readiness, and scanner restarts.
    • Configurable scanner memory limits, image/SBOM size limits, automatic crash detection and retry logic.
  • Tests

    • End-to-end and integration tests covering scanner lifecycle, error scenarios, resource limits, and readiness.
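The "configurable scanner memory limits" and "retry logic" called out above can be sketched with the standard library alone. The helper names below (parseMemLimit, retryTracker) are hypothetical and only illustrate the idea of rejecting a malformed limit instead of coercing it to zero, and of capping per-image retries under a mutex.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// parseMemLimit shows how a byte-count limit (e.g. an env var like the
// PR's SCANNER_MEMORY_LIMIT) might be read defensively: a malformed or
// non-positive value is reported rather than silently coerced to zero.
func parseMemLimit(raw string) (int64, error) {
	v, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || v <= 0 {
		return 0, fmt.Errorf("invalid memory limit %q: %v", raw, err)
	}
	return v, nil
}

// retryTracker caps per-image retries after scanner crashes; the mutex
// keeps the counter map safe when scans run concurrently.
type retryTracker struct {
	mu       sync.Mutex
	counts   map[string]int
	maxTries int
}

func newRetryTracker(max int) *retryTracker {
	return &retryTracker{counts: map[string]int{}, maxTries: max}
}

// shouldRetry increments the counter for key and reports whether another
// attempt is allowed; once the budget is exhausted, the entry is cleared
// so the map does not grow without bound.
func (r *retryTracker) shouldRetry(key string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counts[key]++
	if r.counts[key] >= r.maxTries {
		delete(r.counts, key)
		return false
	}
	return true
}
```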

Move Syft-based SBOM scanning into a separate gRPC sidecar container
to prevent OOM kills from crashing the main node-agent process. The
sidecar communicates via Unix domain socket and follows the existing
ClamAV sidecar pattern.

- Define protobuf service (CreateSBOM, Health RPCs)
- Extract shared Syft helpers into syftutil subpackage
- Implement gRPC server with sequential scan processing
- Create sbom-scanner binary entrypoint
- Implement gRPC client with crash detection and backoff
- Add sidecar/not-ready/in-process fallback branching in SbomManager
- Add retry tracking with TooLarge-with-memory-limit annotations
- Add Prometheus metrics (scan_total, duration, restarts, ready)
- Update Dockerfile to build both node-agent and sbom-scanner binaries
- Add unit and integration tests

Made-with: Cursor
Signed-off-by: Ben <ben@armosec.io>
@coderabbitai
Copy link

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Adds an SBOM scanner sidecar (gRPC server + Unix-socket client), integrates it into the node-agent to delegate SBOM generation with an in-process Syft fallback, introduces protobufs, metrics, syft->SyftDocument conversion utilities, tests, and a Docker build step to include the sbom-scanner binary.

Changes

Cohort / File(s) Summary
Docker Build Configuration
build/Dockerfile
Builds a new sbom-scanner binary in the builder stage and copies it into the final image.
Node Agent Entry
cmd/main.go
Initializes SBOMScannerClient from SBOM_SCANNER_SOCKET env and passes it to CreateSbomManager when SBOM generation is enabled (signature updated).
Sidecar Binary
cmd/sbom-scanner/main.go
New executable: starts gRPC server on a Unix socket, registers scanner server, handles SIGTERM/SIGINT for graceful shutdown.
SBOM Manager Core
pkg/sbommanager/v1/sbom_manager.go, pkg/sbommanager/v1/metrics.go
SbomManager gains scannerClient, memory-limit and retry fields; delegates scans to sidecar when ready, handles crashes/retries/limits, records Prometheus metrics, and falls back to in-process Syft generation.
Syft Conversion Utilities
pkg/sbommanager/v1/syftutil/document.go, .../helpers.go, .../resolver.go, .../source.go, .../source_test.go
Adds ToSyftDocument and helper conversions; package for these files renamed to syftutil; tests adjusted.
Scanner Protobufs & Gen Config
pkg/sbomscanner/v1/proto/scanner.proto, pkg/sbomscanner/v1/proto/buf.gen.yaml
Adds SBOMScanner service (CreateSBOM, Health) and messages; buf generation configuration for Go codegen.
Scanner Types & API
pkg/sbomscanner/v1/types.go
Defines ScanRequest/ScanResult, SBOMScannerClient interface, and errors ErrScannerCrashed, ErrScannerNotReady.
Scanner Client
pkg/sbomscanner/v1/client.go
gRPC Unix-socket client with backoff health checks, CreateSBOM mapping to protobuf, Ready and Close semantics, and scanner-crash error mapping.
Scanner Server
pkg/sbomscanner/v1/server.go
Server implementation: CreateSBOM (image status parsing, size checks, Syft invocation, timeout handling) and Health RPC; maps errors to proper gRPC codes.
Scanner Tests
pkg/sbomscanner/v1/server_test.go, pkg/sbomscanner/v1/integration_test.go
Unit and integration tests for health, lifecycle, simulated crash/OOM, image-too-large, context cancellation, and end-to-end CreateSBOM flows.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Node Agent
    participant ScannerClient as SBOMScannerClient
    participant GRPC as gRPC (Unix Socket)
    participant Scanner as SBOM Scanner Server
    participant Storage as Storage

    Agent->>ScannerClient: Ready()?
    alt Ready
        Agent->>ScannerClient: CreateSBOM(ScanRequest)
        ScannerClient->>GRPC: CreateSBOM RPC
        GRPC->>Scanner: Forward request
        Scanner->>Scanner: Build image source & run Syft
        Scanner-->>GRPC: Response (sbom JSON + size)
        GRPC-->>ScannerClient: Response
        ScannerClient->>Agent: ScanResult (SyftDocument, size)
        Agent->>Storage: Persist SBOM
    else Not Ready
        Agent->>Agent: Run in-process Syft generation
        Agent->>Storage: Persist SBOM
    else Crash
        Agent->>Agent: handleScannerCrash() -> retry or mark oversized
    end
sequenceDiagram
    participant Main as main.go
    participant ClientCtor as NewSBOMScannerClient
    participant GRPC as gRPC (Unix Socket)
    participant Server as Scanner Server

    Main->>ClientCtor: NewSBOMScannerClient(socketPath)
    ClientCtor->>GRPC: Dial unix socket
    ClientCtor->>Server: Health() with backoff retries
    alt Health OK
        Server-->>ClientCtor: Ready=true
        ClientCtor-->>Main: client ready
    else Health Failed
        ClientCtor-->>Main: error
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • matthyx

Poem

🐰 I hopped to a socket, soft and bright,

gRPC hummed while Syft took a bite,
Sidecar and agent twirled in a scan,
Bytes and metadata shelved by hand,
Hooray — SBOMs safe for the night!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Description Check: ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled).
Title Check: ✅ Passed. The title 'feat: add SBOM scanner sidecar for memory-isolated SBOM generation' accurately and specifically describes the main change: introducing a separate gRPC sidecar for SBOM generation to prevent OOM issues, which aligns with the changeset's core objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (8)
pkg/sbomscanner/v1/proto/scanner.proto (1)

1-4: Consider fixing the Buf lint directory mismatch.

The static analysis tool reports that files with package sbomscanner.v1 should be in a sbomscanner/v1 directory relative to root, but this file is in pkg/sbomscanner/v1/proto.

Options to resolve:

  1. Add a buf.yaml with lint.ignore for PACKAGE_DIRECTORY_MATCH
  2. Move proto to proto/sbomscanner/v1/scanner.proto and adjust go_package

This won't affect functionality but will cause Buf lint failures in CI if enforced.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbomscanner/v1/proto/scanner.proto` around lines 1 - 4, The Buf lint
complaint arises because the proto package name "sbomscanner.v1" in
scanner.proto does not match its directory layout; fix by either adding a
buf.yaml at repo root that ignores PACKAGE_DIRECTORY_MATCH (add lint.ignore: -
PACKAGE_DIRECTORY_MATCH) or move the proto into a directory matching the package
(create proto/sbomscanner/v1/scanner.proto) and update the option go_package
string ("github.com/kubescape/node-agent/pkg/sbomscanner/v1/proto") if
relocating; reference the package declaration "package sbomscanner.v1" and the
option "option go_package" when making the change so Buf lint and go codegen
remain consistent.
pkg/sbommanager/v1/syftutil/helpers.go (1)

9-22: Consider replacing fmt.Println with structured logging.

The diskUsage function uses fmt.Println for error output (lines 13, 20), which bypasses structured logging and may not be captured in container logs. Since this code is being extracted into a shared utility, consider using a logger or returning errors.

♻️ Suggested improvement
-func diskUsage(path string) int64 {
+func diskUsage(path string) (int64, error) {
 	var s int64
 	dir, err := os.Open(path)
 	if err != nil {
-		fmt.Println(err)
-		return s
+		return 0, fmt.Errorf("open directory %s: %w", path, err)
 	}
 	defer dir.Close()

 	files, err := dir.Readdir(-1)
 	if err != nil {
-		fmt.Println(err)
-		return s
+		return 0, fmt.Errorf("read directory %s: %w", path, err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/syftutil/helpers.go` around lines 9 - 22, The diskUsage
function currently prints errors with fmt.Println; change it to return errors
and let callers handle logging: update func diskUsage(path string) int64 to func
diskUsage(path string) (int64, error) (or add a logger parameter if preferred),
replace the fmt.Println(err) calls with early returns returning the zero size
and the err, and update any callers (e.g., callers of diskUsage) to handle/log
the returned error using the project's structured logger; ensure to update
references to diskUsage accordingly.
pkg/sbommanager/v1/sbom_manager.go (1)

99-102: Silent failure on invalid SCANNER_MEMORY_LIMIT value.

If the environment variable contains non-numeric data, ParseInt returns 0 and the error is silently ignored. Consider logging a warning so operators know their configuration is invalid.

♻️ Optional: Log warning on parse failure
 	var scannerMemLimit int64
 	if memStr, ok := os.LookupEnv("SCANNER_MEMORY_LIMIT"); ok {
-		scannerMemLimit, _ = strconv.ParseInt(memStr, 10, 64)
+		var parseErr error
+		scannerMemLimit, parseErr = strconv.ParseInt(memStr, 10, 64)
+		if parseErr != nil {
+			logger.L().Warning("invalid SCANNER_MEMORY_LIMIT value, using 0",
+				helpers.String("value", memStr),
+				helpers.Error(parseErr))
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/sbom_manager.go` around lines 99 - 102, The code silently
ignores strconv.ParseInt errors when reading SCANNER_MEMORY_LIMIT into
scannerMemLimit; modify the os.LookupEnv/ParseInt block to check the ParseInt
error and log a warning (including the invalid value and the parse error)
instead of discarding it, falling back to the existing zero/default behavior;
reference the scannerMemLimit variable, the SCANNER_MEMORY_LIMIT env var, and
strconv.ParseInt so you update that specific parsing branch to emit a warning
via the package logger (or log.Printf) when parsing fails.
pkg/sbomscanner/v1/server_test.go (1)

74-89: Duplicate test helper: consider extracting to shared test utilities.

makeImageStatusJSON is nearly identical to makeTestImageStatus in integration_test.go. Consider extracting to a shared testutil_test.go file to reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbomscanner/v1/server_test.go` around lines 74 - 89, Duplicate test
helper functions makeImageStatusJSON and makeTestImageStatus should be
consolidated: create a new shared test file (e.g., testutil_test.go) and move
the common helper into it (export or keep unexported as needed), then replace
calls in pkg/sbomscanner/v1/server_test.go and integration_test.go to use that
single helper; ensure the helper signature and returned value remain the same
(function name, parameters, and json marshal behavior) and update imports/test
files so tests compile.
cmd/main.go (1)

385-392: Consider creating scanner client only when SBOM generation is enabled.

The scanner client is created unconditionally when SBOM_SCANNER_SOCKET is set, even if cfg.EnableSbomGeneration is false. While this doesn't cause issues (the client simply won't be used), it's slightly wasteful to establish a gRPC connection and perform health checks when SBOM generation is disabled.

♻️ Optional: Move client creation inside the SBOM-enabled block
-	// Create the SBOM scanner sidecar client (if configured)
-	var scannerClient sbomscanner.SBOMScannerClient
-	if socket, ok := os.LookupEnv("SBOM_SCANNER_SOCKET"); ok {
-		scannerClient, err = sbomscanner.NewSBOMScannerClient(socket)
-		if err != nil {
-			logger.L().Ctx(ctx).Warning("SBOM scanner sidecar not available, falling back to in-process scanning", helpers.Error(err))
-		}
-	}
-
 	// Create the SBOM manager
 	var sbomManager sbommanager.SbomManagerClient
 	if cfg.EnableSbomGeneration {
+		var scannerClient sbomscanner.SBOMScannerClient
+		if socket, ok := os.LookupEnv("SBOM_SCANNER_SOCKET"); ok {
+			scannerClient, err = sbomscanner.NewSBOMScannerClient(socket)
+			if err != nil {
+				logger.L().Ctx(ctx).Warning("SBOM scanner sidecar not available, falling back to in-process scanning", helpers.Error(err))
+			}
+		}
 		sbomManager, err = sbommanagerv1.CreateSbomManager(ctx, cfg, igK8sClient.RuntimeConfig.SocketPath, storageClient, k8sObjectCache, scannerClient)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/main.go` around lines 385 - 392, Only create the SBOM scanner sidecar
client when SBOM generation is enabled: guard the
sbomscanner.NewSBOMScannerClient call (and the SBOM_SCANNER_SOCKET env check)
with cfg.EnableSbomGeneration so you don't open the gRPC connection or run
health checks when disabled; keep scannerClient in the same scope but move the
socket lookup, NewSBOMScannerClient call, and the existing
logger.L().Ctx(ctx).Warning(...) error handling inside the
cfg.EnableSbomGeneration block to preserve behavior when enabled.
pkg/sbomscanner/v1/client.go (1)

43-59: Consider adding a maximum elapsed time to the startup health check backoff.

The exponential backoff uses default settings which allow retries for up to ~15 minutes. If the scanner sidecar fails to start, this could significantly delay node-agent startup before falling back to in-process scanning.

♻️ Suggested: Configure bounded backoff for faster failure detection
-	_, err = backoff.Retry(context.Background(), func() (struct{}, error) {
+	bo := backoff.NewExponentialBackOff()
+	bo.MaxElapsedTime = 30 * time.Second // Fail faster if scanner won't start
+	_, err = backoff.Retry(context.Background(), func() (struct{}, error) {
 		ctx, cancel := context.WithTimeout(context.Background(), healthCheckTimeout)
 		defer cancel()
 		resp, err := c.client.Health(ctx, &pb.HealthRequest{})
 		if err != nil {
 			return struct{}{}, fmt.Errorf("health check failed: %w", err)
 		}
 		if !resp.Ready {
 			return struct{}{}, fmt.Errorf("scanner not ready")
 		}
 		return struct{}{}, nil
-	}, backoff.WithBackOff(backoff.NewExponentialBackOff()))
+	}, backoff.WithBackOff(bo))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbomscanner/v1/client.go` around lines 43 - 59, The exponential backoff
used in the startup health check (backoff.Retry with
backoff.NewExponentialBackOff()) is unbounded and can take ~15 minutes; create
and configure an ExponentialBackOff instance with a bounded MaxElapsedTime
(e.g., a short configurable duration constant) and pass that instance into
backoff.WithBackOff instead of calling backoff.NewExponentialBackOff() inline;
update the retry call around backoff.Retry(...) and use the existing
healthCheckTimeout/timeout context as before so the health check fails fast when
the sidecar doesn't start.
pkg/sbomscanner/v1/integration_test.go (1)

39-62: Consider using t.Cleanup() for automatic test cleanup.

The test helper returns cleanup responsibilities to callers. Using t.Cleanup() would ensure resources are cleaned up even if a test panics or forgets to defer cleanup.

♻️ Optional: Use t.Cleanup for automatic resource management
 func startIntegrationServer(t *testing.T) (SBOMScannerClient, *grpc.Server, string) {
 	t.Helper()
 	dir := t.TempDir()
 	sock := filepath.Join(dir, "scanner.sock")

 	lis, err := net.Listen("unix", sock)
 	require.NoError(t, err)

 	srv := grpc.NewServer()
 	pb.RegisterSBOMScannerServer(srv, NewScannerServer())
 	go srv.Serve(lis)

 	conn, err := grpc.NewClient("unix://"+sock,
 		grpc.WithTransportCredentials(insecure.NewCredentials()),
 	)
 	require.NoError(t, err)

 	client := &sbomScannerClient{
 		conn:   conn,
 		client: pb.NewSBOMScannerClient(conn),
 	}

+	t.Cleanup(func() {
+		client.Close()
+		srv.Stop()
+		os.Remove(sock)
+	})
+
-	return client, srv, sock
+	return client, srv, sock // callers can still use srv for specific test scenarios
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbomscanner/v1/integration_test.go` around lines 39 - 62, The helper
startIntegrationServer currently returns resources for callers to clean up;
instead register automatic cleanup with t.Cleanup inside startIntegrationServer
so resources are always released: after creating lis, register t.Cleanup to
close lis (lis.Close()), after creating srv and starting it, register t.Cleanup
to stop the server (srv.Stop()), and after creating conn register t.Cleanup to
close the client connection (conn.Close()); update startIntegrationServer to
perform these t.Cleanup registrations so callers no longer need to remember to
defer cleanup.
pkg/sbommanager/v1/syftutil/document.go (1)

18-19: Silently ignoring JSON marshal errors may hide issues.

Lines 18-19 discard marshal errors for configuration and metadata. While these fields are likely always valid JSON, silently ignoring errors could make debugging difficult if malformed data ever appears.

Consider logging at debug level if marshaling fails, or at minimum document why errors are safely ignored.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/syftutil/document.go` around lines 18 - 19, The code
currently discards errors from json.Marshal for configuration and metadata
(configuration, _ := json.Marshal(doc.Descriptor.Configuration); metadata, _ :=
json.Marshal(doc.Source.Metadata)); update this to capture the error variables,
check them, and log a debug-level message (including which field and the error)
instead of ignoring (e.g., configuration, err := json.Marshal(...); if err !=
nil { logger.Debugf("syft document: failed to marshal Descriptor.Configuration:
%v", err); configuration = []byte("{}") } ), or otherwise handle the error
consistently (return it or set a safe default) so malformed data is observable;
reference doc.Descriptor.Configuration and doc.Source.Metadata when implementing
the check and log.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sbommanager/v1/sbom_manager.go`:
- Around line 72-74: The scanRetries map is accessed concurrently (e.g., in
handleScannerCrash and processContainer) and may race when worker pool
concurrency increases; protect it by replacing scanRetries map[string]int with a
concurrency-safe structure or guarding accesses with a mutex: either switch to
sync.Map (update uses Load, Store, Delete) or add a sync.Mutex (e.g.,
scanRetriesMu) and wrap all reads/writes in handleScannerCrash,
processContainer, and any other funcs that touch scanRetries to hold the lock;
update all references to use the chosen synchronization so map accesses are
safe.
- Around line 477-497: scanRetries entries are never removed on successful scans
or other terminal non-crash failures, causing a memory leak; remove the entry
when a scan reaches any terminal state. Specifically, in the success path where
sbomScanTotal.WithLabelValues("success").Inc() /
sbomScanDuration.WithLabelValues("success").Observe(...) and syftDoc =
result.SyftDocument are set, add delete(s.scanRetries, sbomName) to clear the
counter; likewise, ensure you delete(s.scanRetries, sbomName) in other
terminal/error branches (any code that marks the SBOM as finished/failed) so
only the handleScannerCrash path relies on its existing deletion when retryCount
>= maxScanRetries.

In `@pkg/sbomscanner/v1/server.go`:
- Around line 67-70: The error handling in the RPC handler currently maps any
ctx.Err() to DeadlineExceeded; update the post-error branch in the handler (the
function containing the shown block, e.g., CreateSBOM/Scan handler) to inspect
ctx.Err() and return status.Error(codes.Canceled, ...) when ctx.Err() ==
context.Canceled and status.Error(codes.DeadlineExceeded, ...) when ctx.Err() ==
context.DeadlineExceeded; use context.Canceled and context.DeadlineExceeded
comparisons to distinguish the two cases and keep the existing error
messages/context in the returned status.Error calls.

---

Nitpick comments:
In `@cmd/main.go`:
- Around line 385-392: Only create the SBOM scanner sidecar client when SBOM
generation is enabled: guard the sbomscanner.NewSBOMScannerClient call (and the
SBOM_SCANNER_SOCKET env check) with cfg.EnableSbomGeneration so you don't open
the gRPC connection or run health checks when disabled; keep scannerClient in
the same scope but move the socket lookup, NewSBOMScannerClient call, and the
existing logger.L().Ctx(ctx).Warning(...) error handling inside the
cfg.EnableSbomGeneration block to preserve behavior when enabled.

In `@pkg/sbommanager/v1/sbom_manager.go`:
- Around line 99-102: The code silently ignores strconv.ParseInt errors when
reading SCANNER_MEMORY_LIMIT into scannerMemLimit; modify the
os.LookupEnv/ParseInt block to check the ParseInt error and log a warning
(including the invalid value and the parse error) instead of discarding it,
falling back to the existing zero/default behavior; reference the
scannerMemLimit variable, the SCANNER_MEMORY_LIMIT env var, and strconv.ParseInt
so you update that specific parsing branch to emit a warning via the package
logger (or log.Printf) when parsing fails.

In `@pkg/sbommanager/v1/syftutil/document.go`:
- Around line 18-19: The code currently discards errors from json.Marshal for
configuration and metadata (configuration, _ :=
json.Marshal(doc.Descriptor.Configuration); metadata, _ :=
json.Marshal(doc.Source.Metadata)); update this to capture the error variables,
check them, and log a debug-level message (including which field and the error)
instead of ignoring (e.g., configuration, err := json.Marshal(...); if err !=
nil { logger.Debugf("syft document: failed to marshal Descriptor.Configuration:
%v", err); configuration = []byte("{}") } ), or otherwise handle the error
consistently (return it or set a safe default) so malformed data is observable;
reference doc.Descriptor.Configuration and doc.Source.Metadata when implementing
the check and log.

In `@pkg/sbommanager/v1/syftutil/helpers.go`:
- Around line 9-22: The diskUsage function currently prints errors with
fmt.Println; change it to return errors and let callers handle logging: update
func diskUsage(path string) int64 to func diskUsage(path string) (int64, error)
(or add a logger parameter if preferred), replace the fmt.Println(err) calls
with early returns returning the zero size and the err, and update any callers
(e.g., callers of diskUsage) to handle/log the returned error using the
project's structured logger; ensure to update references to diskUsage
accordingly.

In `@pkg/sbomscanner/v1/client.go`:
- Around line 43-59: The exponential backoff used in the startup health check
(backoff.Retry with backoff.NewExponentialBackOff()) is unbounded and can take
~15 minutes; create and configure an ExponentialBackOff instance with a bounded
MaxElapsedTime (e.g., a short configurable duration constant) and pass that
instance into backoff.WithBackOff instead of calling
backoff.NewExponentialBackOff() inline; update the retry call around
backoff.Retry(...) and use the existing healthCheckTimeout/timeout context as
before so the health check fails fast when the sidecar doesn't start.

In `@pkg/sbomscanner/v1/integration_test.go`:
- Around line 39-62: The helper startIntegrationServer currently returns
resources for callers to clean up; instead register automatic cleanup with
t.Cleanup inside startIntegrationServer so resources are always released: after
creating lis, register t.Cleanup to close lis (lis.Close()), after creating srv
and starting it, register t.Cleanup to stop the server (srv.Stop()), and after
creating conn register t.Cleanup to close the client connection (conn.Close());
update startIntegrationServer to perform these t.Cleanup registrations so
callers no longer need to remember to defer cleanup.

In `@pkg/sbomscanner/v1/proto/scanner.proto`:
- Around line 1-4: The Buf lint complaint arises because the proto package name
"sbomscanner.v1" in scanner.proto does not match its directory layout; fix by
either adding a buf.yaml at repo root that ignores PACKAGE_DIRECTORY_MATCH (add
lint.ignore: - PACKAGE_DIRECTORY_MATCH) or move the proto into a directory
matching the package (create proto/sbomscanner/v1/scanner.proto) and update the
option go_package string
("github.com/kubescape/node-agent/pkg/sbomscanner/v1/proto") if relocating;
reference the package declaration "package sbomscanner.v1" and the option
"option go_package" when making the change so Buf lint and go codegen remain
consistent.

In `@pkg/sbomscanner/v1/server_test.go`:
- Around line 74-89: Duplicate test helper functions makeImageStatusJSON and
makeTestImageStatus should be consolidated: create a new shared test file (e.g.,
testutil_test.go) and move the common helper into it (export or keep unexported
as needed), then replace calls in pkg/sbomscanner/v1/server_test.go and
integration_test.go to use that single helper; ensure the helper signature and
returned value remain the same (function name, parameters, and json marshal
behavior) and update imports/test files so tests compile.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2487321b-d149-4b24-a51c-d9f527930e73

📥 Commits

Reviewing files that changed from the base of the PR and between f673d45 and 6b3bf16.

⛔ Files ignored due to path filters (2)
  • pkg/sbomscanner/v1/proto/scanner.pb.go is excluded by !**/*.pb.go
  • pkg/sbomscanner/v1/proto/scanner_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (17)
  • build/Dockerfile
  • cmd/main.go
  • cmd/sbom-scanner/main.go
  • pkg/sbommanager/v1/metrics.go
  • pkg/sbommanager/v1/sbom_manager.go
  • pkg/sbommanager/v1/syftutil/document.go
  • pkg/sbommanager/v1/syftutil/helpers.go
  • pkg/sbommanager/v1/syftutil/resolver.go
  • pkg/sbommanager/v1/syftutil/source.go
  • pkg/sbommanager/v1/syftutil/source_test.go
  • pkg/sbomscanner/v1/client.go
  • pkg/sbomscanner/v1/integration_test.go
  • pkg/sbomscanner/v1/proto/buf.gen.yaml
  • pkg/sbomscanner/v1/proto/scanner.proto
  • pkg/sbomscanner/v1/server.go
  • pkg/sbomscanner/v1/server_test.go
  • pkg/sbomscanner/v1/types.go

Signed-off-by: Ben <ben@armosec.io>
…or handling for canceled and timed-out scans

Signed-off-by: Ben <ben@armosec.io>

coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
pkg/sbommanager/v1/sbom_manager.go (2)

371-379: ⚠️ Potential issue | 🟡 Minor

Clear stale crash-retry state on non-crash error returns.

After prior crashes, Lines 371-379 return on non-crash errors without cleaning scanRetries[sbomName], so entries can linger indefinitely for images that never reach success/max-retries.

🧹 Proposed fix
 		sbomScanTotal.WithLabelValues("error").Inc()
 		sbomScanDuration.WithLabelValues("error").Observe(scanDuration)
+		delete(s.scanRetries, sbomName)
 		logger.L().Ctx(s.ctx).Error("SbomManager - sidecar scan failed",
 			helpers.Error(scanErr),
 			helpers.String("namespace", notif.Container.K8s.Namespace),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/sbom_manager.go` around lines 371 - 379, The error-return
path in SbomManager leaves stale entries in the scanRetries map for sbomName
when a non-crash error occurs; before returning in the non-crash error block
(the branch that logs "SbomManager - sidecar scan failed"), remove the retry
state by deleting scanRetries[sbomName] (or resetting its counter) so retries do
not linger for that image, and ensure you perform the deletion under the same
synchronization used for scanRetries (the same lock/guard used elsewhere in the
SbomManager methods).

99-102: ⚠️ Potential issue | 🟡 Minor

Handle invalid SCANNER_MEMORY_LIMIT explicitly instead of silently coercing to 0.

Lines 99-102 ignore strconv.ParseInt errors. A malformed env var silently disables memory-limit-based behavior and makes retries/annotations harder to reason about.

🔧 Proposed fix
 	var scannerMemLimit int64
 	if memStr, ok := os.LookupEnv("SCANNER_MEMORY_LIMIT"); ok {
-		scannerMemLimit, _ = strconv.ParseInt(memStr, 10, 64)
+		parsed, parseErr := strconv.ParseInt(memStr, 10, 64)
+		if parseErr != nil || parsed <= 0 {
+			logger.L().Ctx(ctx).Warning("SbomManager - invalid SCANNER_MEMORY_LIMIT, ignoring",
+				helpers.String("value", memStr),
+				helpers.Error(parseErr))
+		} else {
+			scannerMemLimit = parsed
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/sbom_manager.go` around lines 99 - 102, The code silently
ignores strconv.ParseInt errors when reading SCANNER_MEMORY_LIMIT into
scannerMemLimit; change the os.LookupEnv/strconv.ParseInt block to check the
parse error and handle it explicitly: call strconv.ParseInt into a temp value
and if err != nil then return (or propagate) a descriptive error (or at minimum
log it via the package logger and fail fast) instead of assigning zero,
otherwise assign scannerMemLimit = parsed value; reference the scannerMemLimit
variable, os.LookupEnv and strconv.ParseInt in your change so reviewers can find
the fix.
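The strict-parse behavior the fix asks for can be shown with a small standalone sketch. parseMemLimit is a hypothetical helper, not a function in the node-agent codebase; it demonstrates rejecting malformed or non-positive values instead of silently coercing them to 0.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseMemLimit rejects malformed or non-positive values instead of
// silently coercing them to 0, which is the failure mode the review flags.
func parseMemLimit(raw string) (int64, error) {
	parsed, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid memory limit %q: %w", raw, err)
	}
	if parsed <= 0 {
		return 0, fmt.Errorf("memory limit must be positive, got %d", parsed)
	}
	return parsed, nil
}

func main() {
	os.Setenv("SCANNER_MEMORY_LIMIT", "536870912")
	if raw, ok := os.LookupEnv("SCANNER_MEMORY_LIMIT"); ok {
		limit, err := parseMemLimit(raw)
		fmt.Println(limit, err)
	}
}
```

Whether an invalid value should fail fast or merely log a warning and disable the feature is a policy choice; both prompts above allow either, as long as the error is surfaced.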

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 257b1544-5f22-402e-aa0e-23195a640dc5

📥 Commits

Reviewing files that changed from the base of the PR and between 6b3bf16 and c3f3a41.

📒 Files selected for processing (3)
  • cmd/sbom-scanner/main.go
  • pkg/sbommanager/v1/sbom_manager.go
  • pkg/sbomscanner/v1/server.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/sbom-scanner/main.go

Comment on lines +491 to 498
if retryCount >= maxScanRetries {
	delete(wipSbom.Annotations, NodeNameMetadataKey)
	wipSbom.Annotations[helpersv1.StatusMetadataKey] = helpersv1.TooLarge
	wipSbom.Annotations[ScannerMemoryLimitAnnotation] = fmt.Sprintf("%d", s.scannerMemLimit)
	wipSbom.Spec = v1beta1.SBOMSyftSpec{}
	_, _ = s.storageClient.ReplaceSBOM(wipSbom)
	delete(s.scanRetries, sbomName)
}

⚠️ Potential issue | 🟠 Major

Do not drop retry state when persisting TooLarge fails.

Lines 496-497 ignore ReplaceSBOM errors and still delete retry state. If the storage write fails, the SBOM remains un-finalized but retry tracking is lost, causing repeated crash loops without a durable terminal status.

✅ Safer terminalization flow
 	if retryCount >= maxScanRetries {
 		delete(wipSbom.Annotations, NodeNameMetadataKey)
 		wipSbom.Annotations[helpersv1.StatusMetadataKey] = helpersv1.TooLarge
 		wipSbom.Annotations[ScannerMemoryLimitAnnotation] = fmt.Sprintf("%d", s.scannerMemLimit)
 		wipSbom.Spec = v1beta1.SBOMSyftSpec{}
-		_, _ = s.storageClient.ReplaceSBOM(wipSbom)
-		delete(s.scanRetries, sbomName)
+		if _, err := s.storageClient.ReplaceSBOM(wipSbom); err != nil {
+			logger.L().Error("SbomManager - failed to persist TooLarge SBOM after scanner crashes",
+				helpers.Error(err),
+				helpers.String("namespace", notif.Container.K8s.Namespace),
+				helpers.String("pod", notif.Container.K8s.PodName),
+				helpers.String("container", notif.Container.K8s.ContainerName),
+				helpers.String("sbomName", sbomName))
+			return
+		}
+		delete(s.scanRetries, sbomName)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/sbom_manager.go` around lines 491 - 498, The current
branch finalizes a TooLarge terminal status but ignores errors from
s.storageClient.ReplaceSBOM and unconditionally deletes retry state
(s.scanRetries[sbomName]); change this so you capture the error returned by
ReplaceSBOM when updating wipSbom (setting
Annotations[helpersv1.StatusMetadataKey]=helpersv1.TooLarge,
ScannerMemoryLimitAnnotation, and clearing Spec to v1beta1.SBOMSyftSpec{}), and
only remove the retry entry from s.scanRetries if ReplaceSBOM succeeds; if
ReplaceSBOM returns an error, log/propagate the error and leave s.scanRetries
intact so retry state persists.

Comment on lines +24 to +28
type scannerServer struct {
	pb.UnimplementedSBOMScannerServer
	mu      sync.Mutex
	version string
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n -C3 'mu\.Lock\(|WithTimeout\(|defer s\.mu\.Unlock' pkg/sbomscanner/v1/server.go

Repository: kubescape/node-agent

Length of output: 424


🏁 Script executed:

cat -n pkg/sbomscanner/v1/server.go

Repository: kubescape/node-agent

Length of output: 4052


Request timeout does not protect lock acquisition: mutex wait is unbounded.

CreateSBOM calls s.mu.Lock() at line 37 before establishing the request timeout at lines 40-44. Under contention, requests block indefinitely on the mutex before the timeout logic applies. The timeout only protects the scan execution (line 66), not the serialization gate.

Proposed refactor (context-aware sequential gate)
 type scannerServer struct {
 	pb.UnimplementedSBOMScannerServer
-	mu      sync.Mutex
+	mu      sync.Mutex
+	scanSem chan struct{}
 	version string
 }

 func NewScannerServer() pb.SBOMScannerServer {
 	return &scannerServer{
+		scanSem: make(chan struct{}, 1),
 		version: packageVersion("github.com/anchore/syft"),
 	}
 }

 func (s *scannerServer) CreateSBOM(ctx context.Context, req *pb.CreateSBOMRequest) (*pb.CreateSBOMResponse, error) {
-	s.mu.Lock()
-	defer s.mu.Unlock()
-
 	if req.TimeoutSeconds > 0 {
 		var cancel context.CancelFunc
 		ctx, cancel = context.WithTimeout(ctx, time.Duration(req.TimeoutSeconds)*time.Second)
 		defer cancel()
 	}
+
+	select {
+	case s.scanSem <- struct{}{}:
+		defer func() { <-s.scanSem }()
+	case <-ctx.Done():
+		if ctx.Err() == context.Canceled {
+			return nil, status.Error(codes.Canceled, "scan canceled")
+		}
+		if ctx.Err() == context.DeadlineExceeded {
+			return nil, status.Error(codes.DeadlineExceeded, "scan timed out")
+		}
+		return nil, status.Errorf(codes.Internal, "failed before scan start: %v", ctx.Err())
+	}

Also applies to: 30-33, 36-44

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbomscanner/v1/server.go` around lines 24 - 28, The CreateSBOM handler
currently calls s.mu.Lock() before creating a context with timeout, so mutex
wait is unbounded; replace the plain mutex with a context-aware semaphore and
acquire it after creating the request context: change scannerServer.mu from
sync.Mutex to a semaphore channel (e.g., sem chan struct{} initialized with one
token), in CreateSBOM call ctx, cancel := context.WithTimeout(...) before trying
to acquire the gate, then use a select to either receive the token from s.sem
(acquire) or return on ctx.Done() (timeout/cancel), and defer returning the
token (release); apply the same pattern for other methods that used
s.mu.Lock()/Unlock() so lock acquisition is protected by the request context and
respects timeouts.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/sbommanager/v1/sbom_manager.go (1)

491-497: ⚠️ Potential issue | 🟠 Major

Preserve retry state when persisting TooLarge fails.

Line 496 ignores ReplaceSBOM errors and line 497 always deletes retry state, so retry tracking can be lost even when the terminal status was never persisted.

🔧 Proposed fix
 	if retryCount >= maxScanRetries {
 		delete(wipSbom.Annotations, NodeNameMetadataKey)
 		wipSbom.Annotations[helpersv1.StatusMetadataKey] = helpersv1.TooLarge
 		wipSbom.Annotations[ScannerMemoryLimitAnnotation] = fmt.Sprintf("%d", s.scannerMemLimit)
 		wipSbom.Spec = v1beta1.SBOMSyftSpec{}
-		_, _ = s.storageClient.ReplaceSBOM(wipSbom)
+		if _, err := s.storageClient.ReplaceSBOM(wipSbom); err != nil {
+			logger.L().Error("SbomManager - failed to persist TooLarge SBOM after scanner crashes",
+				helpers.Error(err),
+				helpers.String("namespace", notif.Container.K8s.Namespace),
+				helpers.String("pod", notif.Container.K8s.PodName),
+				helpers.String("container", notif.Container.K8s.ContainerName),
+				helpers.String("sbomName", sbomName))
+			return
+		}
 		delete(s.scanRetries, sbomName)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/sbom_manager.go` around lines 491 - 497, The current block
ignores the error returned by s.storageClient.ReplaceSBOM and unconditionally
removes retry state via delete(s.scanRetries, sbomName), which can lose recovery
context if persistence fails; update the code in the retry-finalization branch
(references: retryCount, maxScanRetries, wipSbom, s.storageClient.ReplaceSBOM,
delete(s.scanRetries, sbomName)) to check and handle the ReplaceSBOM error —
attempt the ReplaceSBOM call, if it returns an error then log/return the error
and do NOT delete s.scanRetries[sbomName] (preserve retry state); only remove
the retry entry after ReplaceSBOM succeeds. Ensure any logging includes context
(sbomName and the error) so operators can act.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 693f55df-2d30-4ab3-b1d7-01ed92347f75

📥 Commits

Reviewing files that changed from the base of the PR and between c3f3a41 and 4fbfe64.

📒 Files selected for processing (1)
  • pkg/sbommanager/v1/sbom_manager.go

Comment on lines +351 to +360
result, scanErr := s.scannerClient.CreateSBOM(s.ctx, sbomscanner.ScanRequest{
	ImageID:             imageID,
	ImageTag:            sharedData.ImageTag,
	LayerPaths:          mounts,
	ImageStatus:         imageStatusBytes,
	MaxImageSize:        s.cfg.MaxImageSize,
	MaxSBOMSize:         int32(s.cfg.MaxSBOMSize),
	EnableEmbeddedSBOMs: s.cfg.EnableEmbeddedSboms,
	Timeout:             15 * time.Minute,
})

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify call site context handling for scanner RPC in sbom manager.
rg -nP --type=go -C3 'CreateSBOM\(\s*s\.ctx' pkg/sbommanager/v1/sbom_manager.go
rg -nP --type=go -C3 'CreateSBOM\(' pkg/sbommanager/v1/sbom_manager.go
rg -nP --type=go -C2 'WithTimeout\(' pkg/sbommanager/v1/sbom_manager.go

Repository: kubescape/node-agent

Length of output: 1277

(Further scripts — rg, ast-grep, sed, and fd runs over pkg/sbommanager/v1/sbom_manager.go, pkg/sbomscanner/v1/, and cmd/main.go — traced where s.ctx originates and how CreateSBOM is invoked. They confirmed that the manager's context carries no deadline, that no context.WithTimeout wraps the scanner RPC, and that SBOM processing runs on a single worker, so a stalled call blocks all subsequent scans.)


Add a client-side deadline around the scanner RPC call.

Line 351 passes s.ctx directly without a timeout wrapper. The context is context.Background() with no deadline. If the transport or network layer stalls, the RPC call will hang indefinitely, blocking the single-worker pool and preventing any SBOM processing for subsequent containers. The server-side Timeout in the payload is insufficient because it cannot protect against transport-level stalls before reaching the server.

🔧 Proposed fix
-		result, scanErr := s.scannerClient.CreateSBOM(s.ctx, sbomscanner.ScanRequest{
+		scanCtx, cancel := context.WithTimeout(s.ctx, 16*time.Minute)
+		defer cancel()
+		result, scanErr := s.scannerClient.CreateSBOM(scanCtx, sbomscanner.ScanRequest{
 			ImageID:             imageID,
 			ImageTag:            sharedData.ImageTag,
 			LayerPaths:          mounts,
 			ImageStatus:         imageStatusBytes,
 			MaxImageSize:        s.cfg.MaxImageSize,
 			MaxSBOMSize:         int32(s.cfg.MaxSBOMSize),
 			EnableEmbeddedSBOMs: s.cfg.EnableEmbeddedSboms,
 			Timeout:             15 * time.Minute,
 		})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/sbom_manager.go` around lines 351 - 360, The CreateSBOM
RPC is called with s.ctx (a background context) which has no deadline; wrap
s.ctx with a client-side timeout using context.WithTimeout before calling
s.scannerClient.CreateSBOM (and defer cancel) and pass that derived ctx into
CreateSBOM instead of s.ctx so transport-level stalls are bounded; keep the
existing Timeout field in sbomscanner.ScanRequest but enforce an overall
network/transport deadline in the client.
