Skip to content

merge open wildcard from k8sstorm fork#293

Open
entlein wants to merge 3 commits intokubescape:mainfrom
entlein:main
Open

merge open wildcard from k8sstorm fork#293
entlein wants to merge 3 commits intokubescape:mainfrom
entlein:main

Conversation

@entlein
Copy link

@entlein entlein commented Mar 3, 2026

need to check where those changes in pkg/apis/softwarecomposition... came from. Must retest

Summary by CodeRabbit

  • Bug Fixes

    • User-managed resources are now protected from cleanup operations, preventing accidental deletion.
  • Improvements

    • Enhanced wildcard port handling in endpoint analysis for more accurate dynamic routing detection.
    • Improved dynamic path collapsing with configurable thresholds for better resource optimization.
    • Simplified open path analysis logic for increased reliability.
  • Tests

    • Comprehensive test coverage added for dynamic path detection and endpoint collapsing behavior.

Signed-off-by: entlein <einentlein@gmail.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

The changes refactor the dynamic path detection system from a fixed-threshold approach to a configuration-driven Trie-based architecture. Two exported constants are removed and relocated to the dynamicpathdetector package. The path analyzer gains support for per-prefix configuration, wildcard port handling for endpoints, and simplified open analysis. Tests comprehensively validate the new behavior across all components.

Changes

Cohort / File(s) Summary
Git Configuration
.gitattributes
Adds merge strategy configuration for the .github directory pattern to use ours strategy.
Type Definitions and Constants
pkg/registry/file/dynamicpathdetector/types.go
Introduces WildcardIdentifier constant, CollapseConfig struct, and global thresholds (OpenDynamicThreshold, EndpointDynamicThreshold). Refactors PathAnalyzer fields to use Trie-based structure (root, identRoots, configs, collapseAdjacent). Adds TrieNode type for hierarchical path representation.
Core Analyzer Architecture
pkg/registry/file/dynamicpathdetector/analyzer.go
Replaces path analyzer implementation with Trie-based architecture. Introduces NewPathAnalyzerWithConfigs for config-driven initialization. Implements config propagation via applyConfigsToNode, path insertion via AddPath, and per-prefix configuration resolution via FindConfigForPath. Integrates collapse logic for dynamic/wildcard node creation and adjacent dynamic identifier collapsing.
Endpoint Analysis Enhancement
pkg/registry/file/dynamicpathdetector/analyze_endpoints.go
Adds wildcard port handling with isWildcardPort and rewritePort functions. Implements two-pass endpoint analysis: first pass detects and rewrites wildcard ports, second pass processes rewritten endpoints. Updates MergeDuplicateEndpoints to handle port 0 variants. Introduces splitEndpointPortAndPath for port/path parsing.
Open Analysis Simplification
pkg/registry/file/dynamicpathdetector/analyze_opens.go
Removes nil-check for sbomSet and conditional processing based on SBOM membership. Simplifies loop to always analyze each path with consistent merging behavior. Removes errors package import.
ApplicationProfile Configuration
pkg/registry/file/applicationprofile_processor.go
Removes exported constants OpenDynamicThreshold and EndpointDynamicThreshold. Updates path analyzer calls to use NewPathAnalyzerWithConfigs with relocated constants from dynamicpathdetector package.
ApplicationProfile Tests
pkg/registry/file/applicationprofile_processor_test.go
Adds comprehensive test suite with helpers generateSOOpens and openThreshold. Introduces six new tests validating collapse behavior under various conditions: threshold exceeding, SBOM set inclusion, mixed path prefixes, nil SBOM handling, and end-to-end PreSave flow.
ContainerProfile Configuration
pkg/registry/file/containerprofile_processor.go
Updates path analyzer initialization for opens and endpoints to use NewPathAnalyzerWithConfigs with respective thresholds and collapse configurations.
Namespace Cleanup
pkg/registry/file/cleanup.go
Adds guard to skip deletion of resources labeled with ManagedByMetadataKey == ManagedByUserValue, preventing cleanup of user-managed profiles/resources.
Dynamic Path Detector Tests
pkg/registry/file/dynamicpathdetector/tests/analyze_endpoints_test.go, analyze_opens_test.go, benchmark_test.go, coverage_test.go
Update all test instantiations from NewPathAnalyzer to NewPathAnalyzerWithConfigs. Add new wildcard port handling tests in endpoints. Expand coverage for per-prefix config overrides, threshold-driven scenarios, and collapse behavior validation. Disable BenchmarkCompareDynamic reference. Introduce test helpers for dynamic configuration and path filtering.

Sequence Diagram

sequenceDiagram
    participant Client as Path Analyzer<br/>Client
    participant PA as PathAnalyzer
    participant Trie as Trie Structure
    participant Config as Config System
    participant Collapse as Collapse Logic

    Client->>PA: NewPathAnalyzerWithConfigs(threshold, configs)
    PA->>Config: applyConfigsToNode(configs)
    Config->>Trie: Populate config trie
    
    Client->>PA: AddPath(path)
    PA->>PA: splitPath(path)
    PA->>Trie: addPathToRoot(tokens)
    Trie->>Trie: Traverse/Create nodes
    Trie->>Trie: Update counts
    
    PA->>Config: FindConfigForPath(path)
    Config->>Trie: Resolve via config trie
    Trie-->>Config: Return CollapseConfig
    
    Trie->>Collapse: Check threshold
    alt Count exceeds threshold
        Collapse->>Trie: createDynamicNode()
        Trie->>Trie: Replace children
    else CollapseAdjacent enabled
        Collapse->>Trie: CollapseAdjacentDynamicIdentifiers()
        Trie->>Trie: Merge dynamic segments
    end
    
    Client->>PA: AnalyzePath(path, identifier)
    PA->>Trie: Read tree state
    PA->>Collapse: Apply collapse rules
    Collapse-->>PA: Return collapsed path
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 Hops through the trie with configs in hand,
Each prefix now knows its collapse command.
Endpoints gain wildcard ports so free,
While opens collapse when thresholds agree,
A forest of paths becomes structured and grand! 🌲✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.63% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'merge open wildcard from k8sstorm fork' accurately describes the main objective of the PR: merging open wildcard functionality from a k8sstorm fork into the repository.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

entlein added 2 commits March 3, 2026 18:53
Signed-off-by: entlein <einentlein@gmail.com>
Signed-off-by: entlein <einentlein@gmail.com>
@entlein entlein marked this pull request as ready for review March 3, 2026 18:02
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/registry/file/dynamicpathdetector/analyze_opens.go (1)

12-12: ⚠️ Potential issue | 🟡 Minor

Remove unused sbomSet parameter.

The sbomSet parameter is declared in the function signature but is never referenced within the function body. Removing this parameter will require updating the function signature and all call sites in applicationprofile_processor.go, containerprofile_processor.go, and test files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/analyze_opens.go` at line 12, The
AnalyzeOpens function signature includes an unused parameter sbomSet; remove
sbomSet from AnalyzeOpens(...) and update all call sites to pass only (opens,
analyzer) — specifically update calls in applicationprofile_processor.go,
containerprofile_processor.go and any affected tests to match the new signature;
run tests and fix any failing call sites or test helpers that still reference
sbomSet for AnalyzeOpens.
🧹 Nitpick comments (13)
.gitattributes (1)

1-1: Consider narrowing the scope to workflows only.

The pattern .github/** applies the merge=ours strategy to all GitHub configuration files (workflows, templates, dependabot, CODEOWNERS, etc.), not just workflows. Based on the commit message "dont merge workflows", a more targeted pattern would better match the intent:

-.github/** merge=ours
+# Prevent workflow files from forks from overwriting our workflows during merges
+.github/workflows/** merge=ours

Additionally, consider adding an inline comment explaining why this strategy is in place for future maintainers.

Note: This directive only affects future merge operations, not the current PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitattributes at line 1, The .gitattributes entry currently applies
merge=ours to .github/** which is too broad; replace the pattern with a narrower
one like `.github/workflows/** merge=ours` to target workflows only and add a
short inline comment above the entry explaining "keep ours for auto-updated
workflow files to avoid CI churn" (or similar) so future maintainers understand
the rationale; update the existing `.github/** merge=ours` line to the new
pattern and include the comment on the next line.
pkg/registry/file/dynamicpathdetector/tests/analyze_opens_test.go (2)

11-20: Duplicate configThreshold helper.

This function is identical to the one in coverage_test.go. Consider extracting to a shared testutils.go file in the tests package.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/tests/analyze_opens_test.go` around
lines 11 - 20, The duplicate helper function configThreshold found in
analyze_opens_test.go is identical to the one in coverage_test.go; remove the
duplicate and extract the shared helper into a testutils.go within the tests
package, then update tests (e.g., TestAnalyzeOpensWithThreshold and the tests in
coverage_test.go) to import/use the shared configThreshold from testutils.go;
ensure the helper is package-visible for the tests and that
NewPathAnalyzerWithConfigs and OpenDynamicThreshold usages remain unchanged.

603-611: Remove unused helper function assertPathIsOneOf.

This function is defined at lines 603-611 but has no callers anywhere in the codebase. The functionality is covered by assertContainsOneOfPaths, which is actively used at lines 224, 227, and 230.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/tests/analyze_opens_test.go` around
lines 603 - 611, Remove the unused helper function assertPathIsOneOf: locate the
function definition named assertPathIsOneOf and delete it entirely, since there
are no callers and its behavior is covered by the existing
assertContainsOneOfPaths helper; ensure no other tests reference
assertPathIsOneOf and run tests to confirm nothing breaks.
pkg/registry/file/dynamicpathdetector/types.go (2)

56-66: Type definition follows its constructor.

The TrieNode struct is defined after NewTrieNode() constructor. Consider reordering for better readability—placing the type definition before its constructor is conventional in Go.

Suggested reorder
+type TrieNode struct {
+	Children map[string]*TrieNode
+	Config   *CollapseConfig
+	Count    int
+}
+
 func NewTrieNode() *TrieNode {
 	return &TrieNode{
 		Children: make(map[string]*TrieNode),
 	}
 }
-
-type TrieNode struct {
-	Children map[string]*TrieNode
-	Config   *CollapseConfig
-	Count    int
-}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/types.go` around lines 56 - 66, Move
the TrieNode type definition above its constructor NewTrieNode so the type
declaration precedes the function that constructs it; update the file so
TrieNode (with fields Children, Config, Count) is declared first and then
NewTrieNode returns &TrieNode{Children: make(map[string]*TrieNode)} to improve
readability and follow Go conventions.

25-25: Remove or formalize the inline comment.

The comment "//this is mostly for our webapp standard test" appears to be a development note. Consider either removing it or providing a more formal explanation of why /etc/apache2 has a lower threshold than /etc.

Suggested improvement
-	{Prefix: "/etc/apache2", Threshold: 5}, //this is mostly for our webapp standard test
+	{Prefix: "/etc/apache2", Threshold: 5}, // Lower threshold for Apache config directories with frequent dynamic module loading
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/types.go` at line 25, The inline
development note after the literal {Prefix: "/etc/apache2", Threshold: 5} should
be removed or converted into a formal comment explaining why /etc/apache2 has a
lower Threshold than /etc; update the comment adjacent to the struct literal
using the Prefix and Threshold fields (or remove it entirely) so it becomes a
clear, production-ready rationale (e.g., "lower threshold for Apache config dir
due to expected smaller file set or test coverage reasons") rather than an
informal dev note.
pkg/registry/file/dynamicpathdetector/tests/benchmark_test.go (1)

75-82: Remove or restore commented-out benchmark.

BenchmarkCompareDynamic is fully commented out. If it's no longer needed, remove it entirely. If it's temporarily disabled, add a TODO comment explaining why and when it should be restored.

Suggested: remove dead code
-// func BenchmarkCompareDynamic(b *testing.B) {
-// 	dynamicPath := "/api/\u22ef/\u22ef"
-// 	regularPath := "/api/users/123"
-// 	for i := 0; i < b.N; i++ {
-// 		_ = dynamicpathdetector.CompareDynamic(dynamicPath, regularPath)
-// 	}
-// 	b.ReportAllocs()
-// }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/tests/benchmark_test.go` around lines
75 - 82, The commented-out BenchmarkCompareDynamic block should be either
removed or restored with an explanatory TODO; either delete the entire commented
benchmark (the three-line block wrapping BenchmarkCompareDynamic) to eliminate
dead code, or uncomment it and add a TODO above the function explaining why it's
disabled and when to re-enable (e.g., TODO: enable benchmark once X is fixed),
referencing BenchmarkCompareDynamic and the call to
dynamicpathdetector.CompareDynamic so reviewers can locate it easily.
pkg/registry/file/dynamicpathdetector/tests/coverage_test.go (1)

11-20: Helper function duplicated across test files.

configThreshold is defined here and will likely be needed in other test files (e.g., analyze_opens_test.go also defines it). Consider extracting to a shared test helper file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/tests/coverage_test.go` around lines 11
- 20, The helper function configThreshold is duplicated across tests; remove the
copy from this file and move it into a shared test helper file (same package as
the tests) so other tests can reuse it. Create a new test helper that defines
configThreshold(prefix string) int using
dynamicpathdetector.DefaultCollapseConfigs and
dynamicpathdetector.DefaultCollapseConfig.Threshold, then update tests that
currently define their own configThreshold to import/use the shared helper
instead.
pkg/registry/file/dynamicpathdetector/analyze_endpoints.go (2)

182-186: Replace fmt.Println with structured logging.

Using fmt.Println for error handling bypasses any logging infrastructure and makes errors hard to trace in production. Consider using a proper logger or returning the error.

Suggested fix
 	rawJSON, err := json.Marshal(existingHeaders)
 	if err != nil {
-		fmt.Println("Error marshaling JSON:", err)
 		return
 	}

Silently returning on marshal error is acceptable since headers are auxiliary data, but the fmt.Println should be removed regardless.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/analyze_endpoints.go` around lines 182
- 186, The snippet marshals existingHeaders into rawJSON and currently calls
fmt.Println on error then returns; remove the fmt.Println and either use the
package's logger or propagate the error instead of printing to stdout. Locate
the json.Marshal(existingHeaders) call in analyze_endpoints.go (rawJSON,
existingHeaders) and replace the fmt.Println("Error marshaling JSON:", err) with
a structured log call (e.g., logger.Errorf or processLogger.Error) or simply
return the error up the call chain if a logger isn't available; ensure no plain
fmt prints remain.

54-56: Simplify the confusing conditional.

The condition processedEndpoint == nil && err == nil || err != nil is hard to parse due to operator precedence. This is equivalent to (processedEndpoint == nil && err == nil) || err != nil, which simplifies to "skip if error OR if nil returned without error".

Suggested simplification
-		if processedEndpoint == nil && err == nil || err != nil {
+		if err != nil || processedEndpoint == nil {
 			continue
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/analyze_endpoints.go` around lines 54 -
56, The current conditional combining processedEndpoint and err is confusing;
simplify the logic in analyze_endpoints.go by replacing the compound expression
with a clear check that skips the loop when there is an error OR when
processedEndpoint is nil (i.e., test err non-nil OR processedEndpoint nil),
updating the conditional around processedEndpoint and err to that clearer form
so intent is unambiguous.
pkg/registry/file/dynamicpathdetector/analyzer.go (2)

88-93: Config node not advanced when wildcard consumes path.

When a wildcard node exists and absorbs the rest of the path (line 92 returns), the config node is not advanced. While this may not affect behavior since we're returning immediately, it creates an inconsistency with how config advancement is handled elsewhere in the function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/analyzer.go` around lines 88 - 93, When
a wildcard node (parent.Children[WildcardIdentifier]) consumes the rest of the
path we currently increment wildcardNode.Count and return without advancing the
config-tracking variable; update this branch to also advance the config node the
same way other branches do by assigning the config parent to the wildcard node
(e.g., parent = wildcardNode or the code path used elsewhere to advance config)
before returning so config advancement is consistent with the other
segment-handling branches.

317-341: Consider converting to an iterative approach for robustness.

While recursion with slice shrinking works for current path depths (tests confirm safe operation up to 100 segments), an iterative implementation would eliminate any theoretical stack depth concerns and improve clarity for future maintainers. The current recursive approach is not a practical issue given real-world path structures, but refactoring would be a good defensive measure if path depth constraints ever change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/analyzer.go` around lines 317 - 341,
The compareSegments function currently uses recursion (compareSegments, slicing
dynamic and regular) which, while safe now, should be rewritten iteratively:
replace the recursive logic with an explicit loop and index pointers (e.g., i/j
for dynamic and regular) or an explicit stack to emulate backtracking for
WildcardIdentifier handling; keep the same semantics around WildcardIdentifier
and DynamicIdentifier comparisons and the terminal checks (empty slices and
single trailing wildcard matching everything), and ensure you still try all
possible regular offsets when encountering a wildcard by pushing/iterating
alternative j positions rather than recursing.
pkg/registry/file/applicationprofile_processor_test.go (1)

336-338: Avoid hardcoding the /etc threshold in test logic.

Line 337 hardcodes 100; this can cause brittle failures if DefaultCollapseConfigs changes. Prefer deriving the value from the same config source used by production code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/applicationprofile_processor_test.go` around lines 336 -
338, Don't hardcode the /etc threshold; instead read it from the same config
used by production. Replace the literal 100 used for etcThreshold in the test
with a lookup from DefaultCollapseConfigs (e.g. set etcThreshold by retrieving
DefaultCollapseConfigs["/etc"] or the appropriate field like .Threshold) and use
that value in the for loop so the test stays in sync with the production config
and won't break if DefaultCollapseConfigs changes.
pkg/registry/file/applicationprofile_processor.go (1)

109-116: Consider centralizing analyzer construction in one helper.

Line 110 and Line 115 duplicate analyzer wiring that now also exists in pkg/registry/file/containerprofile_processor.go. A shared helper would reduce future config skew between processors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/applicationprofile_processor.go` around lines 109 - 116,
The construction of path analyzers is duplicated in
deflateApplicationProfileContainer (calls to
dynamicpathdetector.NewPathAnalyzerWithConfigs for opens and endpoints) and the
container profile processor; create a single helper (e.g., NewPathAnalyzers or
BuildPathAnalyzerConfigs) that returns configured analyzers or analyzer configs
for OpenDynamicThreshold and EndpointDynamicThreshold and use that helper inside
deflateApplicationProfileContainer (and the container profile processor) to call
dynamicpathdetector.AnalyzeOpens and AnalyzeEndpoints with the centralized
analyzers/configs, ensuring both AnalyzeOpens and AnalyzeEndpoints invocations
use the shared helper to avoid future config skew.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/registry/file/dynamicpathdetector/tests/analyze_endpoints_test.go`:
- Around line 283-286: The test is using a fragile slice ep.Endpoint[:len(":0")]
to extract the port; update the loop that iterates over result and uses
ep.Endpoint to parse the port robustly (e.g., call net.SplitHostPort on
ep.Endpoint, handle the error, then assert the returned port equals ":0").
Replace the brittle slicing with net.SplitHostPort-based extraction inside the
same loop where result and ep.Endpoint are used.
- Around line 259-262: The test is slicing ep.Endpoint with a fixed length which
breaks for multi-digit ports; replace the fragile slice with proper parsing by
calling splitEndpointPortAndPath(ep.Endpoint) (or using strings functions) to
extract the port, then assert that the returned port equals ":0"; update imports
if you use strings and ensure you reference ep.Endpoint and
splitEndpointPortAndPath in the test.

---

Outside diff comments:
In `@pkg/registry/file/dynamicpathdetector/analyze_opens.go`:
- Line 12: The AnalyzeOpens function signature includes an unused parameter
sbomSet; remove sbomSet from AnalyzeOpens(...) and update all call sites to pass
only (opens, analyzer) — specifically update calls in
applicationprofile_processor.go, containerprofile_processor.go and any affected
tests to match the new signature; run tests and fix any failing call sites or
test helpers that still reference sbomSet for AnalyzeOpens.

---

Nitpick comments:
In @.gitattributes:
- Line 1: The .gitattributes entry currently applies merge=ours to .github/**
which is too broad; replace the pattern with a narrower one like
`.github/workflows/** merge=ours` to target workflows only and add a short
inline comment above the entry explaining "keep ours for auto-updated workflow
files to avoid CI churn" (or similar) so future maintainers understand the
rationale; update the existing `.github/** merge=ours` line to the new pattern
and include the comment on the next line.

In `@pkg/registry/file/applicationprofile_processor_test.go`:
- Around line 336-338: Don't hardcode the /etc threshold; instead read it from
the same config used by production. Replace the literal 100 used for
etcThreshold in the test with a lookup from DefaultCollapseConfigs (e.g. set
etcThreshold by retrieving DefaultCollapseConfigs["/etc"] or the appropriate
field like .Threshold) and use that value in the for loop so the test stays in
sync with the production config and won't break if DefaultCollapseConfigs
changes.

In `@pkg/registry/file/applicationprofile_processor.go`:
- Around line 109-116: The construction of path analyzers is duplicated in
deflateApplicationProfileContainer (calls to
dynamicpathdetector.NewPathAnalyzerWithConfigs for opens and endpoints) and the
container profile processor; create a single helper (e.g., NewPathAnalyzers or
BuildPathAnalyzerConfigs) that returns configured analyzers or analyzer configs
for OpenDynamicThreshold and EndpointDynamicThreshold and use that helper inside
deflateApplicationProfileContainer (and the container profile processor) to call
dynamicpathdetector.AnalyzeOpens and AnalyzeEndpoints with the centralized
analyzers/configs, ensuring both AnalyzeOpens and AnalyzeEndpoints invocations
use the shared helper to avoid future config skew.

In `@pkg/registry/file/dynamicpathdetector/analyze_endpoints.go`:
- Around line 182-186: The snippet marshals existingHeaders into rawJSON and
currently calls fmt.Println on error then returns; remove the fmt.Println and
either use the package's logger or propagate the error instead of printing to
stdout. Locate the json.Marshal(existingHeaders) call in analyze_endpoints.go
(rawJSON, existingHeaders) and replace the fmt.Println("Error marshaling JSON:",
err) with a structured log call (e.g., logger.Errorf or processLogger.Error) or
simply return the error up the call chain if a logger isn't available; ensure no
plain fmt prints remain.
- Around line 54-56: The current conditional combining processedEndpoint and err
is confusing; simplify the logic in analyze_endpoints.go by replacing the
compound expression with a clear check that skips the loop when there is an
error OR when processedEndpoint is nil (i.e., test err non-nil OR
processedEndpoint nil), updating the conditional around processedEndpoint and
err to that clearer form so intent is unambiguous.

In `@pkg/registry/file/dynamicpathdetector/analyzer.go`:
- Around line 88-93: When a wildcard node (parent.Children[WildcardIdentifier])
consumes the rest of the path we currently increment wildcardNode.Count and
return without advancing the config-tracking variable; update this branch to
also advance the config node the same way other branches do by assigning the
config parent to the wildcard node (e.g., parent = wildcardNode or the code path
used elsewhere to advance config) before returning so config advancement is
consistent with the other segment-handling branches.
- Around line 317-341: The compareSegments function currently uses recursion
(compareSegments, slicing dynamic and regular) which, while safe now, should be
rewritten iteratively: replace the recursive logic with an explicit loop and
index pointers (e.g., i/j for dynamic and regular) or an explicit stack to
emulate backtracking for WildcardIdentifier handling; keep the same semantics
around WildcardIdentifier and DynamicIdentifier comparisons and the terminal
checks (empty slices and single trailing wildcard matching everything), and
ensure you still try all possible regular offsets when encountering a wildcard
by pushing/iterating alternative j positions rather than recursing.

In `@pkg/registry/file/dynamicpathdetector/tests/analyze_opens_test.go`:
- Around line 11-20: The duplicate helper function configThreshold found in
analyze_opens_test.go is identical to the one in coverage_test.go; remove the
duplicate and extract the shared helper into a testutils.go within the tests
package, then update tests (e.g., TestAnalyzeOpensWithThreshold and the tests in
coverage_test.go) to import/use the shared configThreshold from testutils.go;
ensure the helper is package-visible for the tests and that
NewPathAnalyzerWithConfigs and OpenDynamicThreshold usages remain unchanged.
- Around line 603-611: Remove the unused helper function assertPathIsOneOf:
locate the function definition named assertPathIsOneOf and delete it entirely,
since there are no callers and its behavior is covered by the existing
assertContainsOneOfPaths helper; ensure no other tests reference
assertPathIsOneOf and run tests to confirm nothing breaks.

In `@pkg/registry/file/dynamicpathdetector/tests/benchmark_test.go`:
- Around line 75-82: The commented-out BenchmarkCompareDynamic block should be
either removed or restored with an explanatory TODO; either delete the entire
commented benchmark (the three-line block wrapping BenchmarkCompareDynamic) to
eliminate dead code, or uncomment it and add a TODO above the function
explaining why it's disabled and when to re-enable (e.g., TODO: enable benchmark
once X is fixed), referencing BenchmarkCompareDynamic and the call to
dynamicpathdetector.CompareDynamic so reviewers can locate it easily.

In `@pkg/registry/file/dynamicpathdetector/tests/coverage_test.go`:
- Around line 11-20: The helper function configThreshold is duplicated across
tests; remove the copy from this file and move it into a shared test helper file
(same package as the tests) so other tests can reuse it. Create a new test
helper that defines configThreshold(prefix string) int using
dynamicpathdetector.DefaultCollapseConfigs and
dynamicpathdetector.DefaultCollapseConfig.Threshold, then update tests that
currently define their own configThreshold to import/use the shared helper
instead.

In `@pkg/registry/file/dynamicpathdetector/types.go`:
- Around line 56-66: Move the TrieNode type definition above its constructor
NewTrieNode so the type declaration precedes the function that constructs it;
update the file so TrieNode (with fields Children, Config, Count) is declared
first and then NewTrieNode returns &TrieNode{Children:
make(map[string]*TrieNode)} to improve readability and follow Go conventions.
- Line 25: The inline development note after the literal {Prefix:
"/etc/apache2", Threshold: 5} should be removed or converted into a formal
comment explaining why /etc/apache2 has a lower Threshold than /etc; update the
comment adjacent to the struct literal using the Prefix and Threshold fields (or
remove it entirely) so it becomes a clear, production-ready rationale (e.g.,
"lower threshold for Apache config dir due to expected smaller file set or test
coverage reasons") rather than an informal dev note.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a93f18 and 57a7e7c.

📒 Files selected for processing (13)
  • .gitattributes
  • pkg/registry/file/applicationprofile_processor.go
  • pkg/registry/file/applicationprofile_processor_test.go
  • pkg/registry/file/cleanup.go
  • pkg/registry/file/containerprofile_processor.go
  • pkg/registry/file/dynamicpathdetector/analyze_endpoints.go
  • pkg/registry/file/dynamicpathdetector/analyze_opens.go
  • pkg/registry/file/dynamicpathdetector/analyzer.go
  • pkg/registry/file/dynamicpathdetector/tests/analyze_endpoints_test.go
  • pkg/registry/file/dynamicpathdetector/tests/analyze_opens_test.go
  • pkg/registry/file/dynamicpathdetector/tests/benchmark_test.go
  • pkg/registry/file/dynamicpathdetector/tests/coverage_test.go
  • pkg/registry/file/dynamicpathdetector/types.go

Comment on lines +259 to +262
for _, ep := range result {
port := ep.Endpoint[:len(":0")]
assert.Equal(t, ":0", port, "endpoint %s should have wildcard port", ep.Endpoint)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fragile port extraction in test assertion.

Using ep.Endpoint[:len(":0")] (equivalent to [:2]) assumes the port is always single-digit. This will break for ports like :80 or :443. Use proper parsing or the splitEndpointPortAndPath function.

Suggested fix
 	for _, ep := range result {
-		port := ep.Endpoint[:len(":0")]
-		assert.Equal(t, ":0", port, "endpoint %s should have wildcard port", ep.Endpoint)
+		assert.True(t, strings.HasPrefix(ep.Endpoint, ":0/"), "endpoint %s should have wildcard port", ep.Endpoint)
 	}

Note: You'll need to import "strings" if not already imported.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/tests/analyze_endpoints_test.go` around
lines 259 - 262, The test is slicing ep.Endpoint with a fixed length which
breaks for multi-digit ports; replace the fragile slice with proper parsing by
calling splitEndpointPortAndPath(ep.Endpoint) (or using strings functions) to
extract the port, then assert that the returned port equals ":0"; update imports
if you use strings and ensure you reference ep.Endpoint and
splitEndpointPortAndPath in the test.

Comment on lines +283 to +286
for _, ep := range result {
port := ep.Endpoint[:len(":0")]
assert.Equal(t, ":0", port, "endpoint %s should have wildcard port", ep.Endpoint)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Same fragile port extraction issue.

Same fix as above applies here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/registry/file/dynamicpathdetector/tests/analyze_endpoints_test.go` around
lines 283 - 286, The test is using a fragile slice ep.Endpoint[:len(":0")] to
extract the port; update the loop that iterates over result and uses ep.Endpoint
to parse the port robustly (e.g., call net.SplitHostPort on ep.Endpoint, handle
the error, then assert the returned port equals ":0"). Replace the brittle
slicing with net.SplitHostPort-based extraction inside the same loop where
result and ep.Endpoint are used.

@entlein
Copy link
Author

entlein commented Mar 3, 2026

need to remove the .gitattributes file still -- I have not tested the upstream changes with the new node-agent upstream changes. I d be more comfy, if we test this rather thouroughly AND also create a more complete CollapsConfig-Default Set that covers most Linux Paths.
Todo for Constanze: Check if we need to remove the /proc special treatment.

@matthyx : please do let me know, if there are general remarks that need reworking.

Copy link
Contributor

@matthyx matthyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the analyzer needs "some" rework and benchmarks

types "github.com/kubescape/storage/pkg/apis/softwarecomposition"
)

func AnalyzeOpens(opens []types.OpenCalls, analyzer *PathAnalyzer, sbomSet mapset.Set[string]) ([]types.OpenCalls, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sbomSet is actually quite important as it avoids collapsing any path that is present in the SBOM to ensure reproducible relevant vulnerabilities results


rawJSON, err := json.Marshal(existingHeaders)
if err != nil {
fmt.Println("Error marshaling JSON:", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably use the logger here

}
b.ReportAllocs()
}
// func BenchmarkCompareDynamic(b *testing.B) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why comment out?

return nil
}

// Skip user-managed resources (e.g., user-defined profiles)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can skip before reading the metadata

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and good catch, without this we'd delete the user-defined profile

@@ -0,0 +1 @@
.github/** merge=ours
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed to remove :)

for _, child := range node.Children {
shallowChildrenCopy(child, dynamicNode)
func (pm *PathAnalyzer) addPathToRoot(root *TrieNode, path string) {
parent := root
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

root and thus parent is mutable, we have to make sure we are thread safe

func CompareDynamic(dynamicPath, regularPath string) bool {
dynamicSegments := strings.Split(dynamicPath, "/")
regularSegments := strings.Split(regularPath, "/")
return compareSegments(dynamicSegments, regularSegments)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

splitting strings and recursion is a GC nightmare, we'll have to measure and improve if needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ask Claude to implement zero heap allocation by using inline index parsing with backtracking (supporting complex wildcard patterns like /foo//bar)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vibe coded version:

func CompareDynamic(dynamicPath, regularPath string) bool {
    dPath := strings.Trim(dynamicPath, "/")
    rPath := strings.Trim(regularPath, "/")
    
    di, ri := 0, 0
    dLen, rLen := len(dPath), len(rPath)
    
    // Helper: extract next segment starting at idx, return (segment, nextIdx)
    nextSeg := func(s string, idx int) (string, int) {
        start := idx
        for idx < len(s) && s[idx] != '/' {
            idx++
        }
        return s[start:idx], idx + 1
    }
    
    // Backtracking for wildcard matching
    type backtrack struct {
        dNext int  // Next index in dynamic after wildcard
        rNext int  // Try matching from here in regular
    }
    var stack []backtrack
    
    for di < dLen && ri < rLen {
        // Skip leading slashes
        for di < dLen && dPath[di] == '/' { di++ }
        for ri < rLen && rPath[ri] == '/' { ri++ }
        
        if di >= dLen { break }
        if ri >= rLen { break }
        
        // Extract segments
        dStart := di
        for di < dLen && dPath[di] != '/' { di++ }
        dSeg := dPath[dStart:di]
        di++ // skip '/'
        
        rStart := ri
        for ri < rLen && rPath[ri] != '/' { ri++ }
        rSeg := rPath[rStart:ri]
        ri++ // skip '/'
        
        if dSeg == string(DynamicIdentifier) {
            // Match exactly one
            continue
        } else if dSeg == string(WildcardIdentifier) {
            // Try greedy: consume all remaining
            return true
        } else if dSeg != rSeg {
            return false
        }
    }
    
    // Handle trailing segments
    for di < dLen && dPath[di] == '/' { di++ }
    if di < dLen {
        dStart := di
        for di < dLen && dPath[di] != '/' { di++ }
        dSeg := dPath[dStart:di]
        if dSeg == string(WildcardIdentifier) {
            return true
        }
        return false
    }
    
    // Regular path must also be exhausted
    for ri < rLen && rPath[ri] == '/' { ri++ }
    return ri >= rLen
}


node.Children = map[string]*SegmentNode{
DynamicIdentifier: dynamicChild,
// Special case: threshold of 1 immediately creates a wildcard (only with collapseAdjacent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need a special case here? it looks inconsistent with the normal case

// Config advances AFTER navigation so threshold applies at the correct level.
configNode := pm.root
currentConfig := &pm.defaultCfg
if configNode != nil && configNode.Config != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this interleaved dance between path and config tree is difficult to comprehend

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: WIP

Development

Successfully merging this pull request may close these issues.

2 participants