Skip to content

feat: add OpenSearch based logs adapter#70

Open
nilushancosta wants to merge 2 commits intoopenchoreo:mainfrom
nilushancosta:logs-opensearch
Open

feat: add OpenSearch based logs adapter#70
nilushancosta wants to merge 2 commits intoopenchoreo:mainfrom
nilushancosta:logs-opensearch

Conversation

@nilushancosta
Copy link
Copy Markdown
Contributor

@nilushancosta nilushancosta commented Apr 2, 2026

Purpose

Adds an OpenSearch-based logs adapter for the observability module, enabling log querying and retrieval from an OpenSearch backend.
This PR moves OpenSearch related log querying code out of https://github.com/openchoreo/openchoreo

Approach

  • Introduces a new Go service (observability-logs-opensearch) that implements the observability logs adapter API. It includes an HTTP server (generated via oapi-codegen from an OpenAPI spec) with handlers that translate incoming log query requests into OpenSearch DSL queries, supporting filtering by labels, time ranges, pagination, and keyword search. This also includes changes to the helm charts for Kubernetes deployment and a multi-stage Docker build.

Compared to the OpenSearch log querying code in https://github.com/openchoreo/openchoreo, this PR

  • Updates the OpenSearch go client from v1.1.0 to v4.6.0
  • Updates the Go lang version from 1.24.12 to 1.26

Related Issues

openchoreo/openchoreo#2964

Checklist

  • Tests added or updated (unit, integration, etc.)
  • Samples updated (if applicable)

Remarks

The version of the module has not been bumped in the README. This is intentional. It will be bumped when everything is ready

Summary by CodeRabbit

  • New Features

    • Added a runnable observability logs adapter (container image), OpenAPI/codegen build targets, and a CLI-enabled build flow.
    • Helm chart, values and templates to deploy the adapter (configurable OpenSearch credentials, observer URL, resources, non‑root runtime, port 9098).
    • Env-driven configuration, structured JSON logging, startup health checks and graceful shutdown.
  • Tests

    • Comprehensive unit tests covering query builders, OpenSearch client, observer forwarding, handlers, and parsing utilities.

Signed-off-by: Nilushan Costa <nilushan@wso2.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 2, 2026

📝 Walkthrough

Walkthrough

Adds a new OpenSearch logs adapter: container build, Go module and Makefile, Helm chart and templates, environment-driven configuration, OpenSearch client and query builder, HTTP handlers for logs/alert rules/webhook, observer forwarding client, server lifecycle, and extensive unit tests.

Changes

Cohort / File(s) Summary
Build & Packaging
observability-logs-opensearch/Dockerfile, observability-logs-opensearch/Makefile, observability-logs-opensearch/go.mod, observability-logs-opensearch/module.yaml
Add multi-stage Dockerfile, Go module and dependencies, Make targets for oapi-codegen and tests, and module image manifest entry.
Helm chart & values
observability-logs-opensearch/helm/Chart.yaml, observability-logs-opensearch/helm/values.yaml, observability-logs-opensearch/helm/templates/adapter/*
Bump chart version and add adapter values plus conditional ConfigMap, Deployment, and Service templates enforcing non-root runtime, secret-backed credentials, and resource settings.
Configuration & Entrypoint
observability-logs-opensearch/internal/config.go, observability-logs-opensearch/main.go, observability-logs-opensearch/internal/server.go
Add environment-driven Config struct/loader with validation, main startup/shutdown flow with OpenSearch health check, JSON slog setup, and graceful server lifecycle.
HTTP Handlers & API
observability-logs-opensearch/internal/handlers.go, observability-logs-opensearch/internal/api/*
Implement StrictServer handler (Health, QueryLogs, alert rule CRUD, webhook) and add oapi-codegen config files for model/client/server generation.
OpenSearch integration
observability-logs-opensearch/internal/opensearch/*
Add OpenSearch client wrapper (health, search, monitor CRUD, index writes), query builder for logs/monitor bodies, domain types/parsers, and label utilities; includes wildcard sanitization and duration validations.
Observer integration
observability-logs-opensearch/internal/observer/client.go
Add HTTP client to forward alerts to Observer webhook with timeout and status checking.
Tests
observability-logs-opensearch/internal/*_test.go, observability-logs-opensearch/internal/opensearch/*_test.go
Add comprehensive unit tests for handlers, observer client, OpenSearch client, query builder, and types parsing using httptest fixtures.

Sequence Diagram(s)

sequenceDiagram
    participant Main
    participant Config
    participant Logger
    participant OSClient as OpenSearch Client
    participant Observer as Observer Client
    participant QB as QueryBuilder
    participant Handler
    participant Server

    Main->>Config: LoadConfig()
    Config-->>Main: Config
    Main->>Logger: init JSON logger
    Main->>OSClient: NewClient(...) / CheckHealth(ctx)
    OSClient-->>Main: healthy
    Main->>Observer: NewClient(observerUrl)
    Main->>QB: NewQueryBuilder(indexPrefix)
    Main->>Handler: NewLogsHandler(osClient, qb, observer, logger)
    Main->>Server: NewServer(port, handler, logger)
    Main->>Server: Start()
Loading
sequenceDiagram
    participant Client
    participant Handler
    participant QB as QueryBuilder
    participant OSClient as OpenSearch Client
    participant Observer as Observer Client

    Client->>Handler: QueryLogs(request)
    Handler->>QB: Build query & indices
    QB-->>Handler: query body, indices
    Handler->>OSClient: Search(ctx, indices, query)
    OSClient-->>Handler: SearchResponse
    Handler->>Client: 200 QueryLogsResponse

    Note over Handler,Observer: Webhook path (async)
    Client->>Handler: HandleAlertWebhook(body)
    Handler->>Observer: ForwardAlert (goroutine)
    Handler-->>Client: 200 success
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble logs and hop through code by night,
I stitch alerts in moonbeams, forward them in flight,
Docker, Helm, queries — a cosy little den,
I twitch my nose and watch observability spin.
Thump—deploy the adapter, and I'll hop again.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.63% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding an OpenSearch-based logs adapter.
Description check ✅ Passed The description covers Purpose, Approach, and Related Issues with good detail. The Checklist is partially completed (Tests marked done, Samples marked not applicable). All required sections are present and substantively filled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (14)
observability-logs-opensearch/internal/observer/client.go (1)

27-29: Avoid two independent 10s timeouts for the same outbound call.

Line 28 sets http.Client.Timeout to 10s, while the caller also wraps ForwardAlert with a 10s context timeout (observability-logs-opensearch/internal/handlers.go, Lines 500-509). This creates timeout races and inconsistent failure surfaces. Prefer a single timeout owner (typically request context), or make client timeout larger than call-site deadlines.

Proposed simplification
 func NewClient(baseURL string) *Client {
 	return &Client{
 		baseURL: strings.TrimRight(baseURL, "/"),
-		httpClient: &http.Client{
-			Timeout: 10 * time.Second,
-		},
+		httpClient: &http.Client{},
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/observer/client.go` around lines 27 -
29, The http.Client in client.go currently sets a 10s Timeout which duplicates
the 10s context deadline applied by the caller of ForwardAlert (handlers.go),
causing racey/ambiguous timeouts; remove the client-level Timeout (set it to 0)
and rely on the request context passed into ForwardAlert (ensure requests are
created with http.NewRequestWithContext) so the context is the single owner of
the deadline, or alternatively increase the client Timeout to a value larger
than the caller deadline if you must keep it. Ensure changes touch the
httpClient instantiation in client.go and the request creation inside
ForwardAlert so the ctx controls cancellation.
observability-logs-opensearch/internal/config.go (1)

45-57: Invalid LOG_LEVEL values are silently ignored.

If an unrecognized log level is provided (e.g., "TRACE" or "verbose"), the code falls through and defaults to LevelInfo without any warning. Consider logging a warning or returning an error for invalid values.

♻️ Proposed fix to warn on invalid log level
 	logLevel := slog.LevelInfo
 	if level := os.Getenv("LOG_LEVEL"); level != "" {
 		switch strings.ToUpper(level) {
 		case "DEBUG":
 			logLevel = slog.LevelDebug
 		case "INFO":
 			logLevel = slog.LevelInfo
 		case "WARN", "WARNING":
 			logLevel = slog.LevelWarn
 		case "ERROR":
 			logLevel = slog.LevelError
+		default:
+			return nil, fmt.Errorf("invalid LOG_LEVEL %q: must be DEBUG, INFO, WARN, or ERROR", level)
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/config.go` around lines 45 - 57, The
LOG_LEVEL parsing silently ignores unknown values — update the switch over
strings.ToUpper(level) that sets logLevel to handle a default/else branch: if
the value is unrecognized (e.g., "TRACE", "verbose") emit a warning via the
package logger (slog.Warn or similar) mentioning the invalid LOG_LEVEL and the
accepted values and keep the previous/default level, or alternatively return an
error from the surrounding config loader; reference the existing logLevel
variable, the LOG_LEVEL env lookup, and the switch block to add the default-case
warning/validation behavior.
observability-logs-opensearch/helm/templates/adapter/deployment.yaml (1)

28-60: Consider adding liveness and readiness probes.

The container lacks health probes, which limits Kubernetes' ability to detect unhealthy pods and manage traffic routing. This is particularly important since the adapter makes external calls to OpenSearch.

♻️ Proposed fix to add health probes

If the OpenAPI spec includes a health endpoint, or if you add one:

         ports:
         - containerPort: 9098
+        livenessProbe:
+          httpGet:
+            path: /healthz
+            port: 9098
+          initialDelaySeconds: 5
+          periodSeconds: 10
+        readinessProbe:
+          httpGet:
+            path: /readyz
+            port: 9098
+          initialDelaySeconds: 5
+          periodSeconds: 5
         envFrom:

Alternatively, use a TCP socket probe if no health endpoint exists:

        livenessProbe:
          tcpSocket:
            port: 9098
          initialDelaySeconds: 5
          periodSeconds: 10
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/helm/templates/adapter/deployment.yaml` around
lines 28 - 60, The container "logs-adapter-opensearch" has no Kubernetes health
probes; add both livenessProbe and readinessProbe for port 9098 to allow K8s to
detect and restart/unroute unhealthy pods: if the adapter exposes an HTTP health
endpoint (e.g., /health or /ready) use httpGet probes pointing at port 9098 with
sensible timings (initialDelaySeconds, periodSeconds, timeoutSeconds) for both
probes; otherwise add tcpSocket probes against port 9098 for liveness and
readiness with similar timing settings; update the container spec near the
"ports" and "env" blocks to include these probe definitions.
observability-logs-opensearch/Dockerfile (1)

12-12: Pin the Alpine version for reproducible builds.

Using alpine:latest can lead to inconsistent builds over time as the tag gets updated. Pin to a specific version for reproducibility and to avoid unexpected breaking changes.

♻️ Proposed fix
-FROM alpine:latest
+FROM alpine:3.21
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/Dockerfile` at line 12, The Dockerfile uses an
unpinned base image "FROM alpine:latest" which can cause non-reproducible
builds; change that line to pin to a specific, supported Alpine tag (for example
"FROM alpine:3.18" or your chosen stable patch version) so builds are
deterministic and avoid unexpected breakages—update the Dockerfile's FROM
alpine:latest entry accordingly and ensure the chosen version is documented or
matches other images in the repo.
observability-logs-opensearch/helm/values.yaml (1)

172-186: Add a comment clarifying that openSearchSecretName must be provided.

The empty default for openSearchSecretName will cause the Deployment to fail at runtime when Kubernetes cannot find the referenced Secret. While this follows the existing pattern (see openSearchSetup.openSearchSecretName at line 191), adding documentation helps users understand this is a required override.

📝 Suggested documentation
 adapter:
   enabled: true
   observerUrl: "http://observer-internal.openchoreo-observability-plane:8081"
+  # Required: Name of the Kubernetes Secret containing OpenSearch credentials
+  # The secret must have 'username' and 'password' keys
   openSearchSecretName: ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/helm/values.yaml` around lines 172 - 186, The
Helm values file's adapter.openSearchSecretName is left empty which will cause
the Deployment to fail if not overridden; add a comment next to
adapter.openSearchSecretName explaining that this value is required and must be
set to the name of an existing Kubernetes Secret (e.g.,
"name-of-opensearch-secret"), and mirror the guidance used for
openSearchSetup.openSearchSecretName so users know to override it in their
values file before deploying.
observability-logs-opensearch/helm/templates/adapter/configmap.yaml (1)

13-15: Consider making OpenSearch connection parameters configurable.

OPENSEARCH_ADDRESS and OPENSEARCH_INDEX_PREFIX are hardcoded, while OBSERVER_URL is already sourced from values. This limits deployment flexibility for clusters where OpenSearch runs on a different address or uses a different index prefix.

♻️ Proposed refactor to make these values configurable

Add to values.yaml under adapter:

 adapter:
   enabled: true
   observerUrl: "http://observer-internal.openchoreo-observability-plane:8081"
   openSearchSecretName: ""
+  openSearchAddress: "https://opensearch:9200"
+  openSearchIndexPrefix: "container-logs-"

Then update the ConfigMap:

 data:
   SERVER_PORT: "9098"
-  OPENSEARCH_ADDRESS: "https://opensearch:9200"
-  OPENSEARCH_INDEX_PREFIX: "container-logs-"
+  OPENSEARCH_ADDRESS: {{ .Values.adapter.openSearchAddress | quote }}
+  OPENSEARCH_INDEX_PREFIX: {{ .Values.adapter.openSearchIndexPrefix | quote }}
   OBSERVER_URL: {{ .Values.adapter.observerUrl | quote }}

Note: The same hardcoding exists in helm/templates/opensearch-setup-logs/job.yaml (line 24). Consider addressing both for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/helm/templates/adapter/configmap.yaml` around
lines 13 - 15, Make OPENSEARCH_ADDRESS and OPENSEARCH_INDEX_PREFIX configurable
via values by adding entries under adapter (e.g. adapter.opensearch.address and
adapter.opensearch.indexPrefix) in values.yaml and update the ConfigMap template
in helm/templates/adapter/configmap.yaml to use Helm value interpolation
(referencing .Values.adapter.opensearch.address and
.Values.adapter.opensearch.indexPrefix, with sensible defaults matching the
current literals) instead of the hardcoded "https://opensearch:9200" and
"container-logs-"; also apply the same change in
helm/templates/opensearch-setup-logs/job.yaml (replace the hardcoded
OPENSEARCH_ADDRESS/OPENSEARCH_INDEX_PREFIX there with the same .Values
references) so deployments can override OpenSearch endpoint and index prefix
consistently while keeping OBSERVER_URL usage unchanged.
observability-logs-opensearch/Makefile (1)

3-3: Remote spec URL creates a network dependency for code generation.

The openapi-codegen target fetches the spec from a remote URL, which means builds will fail without network access. This is acceptable for CI but may hinder local development in offline scenarios. Consider documenting this dependency or providing an option to use a local spec file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/Makefile` at line 3, The Makefile currently
hardcodes SPEC to a remote URL which creates a network dependency for the
openapi-codegen target; update the Makefile to allow overriding SPEC from the
environment or a make variable (e.g., check if SPEC is set before assigning) and
add a short comment above the SPEC definition explaining the remote fetch and
advising how to provide a local spec for offline development, and ensure the
openapi-codegen target uses the SPEC variable so local files can be used by
setting SPEC=/path/to/local.yaml.
observability-logs-opensearch/internal/opensearch/queries.go (1)

267-282: Redundant check for end index inclusion.

The loop condition current.Before(end) || current.Equal(end) already ensures the end date is included when iterating. The subsequent check on lines 276-279 is defensive but unnecessary.

This is a minor readability nit; the logic is correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines
267 - 282, The code builds daily index names into indices by iterating with
current (start) until current.Before(end) || current.Equal(end), which already
includes the end date, so remove the redundant endIndexName creation and the
contains(...) conditional block; update the function that constructs indices
(the loop using current, qb.indexPrefix and current.Format("2006-01-02")) to
return indices directly after the loop and delete the extra end inclusion logic
(and any unused variables like endIndexName) to improve readability.
observability-logs-opensearch/internal/handlers.go (3)

174-175: Error from FromLogsQueryResponseLogs0 is ignored.

The error return from logs.FromLogsQueryResponseLogs0(entries) is discarded with _. If this conversion fails, the response will have nil/empty logs without indication of the failure.

Consider logging or handling this error, even if unlikely to occur.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/handlers.go` around lines 174 - 175,
The call to logs.FromLogsQueryResponseLogs0(entries) currently ignores its error
result which can silently produce nil/empty resp.Logs; change this to capture
the return values (e.g., logsObj, err :=
logs.FromLogsQueryResponseLogs0(entries)), then handle err instead of discarding
it—either log the error using the package logger or return/propagate an error
from the handler—and only set resp.Logs = &logsObj when err == nil; reference
the FromLogsQueryResponseLogs0 call, the entries variable, and resp.Logs when
implementing this fix.

437-442: Consider returning 404 when alert rule not found during update.

Currently returns 400 Bad Request when the monitor doesn't exist. A 404 Not Found would be more semantically correct and consistent with GetAlertRule.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/handlers.go` around lines 437 - 442,
The handler currently returns a gen.UpdateAlertRule400JSONResponse when the
alert rule isn't found; change this to return a 404-style response instead:
replace the gen.UpdateAlertRule400JSONResponse construction with the appropriate
gen.UpdateAlertRule404JSONResponse (or the existing NotFound response type used
by GetAlertRule) and set Title to ptr(gen.NotFound) and Message to ptr("alert
rule not found") so the UpdateAlertRule handler returns HTTP 404 when the
monitor/alert rule is missing.

500-509: Fire-and-forget goroutine may lose alerts on shutdown.

The goroutine spawned for ForwardAlert is detached from the request lifecycle. If the service shuts down while alerts are in-flight, they will be silently dropped. Consider using a worker pool with graceful shutdown, or at minimum track in-flight operations.

Additionally, failed forwards are only logged without retry. For critical alerting workflows, consider implementing retry logic or a dead-letter queue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/handlers.go` around lines 500 - 509,
The detached goroutine that calls h.observerClient.ForwardAlert (in the
anonymous go func in handlers.go) can drop alerts on shutdown and lacks retries;
replace the fire-and-forget pattern by enqueuing the forward request into a
managed worker queue or at minimum incrementing a shared sync.WaitGroup (or
tracking counter) before spawning the worker and decrementing it after
completion so shutdown can wait for in-flight forwards, and add retry logic
(e.g., exponential backoff with a few attempts) around
h.observerClient.ForwardAlert and fallback to a dead-letter queue or persistent
store on persistent failure; ensure the shutdown/path that cancels context waits
for the WaitGroup or drains the worker queue so alerts are not lost and use the
existing h.logger to record retry attempts and final failures.
observability-logs-opensearch/internal/opensearch/types.go (2)

222-226: Silently ignoring json.Marshal error may hide issues.

If marshaling fails (e.g., due to an unmarshalable type like a channel in the query map), the function returns an empty reader, causing confusing downstream errors in the search request.

♻️ Suggested fix: return error or log warning
-// buildSearchBody converts a query map to an io.Reader for the search request.
-func buildSearchBody(query map[string]interface{}) io.Reader {
-	body, _ := json.Marshal(query)
-	return strings.NewReader(string(body))
+// buildSearchBody converts a query map to an io.Reader for the search request.
+func buildSearchBody(query map[string]interface{}) (io.Reader, error) {
+	body, err := json.Marshal(query)
+	if err != nil {
+		return nil, fmt.Errorf("failed to marshal query: %w", err)
+	}
+	return strings.NewReader(string(body)), nil
 }

This would require updating the caller in client.go to handle the error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/opensearch/types.go` around lines 222
- 226, The buildSearchBody function currently swallows json.Marshal errors
(buildSearchBody) which can mask invalid query payloads; change buildSearchBody
to return (io.Reader, error) and propagate the json.Marshal error instead of
returning an empty reader, update its callers in client.go to handle the error
(either return it up the stack or log and fail the request), and ensure any
tests or call sites that expect the old signature are updated accordingly;
alternatively, if you prefer not to change signatures, at minimum log the
marshal error with context and return nil plus the error so downstream code can
fail fast.

293-308: Log level detection is simplistic and may produce false positives.

The substring-based matching can misclassify logs. For example, a log containing "INFORMATION" would match "INFO", and "UNDEFINED" is treated as a valid log level rather than a fallback. Consider using word boundaries or more precise patterns if accuracy is important.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/opensearch/types.go` around lines 293
- 308, The extractLogLevel function currently does substring matching against
logLevels which causes false positives (e.g., "INFORMATION" matching "INFO") and
treats "UNDEFINED" as a real level; update extractLogLevel to perform whole-word
matching using word-boundary regex (e.g., \bLEVEL\b) or tokenization and
normalize the log string first, check levels in order of decreasing length to
avoid substring collisions, map "WARNING" to "WARN" as before, and remove
"UNDEFINED" from the logLevels list so that the function falls back to a final
default (e.g., "INFO") only when no whole-word level is found.
observability-logs-opensearch/internal/opensearch/client.go (1)

122-127: Potential nil pointer dereference when converting Score.

If h.Score is 0 or the field is missing/null in the response, converting it directly to float64 and taking its address may not behave as expected. The current code always sets a non-nil Score pointer.

Consider checking if the score is meaningful before assignment, or document that Score will always be set (possibly to 0).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/opensearch/client.go` around lines 122
- 127, The code unconditionally converts h.Score to float64 and assigns its
address to hit.Score, risking a misleading non-nil pointer when h.Score is
missing or null; update the logic around the hit.Score assignment (the block
using h.Score, hit.Score, and response.Hits.Hits) to check whether h.Score is
actually present/valid before creating and assigning a *float64 (e.g., only
allocate and set scorePtr when h.Score != nil or when the source indicates a
valid score), otherwise leave hit.Score nil (or explicitly set to pointer to
zero if you choose to document that behavior) so callers can distinguish absent
vs present scores.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@observability-logs-opensearch/internal/config.go`:
- Around line 36-43: The TLS skip-verification flag currently defaults to true
(tlsSkipVerify := true) which is insecure; change the default to false
(tlsSkipVerify := false) and keep the existing parsing of
OPENSEARCH_TLS_SKIP_VERIFY so only an explicit true opt-out will disable
certificate verification; update any related variable/constant names or comments
in the same file (e.g., references to tlsSkipVerify and the environment variable
OPENSEARCH_TLS_SKIP_VERIFY) to reflect the new secure default.

In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 337-342: The handler currently returns
gen.DeleteAlertRule500JSONResponse when the alert rule isn't found; change this
to return gen.DeleteAlertRule404JSONResponse instead so the response follows
REST semantics. Locate the branch checking the boolean found (the block that
returns gen.DeleteAlertRule500JSONResponse) and replace that response with
gen.DeleteAlertRule404JSONResponse, setting Title to ptr(gen.NotFound) and
Message to ptr("alert rule not found") so the returned type and status match a
404 Not Found.
- Around line 698-746: The monitor query doesn't include namespace so
extractQueryMetadata never receives it; update BuildLogAlertingRuleQuery to add
a term filter for params.Metadata.Namespace (same approach used in
BuildComponentLogsQueryV1) and then modify extractQueryMetadata to handle the
namespace term by adding a case for opensearch.OSNamespaceName that assigns
namespace = value; this ensures parseMonitorToAlertRuleResponse receives and
returns the namespace metadata.

In `@observability-logs-opensearch/internal/observer/client.go`:
- Around line 40-79: Add table-driven unit tests for the outbound HTTP path
exercised by Client.ForwardAlert: create subtests for (1) success (200)
returning no error, (2) non-2xx with a response body and assert the error
contains the status code and trimmed body, and (3) transport/context
cancellation or timeout causing an error. Use httptest.Server to capture and
assert the incoming request JSON payload and Content-Type header, and set the
Client.httpClient to the server.Client() (or a client with a short
Transport/Timeout) so ForwardAlert sends to the test server; for the
cancellation case, use a context with cancel/timeout to trigger the request
abort and assert the returned error reflects the cancellation/timeout.

In `@observability-logs-opensearch/internal/opensearch/client.go`:
- Around line 36-40: The TLS config in the http.Transport construction
(Transport: &http.Transport{ TLSClientConfig: &tls.Config{ InsecureSkipVerify:
insecureSkipVerify } }) does not set MinVersion, allowing older TLS versions;
update the tls.Config used by Transport to explicitly set MinVersion to
tls.VersionTLS12 (or tls.VersionTLS13 if supported) to prevent downgrade attacks
while keeping the existing InsecureSkipVerify handling.
- Around line 189-194: The CreateMonitor function is ignoring the passed ctx
because it uses http.NewRequest; change the request creation to use
http.NewRequestWithContext(ctx, "POST", path, bytes.NewReader(body)) so the
context (cancellation/timeout) is propagated, keep the existing error handling
around request creation and ensure req.Header.Add("Content-Type",
"application/json") remains after the new request is created.
- Around line 70-80: The health-check code in client.go consumes res.Body with
io.ReadAll before attempting to json.Decode, leaving an empty reader; change the
flow in the health check (the block that checks res.StatusCode and the
subsequent JSON decode of health) to read the response body once into a byte
slice and then use that byte slice for both the error message and JSON parsing
(e.g., json.Unmarshal into the local health struct) or else buffer and replace
res.Body (using io.NopCloser) so json.NewDecoder can read it; update the error
branch and the decode branch around the health struct to use the single buffered
body instead of reading res.Body twice.

In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 445-458: GetOperatorSymbol currently returns an empty string for
unknown operators which produces malformed painless scripts; change
GetOperatorSymbol to validate input and return an explicit error (e.g., change
signature to GetOperatorSymbol(operator string) (string, error)) or a clear
default like an error indicator, and update its caller
BuildLogAlertingRuleMonitorBody to handle the error (fail validation and return
it rather than building an invalid script). Ensure the new behavior returns a
non-empty valid symbol for "gt","gte","lt","lte" and returns a descriptive error
for unknown operators so BuildLogAlertingRuleMonitorBody can abort or surface
the error to the caller.

In `@observability-logs-opensearch/main.go`:
- Around line 73-78: The goroutine that runs srv.Start currently calls os.Exit
on error which bypasses defers (cancel(), shutdownCancel()) and can race with
signal setup; replace that immediate exit by sending the error to a dedicated
channel (e.g., errCh) so the main goroutine can handle shutdown. Modify the
anonymous goroutine (the one invoking srv.Start) to send the returned error to
errCh instead of calling os.Exit, and update the main routine to select on errCh
and signal context cancellation to run graceful cleanup and logging via
logger.Error when an error arrives.

In `@observability-logs-opensearch/Makefile`:
- Around line 17-19: The unit-test Makefile target currently unconditionally
runs mv on coverage.out which will fail if no coverage.out was produced; update
the unit-test target (the commands under the "unit-test" target that reference
coverage.out and $(MODULE_NAME)-coverage.out) to only move or copy the file when
coverage.out exists (e.g., check for existence with a shell conditional or use a
guarded cp/mv command) so the make target succeeds when no tests/coverage are
generated.

---

Nitpick comments:
In `@observability-logs-opensearch/Dockerfile`:
- Line 12: The Dockerfile uses an unpinned base image "FROM alpine:latest" which
can cause non-reproducible builds; change that line to pin to a specific,
supported Alpine tag (for example "FROM alpine:3.18" or your chosen stable patch
version) so builds are deterministic and avoid unexpected breakages—update the
Dockerfile's FROM alpine:latest entry accordingly and ensure the chosen version
is documented or matches other images in the repo.

In `@observability-logs-opensearch/helm/templates/adapter/configmap.yaml`:
- Around line 13-15: Make OPENSEARCH_ADDRESS and OPENSEARCH_INDEX_PREFIX
configurable via values by adding entries under adapter (e.g.
adapter.opensearch.address and adapter.opensearch.indexPrefix) in values.yaml
and update the ConfigMap template in helm/templates/adapter/configmap.yaml to
use Helm value interpolation (referencing .Values.adapter.opensearch.address and
.Values.adapter.opensearch.indexPrefix, with sensible defaults matching the
current literals) instead of the hardcoded "https://opensearch:9200" and
"container-logs-"; also apply the same change in
helm/templates/opensearch-setup-logs/job.yaml (replace the hardcoded
OPENSEARCH_ADDRESS/OPENSEARCH_INDEX_PREFIX there with the same .Values
references) so deployments can override OpenSearch endpoint and index prefix
consistently while keeping OBSERVER_URL usage unchanged.

In `@observability-logs-opensearch/helm/templates/adapter/deployment.yaml`:
- Around line 28-60: The container "logs-adapter-opensearch" has no Kubernetes
health probes; add both livenessProbe and readinessProbe for port 9098 to allow
K8s to detect and restart/unroute unhealthy pods: if the adapter exposes an HTTP
health endpoint (e.g., /health or /ready) use httpGet probes pointing at port
9098 with sensible timings (initialDelaySeconds, periodSeconds, timeoutSeconds)
for both probes; otherwise add tcpSocket probes against port 9098 for liveness
and readiness with similar timing settings; update the container spec near the
"ports" and "env" blocks to include these probe definitions.

In `@observability-logs-opensearch/helm/values.yaml`:
- Around line 172-186: The Helm values file's adapter.openSearchSecretName is
left empty which will cause the Deployment to fail if not overridden; add a
comment next to adapter.openSearchSecretName explaining that this value is
required and must be set to the name of an existing Kubernetes Secret (e.g.,
"name-of-opensearch-secret"), and mirror the guidance used for
openSearchSetup.openSearchSecretName so users know to override it in their
values file before deploying.

In `@observability-logs-opensearch/internal/config.go`:
- Around line 45-57: The LOG_LEVEL parsing silently ignores unknown values —
update the switch over strings.ToUpper(level) that sets logLevel to handle a
default/else branch: if the value is unrecognized (e.g., "TRACE", "verbose")
emit a warning via the package logger (slog.Warn or similar) mentioning the
invalid LOG_LEVEL and the accepted values and keep the previous/default level,
or alternatively return an error from the surrounding config loader; reference
the existing logLevel variable, the LOG_LEVEL env lookup, and the switch block
to add the default-case warning/validation behavior.

In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 174-175: The call to logs.FromLogsQueryResponseLogs0(entries)
currently ignores its error result which can silently produce nil/empty
resp.Logs; change this to capture the return values (e.g., logsObj, err :=
logs.FromLogsQueryResponseLogs0(entries)), then handle err instead of discarding
it—either log the error using the package logger or return/propagate an error
from the handler—and only set resp.Logs = &logsObj when err == nil; reference
the FromLogsQueryResponseLogs0 call, the entries variable, and resp.Logs when
implementing this fix.
- Around line 437-442: The handler currently returns a
gen.UpdateAlertRule400JSONResponse when the alert rule isn't found; change this
to return a 404-style response instead: replace the
gen.UpdateAlertRule400JSONResponse construction with the appropriate
gen.UpdateAlertRule404JSONResponse (or the existing NotFound response type used
by GetAlertRule) and set Title to ptr(gen.NotFound) and Message to ptr("alert
rule not found") so the UpdateAlertRule handler returns HTTP 404 when the
monitor/alert rule is missing.
- Around line 500-509: The detached goroutine that calls
h.observerClient.ForwardAlert (in the anonymous go func in handlers.go) can drop
alerts on shutdown and lacks retries; replace the fire-and-forget pattern by
enqueuing the forward request into a managed worker queue or at minimum
incrementing a shared sync.WaitGroup (or tracking counter) before spawning the
worker and decrementing it after completion so shutdown can wait for in-flight
forwards, and add retry logic (e.g., exponential backoff with a few attempts)
around h.observerClient.ForwardAlert and fallback to a dead-letter queue or
persistent store on persistent failure; ensure the shutdown/path that cancels
context waits for the WaitGroup or drains the worker queue so alerts are not
lost and use the existing h.logger to record retry attempts and final failures.

In `@observability-logs-opensearch/internal/observer/client.go`:
- Around line 27-29: The http.Client in client.go currently sets a 10s Timeout
which duplicates the 10s context deadline applied by the caller of ForwardAlert
(handlers.go), causing racey/ambiguous timeouts; remove the client-level Timeout
(set it to 0) and rely on the request context passed into ForwardAlert (ensure
requests are created with http.NewRequestWithContext) so the context is the
single owner of the deadline, or alternatively increase the client Timeout to a
value larger than the caller deadline if you must keep it. Ensure changes touch
the httpClient instantiation in client.go and the request creation inside
ForwardAlert so the ctx controls cancellation.

In `@observability-logs-opensearch/internal/opensearch/client.go`:
- Around line 122-127: The code unconditionally converts h.Score to float64 and
assigns its address to hit.Score, risking a misleading non-nil pointer when
h.Score is missing or null; update the logic around the hit.Score assignment
(the block using h.Score, hit.Score, and response.Hits.Hits) to check whether
h.Score is actually present/valid before creating and assigning a *float64
(e.g., only allocate and set scorePtr when h.Score != nil or when the source
indicates a valid score), otherwise leave hit.Score nil (or explicitly set to
pointer to zero if you choose to document that behavior) so callers can
distinguish absent vs present scores.

In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 267-282: The code builds daily index names into indices by
iterating with current (start) until current.Before(end) || current.Equal(end),
which already includes the end date, so remove the redundant endIndexName
creation and the contains(...) conditional block; update the function that
constructs indices (the loop using current, qb.indexPrefix and
current.Format("2006-01-02")) to return indices directly after the loop and
delete the extra end inclusion logic (and any unused variables like
endIndexName) to improve readability.

In `@observability-logs-opensearch/internal/opensearch/types.go`:
- Around line 222-226: The buildSearchBody function currently swallows
json.Marshal errors (buildSearchBody) which can mask invalid query payloads;
change buildSearchBody to return (io.Reader, error) and propagate the
json.Marshal error instead of returning an empty reader, update its callers in
client.go to handle the error (either return it up the stack or log and fail the
request), and ensure any tests or call sites that expect the old signature are
updated accordingly; alternatively, if you prefer not to change signatures, at
minimum log the marshal error with context and return nil plus the error so
downstream code can fail fast.
- Around line 293-308: The extractLogLevel function currently does substring
matching against logLevels which causes false positives (e.g., "INFORMATION"
matching "INFO") and treats "UNDEFINED" as a real level; update extractLogLevel
to perform whole-word matching using word-boundary regex (e.g., \bLEVEL\b) or
tokenization and normalize the log string first, check levels in order of
decreasing length to avoid substring collisions, map "WARNING" to "WARN" as
before, and remove "UNDEFINED" from the logLevels list so that the function
falls back to a final default (e.g., "INFO") only when no whole-word level is
found.

In `@observability-logs-opensearch/Makefile`:
- Line 3: The Makefile currently hardcodes SPEC to a remote URL which creates a
network dependency for the openapi-codegen target; update the Makefile to allow
overriding SPEC from the environment or a make variable (e.g., check if SPEC is
set before assigning) and add a short comment above the SPEC definition
explaining the remote fetch and advising how to provide a local spec for offline
development, and ensure the openapi-codegen target uses the SPEC variable so
local files can be used by setting SPEC=/path/to/local.yaml.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3572b40a-5220-4f10-96c3-ed5cb25d521d

📥 Commits

Reviewing files that changed from the base of the PR and between bd7385c and f5f60e9.

⛔ Files ignored due to path filters (4)
  • observability-logs-opensearch/go.sum is excluded by !**/*.sum
  • observability-logs-opensearch/internal/api/gen/client.gen.go is excluded by !**/gen/**
  • observability-logs-opensearch/internal/api/gen/models.gen.go is excluded by !**/gen/**
  • observability-logs-opensearch/internal/api/gen/server.gen.go is excluded by !**/gen/**
📒 Files selected for processing (21)
  • observability-logs-opensearch/Dockerfile
  • observability-logs-opensearch/Makefile
  • observability-logs-opensearch/go.mod
  • observability-logs-opensearch/helm/Chart.yaml
  • observability-logs-opensearch/helm/templates/adapter/configmap.yaml
  • observability-logs-opensearch/helm/templates/adapter/deployment.yaml
  • observability-logs-opensearch/helm/templates/adapter/service.yaml
  • observability-logs-opensearch/helm/values.yaml
  • observability-logs-opensearch/internal/api/cfg-client.yaml
  • observability-logs-opensearch/internal/api/cfg-models.yaml
  • observability-logs-opensearch/internal/api/cfg-server.yaml
  • observability-logs-opensearch/internal/config.go
  • observability-logs-opensearch/internal/handlers.go
  • observability-logs-opensearch/internal/observer/client.go
  • observability-logs-opensearch/internal/opensearch/client.go
  • observability-logs-opensearch/internal/opensearch/labels.go
  • observability-logs-opensearch/internal/opensearch/queries.go
  • observability-logs-opensearch/internal/opensearch/types.go
  • observability-logs-opensearch/internal/server.go
  • observability-logs-opensearch/main.go
  • observability-logs-opensearch/module.yaml

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (4)
observability-logs-opensearch/Makefile (1)

3-13: Pin the OpenAPI spec to an immutable revision.

SPEC currently tracks a branch head, so make openapi-codegen can start producing different generated code or fail without any change in this repo. Please vendor the spec or switch this to a commit SHA so regeneration stays reproducible across CI and developer machines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/Makefile` around lines 3 - 13, The Makefile's
SPEC variable currently points to a branch head URL which is mutable; update
SPEC to reference an immutable source (either vendor the spec file into the repo
and point SPEC to the local path, or replace the URL with a raw file URL that
includes a specific commit SHA) so that the openapi-codegen target (targets:
openapi-codegen and oapi-codegen-install) always regenerates from a reproducible
spec; ensure the new SPEC still works with the existing oapi-codegen calls in
the Makefile (and update any docs or CI that expect the old URL).
observability-logs-opensearch/internal/handlers.go (3)

500-509: Consider tracking in-flight webhook forwards for graceful shutdown.

The fire-and-forget goroutine with context.Background() is acceptable for webhook acknowledgment, but in-flight forwards may be lost during server shutdown. For improved reliability, consider using a sync.WaitGroup or a worker pool that the server can drain during graceful termination.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/handlers.go` around lines 500 - 509,
The fire-and-forget goroutine that calls h.observerClient.ForwardAlert uses
context.Background() and can be terminated during server shutdown; modify the
code to track in-flight forwards using a sync.WaitGroup (or a dedicated worker
pool) stored on the handler (e.g., add a field like wg sync.WaitGroup to the
handler struct), increment the WG before spawning the goroutine, defer wg.Done()
inside it, and use a shutdown signaling context (or the server’s shutdown
sequence) to wait on wg.Wait() during graceful termination so all ForwardAlert
calls complete; ensure ForwardAlert still receives a bounded context (e.g.,
context.WithTimeout) derived from the handler’s shutdown-aware context.

772-805: Consider extracting helper functions to reduce nesting depth.

The function has 8 levels of nesting due to the OpenSearch query structure. While functional, breaking this into smaller helpers (e.g., getInputSearch, getFilterFromBool) would improve readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/handlers.go` around lines 772 - 805,
The extractWindowFromQuery function is deeply nested; refactor by extracting
small helpers to flatten logic: create getInputSearch(monitor
map[string]interface{}) (map[string]interface{}, bool) to return the first
input's "search" map, getQueryMap(search map[string]interface{})
(map[string]interface{}, bool) to return the inner "query" map,
getBoolFilter(queryMap map[string]interface{}) ([]interface{}, bool) to return
the "bool" -> "filter" slice, and getTimestampFromFilter(filter
map[string]interface{}) (string, bool) to extract the "@timestamp.from" string;
then rewrite extractWindowFromQuery to call these helpers in sequence and return
the parsed window substring, which will remove the deep nested type assertions
and improve readability.

534-544: Handle integer type for alertValue.

The switch handles float64 and string, but JSON integers from some serializers may arrive as int or int64. Currently, these would silently result in alertValue = 0.

♻️ Proposed enhancement
 if countVal, ok := body["alertValue"]; ok {
 	switch v := countVal.(type) {
 	case float64:
 		alertValue = v
+	case int:
+		alertValue = float64(v)
+	case int64:
+		alertValue = float64(v)
 	case string:
 		alertValue, err = strconv.ParseFloat(v, 64)
 		if err != nil {
 			return "", "", 0, time.Time{}, fmt.Errorf("failed to parse alertValue %q: %w", v, err)
 		}
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/handlers.go` around lines 534 - 544,
The switch on countVal in handlers.go only handles float64 and string, causing
integer JSON values to be ignored; update the switch in the block that reads
body["alertValue"] (the countVal handling that sets alertValue) to also handle
integer types (e.g., int, int32, int64, uint, uint64) by converting them to
float64 (e.g., via type assertion to the integer type then casting to float64),
and consider handling json.Number if used by your decoder by parsing it to
float64; keep the existing string parsing/error path intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 437-442: The update handler currently returns a 400 response when
an alert rule isn't found; change the response to a 404 to match REST semantics
and the behavior of DeleteAlertRule. In the block that checks found (the branch
returning gen.UpdateAlertRule400JSONResponse), return the appropriate 404
response type (gen.UpdateAlertRule404JSONResponse) and set Title to
ptr(gen.NotFound) with Message ptr("alert rule not found") so the
UpdateAlertRule handler uses NotFound instead of BadRequest.
- Around line 173-175: The call to logs.FromLogsQueryResponseLogs0(entries) is
currently ignoring its returned error, which can hide serialization issues;
update the code in the handler where logs := gen.LogsQueryResponse_Logs{} is
created to capture the error (err := logs.FromLogsQueryResponseLogs0(entries)),
check it, and handle it appropriately (e.g., log the error via the existing
logger or return an error response) before assigning resp.Logs = &logs so
serialization failures are visible and propagated.
- Around line 270-272: The call to logs.FromLogsQueryResponseLogs1(entries)
ignores its returned error; update the block where gen.LogsQueryResponse_Logs is
constructed (variable logs, method FromLogsQueryResponseLogs1) to capture the
error, log or handle it (e.g., via processLogger/resp logger or return the
error) and only set resp.Logs = &logs on success; ensure the error path records
the failure with context so it isn't silently dropped.

In `@observability-logs-opensearch/internal/opensearch/client.go`:
- Around line 32-44: The OPENSEARCH TLS skip-verify default is currently true;
update the default to false in the configuration so certificate verification is
enabled by default. Specifically, change the default value for
OPENSEARCH_TLS_SKIP_VERIFY in observability-logs-opensearch/internal/config.go
from true to false and ensure any code using NewClient(insecureSkipVerify bool)
continues to pass the configured value (the TLS behavior in NewClient’s
TLSClientConfig.InsecureSkipVerify should reflect this flipped default). Also
update any related environment variable help text/comments to indicate
skip-verify is opt-in for dev/self-signed certs.

In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 360-385: BuildLogAlertingRuleMonitorBody currently allows
sub-minute intervals because it uses intervalDuration.Minutes(); validate
params.Condition.Interval by checking the parsed duration is a positive whole
number of minutes (e.g., ensure intervalDuration > 0 and
intervalDuration%time.Minute == 0) before constructing
MonitorSchedulePeriod.Interval, and return a clear error if the provided
duration is fractional/minutes not whole (reject values like "30s" or "90s");
update the validation near the existing time.ParseDuration call and the
MonitorSchedule construction in BuildLogAlertingRuleMonitorBody to enforce
whole-minute intervals.
- Around line 186-248: BuildWorkflowRunLogsQuery currently lets an empty
WorkflowRunID and zero pagination/sort propagate to OpenSearch; update
BuildWorkflowRunLogsQuery (and use sanitizeWildcardValue,
WorkflowRunQueryParams, addTimeRangeFilter) to validate and guard those
zero-values: if params.WorkflowRunID is empty, produce a safe no-match query
(e.g., use a match_none query or a wildcard that cannot match) instead of using
"*" for pod name; if params.QueryParams.Limit is zero set a sane default limit
(e.g., 100) and if params.QueryParams.SortOrder is empty set a default sort
(e.g., "desc") so the built query mirrors the safe behavior in
BuildComponentLogsQueryV1. Ensure sanitizeWildcardValue is only applied when
WorkflowRunID is present and keep the rest of the filters (namespace, stepName,
time range, must_not) intact.

---

Nitpick comments:
In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 500-509: The fire-and-forget goroutine that calls
h.observerClient.ForwardAlert uses context.Background() and can be terminated
during server shutdown; modify the code to track in-flight forwards using a
sync.WaitGroup (or a dedicated worker pool) stored on the handler (e.g., add a
field like wg sync.WaitGroup to the handler struct), increment the WG before
spawning the goroutine, defer wg.Done() inside it, and use a shutdown signaling
context (or the server’s shutdown sequence) to wait on wg.Wait() during graceful
termination so all ForwardAlert calls complete; ensure ForwardAlert still
receives a bounded context (e.g., context.WithTimeout) derived from the
handler’s shutdown-aware context.
- Around line 772-805: The extractWindowFromQuery function is deeply nested;
refactor by extracting small helpers to flatten logic: create
getInputSearch(monitor map[string]interface{}) (map[string]interface{}, bool) to
return the first input's "search" map, getQueryMap(search
map[string]interface{}) (map[string]interface{}, bool) to return the inner
"query" map, getBoolFilter(queryMap map[string]interface{}) ([]interface{},
bool) to return the "bool" -> "filter" slice, and getTimestampFromFilter(filter
map[string]interface{}) (string, bool) to extract the "@timestamp.from" string;
then rewrite extractWindowFromQuery to call these helpers in sequence and return
the parsed window substring, which will remove the deep nested type assertions
and improve readability.
- Around line 534-544: The switch on countVal in handlers.go only handles
float64 and string, causing integer JSON values to be ignored; update the switch
in the block that reads body["alertValue"] (the countVal handling that sets
alertValue) to also handle integer types (e.g., int, int32, int64, uint, uint64)
by converting them to float64 (e.g., via type assertion to the integer type then
casting to float64), and consider handling json.Number if used by your decoder
by parsing it to float64; keep the existing string parsing/error path intact.

In `@observability-logs-opensearch/Makefile`:
- Around line 3-13: The Makefile's SPEC variable currently points to a branch
head URL which is mutable; update SPEC to reference an immutable source (either
vendor the spec file into the repo and point SPEC to the local path, or replace
the URL with a raw file URL that includes a specific commit SHA) so that the
openapi-codegen target (targets: openapi-codegen and oapi-codegen-install)
always regenerates from a reproducible spec; ensure the new SPEC still works
with the existing oapi-codegen calls in the Makefile (and update any docs or CI
that expect the old URL).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e182dbab-b1d5-4c35-8ec3-e7e8481d3ba6

📥 Commits

Reviewing files that changed from the base of the PR and between f5f60e9 and fe6b169.

📒 Files selected for processing (10)
  • observability-logs-opensearch/Makefile
  • observability-logs-opensearch/internal/handlers.go
  • observability-logs-opensearch/internal/handlers_test.go
  • observability-logs-opensearch/internal/observer/client_test.go
  • observability-logs-opensearch/internal/opensearch/client.go
  • observability-logs-opensearch/internal/opensearch/client_test.go
  • observability-logs-opensearch/internal/opensearch/queries.go
  • observability-logs-opensearch/internal/opensearch/queries_test.go
  • observability-logs-opensearch/internal/opensearch/types_test.go
  • observability-logs-opensearch/main.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • observability-logs-opensearch/main.go

@nilushancosta nilushancosta force-pushed the logs-opensearch branch 2 times, most recently from ce1c1c7 to eb87a23 Compare April 3, 2026 09:19
Signed-off-by: Nilushan Costa <nilushan@wso2.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
observability-logs-opensearch/internal/opensearch/queries.go (2)

299-344: ⚠️ Potential issue | 🟠 Major

Persist namespace in the stored alert query filters.

params.Metadata.Namespace is available here, but it never gets written into filterConditions. parseMonitorToAlertRuleResponse in observability-logs-opensearch/internal/handlers.go reconstructs rule metadata from these saved filters, so GetAlertRule loses the namespace on round-trip unless it is stored here as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines
299 - 344, filterConditions is missing the Namespace term so the alert rule
metadata round-trips loses Namespace; add a term entry similar to the
Component/Environment/Project terms using the same pattern: append a
map[string]interface{}{"term": map[string]interface{}{OSNamespaceID:
map[string]interface{}{"value": params.Metadata.Namespace, "boost": 1}}} to
filterConditions (update the slice near where filterConditions is constructed in
queries.go) so parseMonitorToAlertRuleResponse can reconstruct Namespace
correctly.

186-248: ⚠️ Potential issue | 🟠 Major

Still reject blank workflow run IDs and apply the workflow-only filters here.

WorkflowRunID == "" still turns into kubernetes.pod_name: "*", and SearchPhrase / LogLevels from QueryParams are never added to mustConditions. That means a malformed workflow request can fan out into a broad scan, and valid workflow queries silently ignore keyword/level filtering that internal/handlers.go already accepts and forwards.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines
186 - 248, The BuildWorkflowRunLogsQuery currently builds a wildcard pod_name
even when params.WorkflowRunID is blank and never applies
QueryParams.SearchPhrase or QueryParams.LogLevels; fix by early-rejecting empty
WorkflowRunID (e.g., return nil/empty query) at the top of
BuildWorkflowRunLogsQuery if params.WorkflowRunID == "", then only build
podNamePattern and add the pod_name wildcard when non-empty (using
sanitizeWildcardValue as before), and append SearchPhrase and LogLevels filters
from params.QueryParams (use a match_phrase or query_string for
QueryParams.SearchPhrase and a terms/term filter for QueryParams.LogLevels) into
mustConditions before calling addTimeRangeFilter so keyword/level filtering is
applied for workflow-only queries.
🧹 Nitpick comments (1)
observability-logs-opensearch/Makefile (1)

3-13: Pin the OpenAPI spec instead of generating from a moving main branch.

SPEC currently tracks refs/heads/main, so every make openapi-codegen run depends on whatever is in the docs repo at that moment. That makes regeneration non-reproducible and can change this module’s generated API surface without any matching change in this repo. Please vendor the spec or pin the URL to an immutable tag/commit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/Makefile` around lines 3 - 13, The SPEC
variable currently points to a moving ref (refs/heads/main) causing
non-reproducible codegen; update SPEC to reference an immutable location (a
vendored local file or a commit/tag URL) and adjust the openapi-codegen target
accordingly so ./$(CFG_DIR) uses the pinned spec; ensure oapi-codegen-install
and targets (oapi-codegen-install, openapi-codegen) continue to work with the
new SPEC value and commit the vendored spec file if you choose to vendor it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@observability-logs-opensearch/internal/handlers_test.go`:
- Around line 426-439: Update the assertion in TestDeleteAlertRule_NotFound to
expect the new 404 response type: after calling handler.DeleteAlertRule (the
DeleteAlertRule method on the LogsHandler instance created as handler), replace
the check that currently expects gen.DeleteAlertRule500JSONResponse with a check
for gen.DeleteAlertRule404JSONResponse so the test verifies the new not-found
behavior for missing monitors.

In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 455-459: The handler returns a non-existent response type
gen.UpdateAlertRule404JSONResponse causing a build error; either add a 404
response for the UpdateAlertRule operation to your OpenAPI spec and re-run code
generation to populate internal/api/gen, or change the return to one of the
actual generated response types in the gen package (replace
gen.UpdateAlertRule404JSONResponse in internal/handlers.go with an existing
generated response type for UpdateAlertRule, e.g., the operation's default or
NotFound type) so the package compiles.

In `@observability-logs-opensearch/internal/opensearch/client.go`:
- Around line 172-178: The code currently returns parsed.Hits.Hits[0].ID when
one or more hits are found, which silently chooses the first match; modify the
function that returns (string, bool, error) to detect ambiguous matches by
checking if parsed.Hits.Total.Value > 1 or len(parsed.Hits.Hits) > 1 and, in
that case, return "", false, fmt.Errorf("ambiguous monitor name: %d matches
found", parsed.Hits.Total.Value) (or similar), leaving the existing zero-hit
check and the missing _id check intact so callers must handle ambiguous results
instead of acting on the first hit.

In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 37-49: formatDurationForOpenSearch currently allows zero or
negative durations which leads to malformed ranges in BuildLogAlertingRuleQuery;
update formatDurationForOpenSearch to parse the duration (time.ParseDuration),
then return an error if parsed <= 0 (reject zero and negative values) with a
clear message (e.g., "duration must be a positive whole number of minutes or
hours"), keeping the existing checks for whole minutes/hours and returning
formatted strings when valid so BuildLogAlertingRuleQuery never receives 0m or
negative inputs.

In `@observability-logs-opensearch/main.go`:
- Around line 83-98: When the server start error arrives on errCh the select
branch only logs it and continues normal shutdown, resulting in a zero exit;
change the select handling for case err := <-errCh in the main goroutine to
ensure a non-zero exit path: after logging the error from errCh (use
logger.Error with slog.Any("error", err)), either call os.Exit(1) immediately or
set a flag/return value that causes the program to exit with a non-zero status
after running the graceful shutdown (ensure srv.Shutdown still runs if you want
cleanup but propagate the original start error by exiting with status 1). Update
the select branch that reads from errCh (and any code that follows) so startup
failures from srv.Start trigger os.Exit(1) rather than allowing a 0 exit.

---

Duplicate comments:
In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 299-344: filterConditions is missing the Namespace term so the
alert rule metadata round-trips loses Namespace; add a term entry similar to the
Component/Environment/Project terms using the same pattern: append a
map[string]interface{}{"term": map[string]interface{}{OSNamespaceID:
map[string]interface{}{"value": params.Metadata.Namespace, "boost": 1}}} to
filterConditions (update the slice near where filterConditions is constructed in
queries.go) so parseMonitorToAlertRuleResponse can reconstruct Namespace
correctly.
- Around line 186-248: The BuildWorkflowRunLogsQuery currently builds a wildcard
pod_name even when params.WorkflowRunID is blank and never applies
QueryParams.SearchPhrase or QueryParams.LogLevels; fix by early-rejecting empty
WorkflowRunID (e.g., return nil/empty query) at the top of
BuildWorkflowRunLogsQuery if params.WorkflowRunID == "", then only build
podNamePattern and add the pod_name wildcard when non-empty (using
sanitizeWildcardValue as before), and append SearchPhrase and LogLevels filters
from params.QueryParams (use a match_phrase or query_string for
QueryParams.SearchPhrase and a terms/term filter for QueryParams.LogLevels) into
mustConditions before calling addTimeRangeFilter so keyword/level filtering is
applied for workflow-only queries.

---

Nitpick comments:
In `@observability-logs-opensearch/Makefile`:
- Around line 3-13: The SPEC variable currently points to a moving ref
(refs/heads/main) causing non-reproducible codegen; update SPEC to reference an
immutable location (a vendored local file or a commit/tag URL) and adjust the
openapi-codegen target accordingly so ./$(CFG_DIR) uses the pinned spec; ensure
oapi-codegen-install and targets (oapi-codegen-install, openapi-codegen)
continue to work with the new SPEC value and commit the vendored spec file if
you choose to vendor it.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4de7645a-3955-4ad1-829e-ea7fcc5e8a4b

📥 Commits

Reviewing files that changed from the base of the PR and between fe6b169 and ce1c1c7.

📒 Files selected for processing (11)
  • observability-logs-opensearch/Makefile
  • observability-logs-opensearch/internal/handlers.go
  • observability-logs-opensearch/internal/handlers_test.go
  • observability-logs-opensearch/internal/observer/client_test.go
  • observability-logs-opensearch/internal/opensearch/client.go
  • observability-logs-opensearch/internal/opensearch/client_test.go
  • observability-logs-opensearch/internal/opensearch/queries.go
  • observability-logs-opensearch/internal/opensearch/queries_test.go
  • observability-logs-opensearch/internal/opensearch/types.go
  • observability-logs-opensearch/internal/opensearch/types_test.go
  • observability-logs-opensearch/main.go
✅ Files skipped from review due to trivial changes (2)
  • observability-logs-opensearch/internal/opensearch/types_test.go
  • observability-logs-opensearch/internal/opensearch/client_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • observability-logs-opensearch/internal/observer/client_test.go

Comment on lines +172 to +178
if parsed.Hits.Total.Value == 0 || len(parsed.Hits.Hits) == 0 {
return "", false, nil
}
if parsed.Hits.Hits[0].ID == "" {
return "", false, fmt.Errorf("monitor search response missing _id field")
}
return parsed.Hits.Hits[0].ID, true, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't silently choose the first monitor hit.

When more than one monitor matches name, this method returns parsed.Hits.Hits[0].ID and the alert-rule handlers can operate on the wrong resource. Please fail on ambiguous matches here, or switch the lookup to a unique scoped identifier, instead of treating the first hit as authoritative.

Suggested guard
 	if parsed.Hits.Total.Value == 0 || len(parsed.Hits.Hits) == 0 {
 		return "", false, nil
 	}
+	if parsed.Hits.Total.Value > 1 {
+		return "", false, fmt.Errorf("multiple monitors found for name %q", name)
+	}
 	if parsed.Hits.Hits[0].ID == "" {
 		return "", false, fmt.Errorf("monitor search response missing _id field")
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if parsed.Hits.Total.Value == 0 || len(parsed.Hits.Hits) == 0 {
return "", false, nil
}
if parsed.Hits.Hits[0].ID == "" {
return "", false, fmt.Errorf("monitor search response missing _id field")
}
return parsed.Hits.Hits[0].ID, true, nil
if parsed.Hits.Total.Value == 0 || len(parsed.Hits.Hits) == 0 {
return "", false, nil
}
if parsed.Hits.Total.Value > 1 {
return "", false, fmt.Errorf("multiple monitors found for name %q", name)
}
if parsed.Hits.Hits[0].ID == "" {
return "", false, fmt.Errorf("monitor search response missing _id field")
}
return parsed.Hits.Hits[0].ID, true, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/opensearch/client.go` around lines 172
- 178, The code currently returns parsed.Hits.Hits[0].ID when one or more hits
are found, which silently chooses the first match; modify the function that
returns (string, bool, error) to detect ambiguous matches by checking if
parsed.Hits.Total.Value > 1 or len(parsed.Hits.Hits) > 1 and, in that case,
return "", false, fmt.Errorf("ambiguous monitor name: %d matches found",
parsed.Hits.Total.Value) (or similar), leaving the existing zero-hit check and
the missing _id check intact so callers must handle ambiguous results instead of
acting on the first hit.

Comment on lines +37 to +49
func formatDurationForOpenSearch(d string) (string, error) {
parsed, err := time.ParseDuration(d)
if err != nil {
return "", err
}

switch {
case parsed%time.Hour == 0:
return fmt.Sprintf("%dh", parsed/time.Hour), nil
case parsed%time.Minute == 0:
return fmt.Sprintf("%dm", parsed/time.Minute), nil
}
return "", fmt.Errorf("duration must be a whole number of minutes or hours; seconds are not supported: %s", d)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject zero and negative alert windows.

formatDurationForOpenSearch currently accepts values like 0m and -5m. BuildLogAlertingRuleQuery then emits ranges like {{period_end}}||-0h or {{period_end}}||--5m, which produces nonsensical monitor queries instead of a validation error.

Suggested guard
 func formatDurationForOpenSearch(d string) (string, error) {
 	parsed, err := time.ParseDuration(d)
 	if err != nil {
 		return "", err
 	}
+	if parsed <= 0 {
+		return "", fmt.Errorf("duration must be greater than zero: %s", d)
+	}
 
 	switch {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines 37
- 49, formatDurationForOpenSearch currently allows zero or negative durations
which leads to malformed ranges in BuildLogAlertingRuleQuery; update
formatDurationForOpenSearch to parse the duration (time.ParseDuration), then
return an error if parsed <= 0 (reject zero and negative values) with a clear
message (e.g., "duration must be a positive whole number of minutes or hours"),
keeping the existing checks for whole minutes/hours and returning formatted
strings when valid so BuildLogAlertingRuleQuery never receives 0m or negative
inputs.

Comment on lines +83 to +98
select {
case err := <-errCh:
logger.Error("Server error", slog.Any("error", err))
case <-quit:
logger.Info("Shutting down gracefully")
}

shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
defer shutdownCancel()

if err := srv.Shutdown(shutdownCtx); err != nil {
logger.Error("Error during shutdown", slog.Any("error", err))
os.Exit(1)
}

logger.Info("Server stopped")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Startup failures currently exit successfully.

When srv.Start() sends an error on errCh, the code logs it but then falls through the normal shutdown path and can still return exit code 0. A bind/listen failure should propagate a non-zero exit status so supervisors do not treat the process as a clean stop.

Suggested fix
+	startErr := false
 	select {
 	case err := <-errCh:
 		logger.Error("Server error", slog.Any("error", err))
+		startErr = true
 	case <-quit:
 		logger.Info("Shutting down gracefully")
 	}
@@
 	if err := srv.Shutdown(shutdownCtx); err != nil {
 		logger.Error("Error during shutdown", slog.Any("error", err))
 		os.Exit(1)
 	}
+	if startErr {
+		os.Exit(1)
+	}
 
 	logger.Info("Server stopped")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
select {
case err := <-errCh:
logger.Error("Server error", slog.Any("error", err))
case <-quit:
logger.Info("Shutting down gracefully")
}
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
defer shutdownCancel()
if err := srv.Shutdown(shutdownCtx); err != nil {
logger.Error("Error during shutdown", slog.Any("error", err))
os.Exit(1)
}
logger.Info("Server stopped")
startErr := false
select {
case err := <-errCh:
logger.Error("Server error", slog.Any("error", err))
startErr = true
case <-quit:
logger.Info("Shutting down gracefully")
}
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
defer shutdownCancel()
if err := srv.Shutdown(shutdownCtx); err != nil {
logger.Error("Error during shutdown", slog.Any("error", err))
os.Exit(1)
}
if startErr {
os.Exit(1)
}
logger.Info("Server stopped")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/main.go` around lines 83 - 98, When the server
start error arrives on errCh the select branch only logs it and continues normal
shutdown, resulting in a zero exit; change the select handling for case err :=
<-errCh in the main goroutine to ensure a non-zero exit path: after logging the
error from errCh (use logger.Error with slog.Any("error", err)), either call
os.Exit(1) immediately or set a flag/return value that causes the program to
exit with a non-zero status after running the graceful shutdown (ensure
srv.Shutdown still runs if you want cleanup but propagate the original start
error by exiting with status 1). Update the select branch that reads from errCh
(and any code that follows) so startup failures from srv.Start trigger
os.Exit(1) rather than allowing a 0 exit.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (5)
observability-logs-opensearch/main.go (1)

83-98: ⚠️ Potential issue | 🟠 Major

Exit non-zero when the server fails to start.

When srv.Start() reports an error on errCh, main logs it but then follows the normal shutdown path and returns 0. Supervisors will treat bind/listen failures as clean exits.

🐛 Suggested fix
+	startErr := false
 	select {
 	case err := <-errCh:
 		logger.Error("Server error", slog.Any("error", err))
+		startErr = true
 	case <-quit:
 		logger.Info("Shutting down gracefully")
 	}
@@
 	if err := srv.Shutdown(shutdownCtx); err != nil {
 		logger.Error("Error during shutdown", slog.Any("error", err))
 		os.Exit(1)
 	}
+	if startErr {
+		os.Exit(1)
+	}
 
 	logger.Info("Server stopped")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/main.go` around lines 83 - 98, The select
handling startup errors should cause a non-zero exit when srv fails to start: in
the select case that receives err from errCh (the channel used by srv.Start()),
after logging via logger.Error("Server error", slog.Any("error", err")) return
or call os.Exit(1) instead of proceeding to the normal shutdown path; ensure you
still run any necessary cleanup (e.g., cancel contexts) before exiting and avoid
falling through to the quit branch so supervisors see a non-zero exit on
bind/listen failures. Use the existing errCh, quit, srv.Shutdown and logger
symbols to locate and implement the change.
observability-logs-opensearch/internal/opensearch/queries.go (2)

37-49: ⚠️ Potential issue | 🟠 Major

Reject zero and negative alert windows before formatting them.

0m and negative durations still satisfy the whole-minute/hour branches here, so BuildLogAlertingRuleQuery can emit ranges like {{period_end}}||-0h or {{period_end}}||--5m.

🐛 Suggested fix
 func formatDurationForOpenSearch(d string) (string, error) {
 	parsed, err := time.ParseDuration(d)
 	if err != nil {
 		return "", err
 	}
+	if parsed <= 0 {
+		return "", fmt.Errorf("duration must be greater than zero: %s", d)
+	}
 
 	switch {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines 37
- 49, The function formatDurationForOpenSearch currently allows zero and
negative durations because it only checks divisibility; update
formatDurationForOpenSearch to reject parsed durations <= 0 before the switch by
returning an error (e.g., "duration must be a positive whole number of minutes
or hours") so callers like BuildLogAlertingRuleQuery cannot generate "-0h" or
"--5m"; keep the existing hour/minute branch logic and error behavior for
non-whole-minute/hour values.

186-248: ⚠️ Potential issue | 🟠 Major

Workflow log queries can broaden unexpectedly and ignore requested filters.

When WorkflowRunID is empty, this builds kubernetes.pod_name: "*". It also passes through zero Limit/empty SortOrder, and unlike BuildComponentLogsQueryV1 it never applies QueryParams.SearchPhrase or QueryParams.LogLevels even though observability-logs-opensearch/internal/handlers.go Lines 206-215 populate those fields before calling this builder.

🐛 Suggested direction
-func (qb *QueryBuilder) BuildWorkflowRunLogsQuery(params WorkflowRunQueryParams) map[string]interface{} {
+func (qb *QueryBuilder) BuildWorkflowRunLogsQuery(params WorkflowRunQueryParams) (map[string]interface{}, error) {
+	if strings.TrimSpace(params.WorkflowRunID) == "" {
+		return nil, fmt.Errorf("workflow run id is required")
+	}
+
+	limit := params.QueryParams.Limit
+	if limit <= 0 {
+		limit = 100
+	}
+
+	sortOrder := params.QueryParams.SortOrder
+	if sortOrder == "" {
+		sortOrder = "desc"
+	}
+
 	sanitizedWorkflowRunID := sanitizeWildcardValue(params.WorkflowRunID)
 	podNamePattern := sanitizedWorkflowRunID + "*"
@@
 	}
+	mustConditions = addSearchPhraseFilter(mustConditions, params.QueryParams.SearchPhrase)
+	mustConditions = addLogLevelFilter(mustConditions, params.QueryParams.LogLevels)
@@
-		"size": params.QueryParams.Limit,
+		"size": limit,
@@
-					"order": params.QueryParams.SortOrder,
+					"order": sortOrder,
@@
-	return query
+	return query, nil
 }
The call site at `observability-logs-opensearch/internal/handlers.go` Line 227 would then need to handle the returned error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines
186 - 248, BuildWorkflowRunLogsQuery currently constructs a broad wildcard when
WorkflowRunID is empty and ignores QueryParams.SearchPhrase,
QueryParams.LogLevels and zero/empty Limit/SortOrder; change
BuildWorkflowRunLogsQuery (and its caller) to return (map[string]interface{},
error) and return an error when params.WorkflowRunID is empty (avoid building
kubernetes.pod_name:"*"); enforce sane defaults for params.QueryParams.Limit and
params.QueryParams.SortOrder when zero/empty (matching behavior in
BuildComponentLogsQueryV1), and incorporate QueryParams.SearchPhrase and
QueryParams.LogLevels into the bool.must (and/or must_not) clauses the same way
BuildComponentLogsQueryV1 does; keep use of sanitizeWildcardValue,
addTimeRangeFilter and the existing namespace/container filters, and update the
call site to handle the returned error.
observability-logs-opensearch/internal/opensearch/client.go (1)

172-178: ⚠️ Potential issue | 🟠 Major

Don't silently pick the first matching monitor.

The alert-rule handlers use this ID for get/update/delete. If two monitors share a name, parsed.Hits.Hits[0].ID makes those operations target an arbitrary rule instead of failing on ambiguity.

🐛 Suggested guard
 	if parsed.Hits.Total.Value == 0 || len(parsed.Hits.Hits) == 0 {
 		return "", false, nil
 	}
+	if parsed.Hits.Total.Value > 1 || len(parsed.Hits.Hits) > 1 {
+		return "", false, fmt.Errorf("ambiguous monitor name %q", name)
+	}
 	if parsed.Hits.Hits[0].ID == "" {
 		return "", false, fmt.Errorf("monitor search response missing _id field")
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/opensearch/client.go` around lines 172
- 178, The current code silently returns parsed.Hits.Hits[0].ID which can target
an arbitrary monitor when multiple monitors match; update the logic in the
search handling (the block that inspects parsed.Hits.Total.Value and
parsed.Hits.Hits) to detect ambiguous results and return an explicit error
instead of the first hit when more than one match is found (e.g., when
parsed.Hits.Total.Value > 1 or len(parsed.Hits.Hits) > 1), including a clear
message that the monitor name is ambiguous and the match count; keep the
existing nil/empty handling for zero hits and preserve the existing check for a
missing ID on the single-hit path.
observability-logs-opensearch/Makefile (1)

3-13: ⚠️ Potential issue | 🟠 Major

Pin the OpenAPI spec revision used for codegen.

SPEC tracks openchoreo.github.io's moving main branch, so rerunning openapi-codegen later can regenerate different server/client code without any change in this module. That makes local builds and CI non-reproducible; pin a commit/tag or vendor the spec instead.

♻️ Suggested direction
-SPEC := https://raw.githubusercontent.com/openchoreo/openchoreo.github.io/refs/heads/main/static/api-specs/observability-logs-adapter-api.yaml
+SPEC_REF ?= <pinned-commit-or-tag>
+SPEC := https://raw.githubusercontent.com/openchoreo/openchoreo.github.io/$(SPEC_REF)/static/api-specs/observability-logs-adapter-api.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/Makefile` around lines 3 - 13, The SPEC
variable currently points to the moving main branch URL which makes
openapi-codegen non-reproducible; update the Makefile so SPEC references a fixed
revision (commit SHA or tag) or replace SPEC with a local vendored file and
update the openapi-codegen target accordingly (affecting SPEC and the
openapi-codegen recipe that calls oapi-codegen) so repeated runs generate
stable, reproducible client/server code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 306-315: The CreateAlertRule and UpdateAlertRule handlers
currently treat all errors from queryBuilder.BuildLogAlertingRuleMonitorBody as
500s; change both handlers to detect validation errors returned by
BuildLogAlertingRuleMonitorBody (use errors.Is/errors.As against the
queryBuilder's validation error type/name, e.g., QueryBuilder.ValidationError or
queryBuilder.ErrValidation/ErrInvalidInput) and when a validation error is found
return gen.CreateAlertRule400JSONResponse / gen.UpdateAlertRule400JSONResponse
with Title ptr(gen.BadRequest) and Message set to the validation error message;
keep other errors mapping to the existing 500 response.
- Around line 462-495: The handler currently allows the incoming body to rename
the monitor by using request.Body.Metadata.Name while the response and
subsequent lookups use request.RuleName; fix this by enforcing consistency:
before calling h.queryBuilder.BuildLogAlertingRuleMonitorBody (or before passing
params), either validate that params.Metadata.Name equals request.RuleName and
return a 400 if they differ, or overwrite params.Metadata.Name =
request.RuleName so the built monitorBody cannot rename the resource; update the
logic around toAlertingRuleRequest / BuildLogAlertingRuleMonitorBody and the
monitor update path (UpdateMonitor) to use the canonical request.RuleName.

In `@observability-logs-opensearch/Makefile`:
- Around line 7-13: The Makefile hardcodes the oapi-codegen binary path to
$(shell go env GOPATH)/bin which breaks when GOBIN is set; update the
openapi-codegen target to compute a binary path that prefers GOBIN then falls
back to GOPATH/bin (e.g., define a local variable like OAPI_BIN that uses
$(shell go env GOBIN) if non-empty else $(shell go env GOPATH)/bin) and invoke
$(OAPI_BIN)/oapi-codegen for the three codegen calls; keep existing vars
OAPI_CODEGEN_VERSION, CFG_DIR and SPEC unchanged.

---

Duplicate comments:
In `@observability-logs-opensearch/internal/opensearch/client.go`:
- Around line 172-178: The current code silently returns parsed.Hits.Hits[0].ID
which can target an arbitrary monitor when multiple monitors match; update the
logic in the search handling (the block that inspects parsed.Hits.Total.Value
and parsed.Hits.Hits) to detect ambiguous results and return an explicit error
instead of the first hit when more than one match is found (e.g., when
parsed.Hits.Total.Value > 1 or len(parsed.Hits.Hits) > 1), including a clear
message that the monitor name is ambiguous and the match count; keep the
existing nil/empty handling for zero hits and preserve the existing check for a
missing ID on the single-hit path.

In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 37-49: The function formatDurationForOpenSearch currently allows
zero and negative durations because it only checks divisibility; update
formatDurationForOpenSearch to reject parsed durations <= 0 before the switch by
returning an error (e.g., "duration must be a positive whole number of minutes
or hours") so callers like BuildLogAlertingRuleQuery cannot generate "-0h" or
"--5m"; keep the existing hour/minute branch logic and error behavior for
non-whole-minute/hour values.
- Around line 186-248: BuildWorkflowRunLogsQuery currently constructs a broad
wildcard when WorkflowRunID is empty and ignores QueryParams.SearchPhrase,
QueryParams.LogLevels and zero/empty Limit/SortOrder; change
BuildWorkflowRunLogsQuery (and its caller) to return (map[string]interface{},
error) and return an error when params.WorkflowRunID is empty (avoid building
kubernetes.pod_name:"*"); enforce sane defaults for params.QueryParams.Limit and
params.QueryParams.SortOrder when zero/empty (matching behavior in
BuildComponentLogsQueryV1), and incorporate QueryParams.SearchPhrase and
QueryParams.LogLevels into the bool.must (and/or must_not) clauses the same way
BuildComponentLogsQueryV1 does; keep use of sanitizeWildcardValue,
addTimeRangeFilter and the existing namespace/container filters, and update the
call site to handle the returned error.

In `@observability-logs-opensearch/main.go`:
- Around line 83-98: The select handling startup errors should cause a non-zero
exit when srv fails to start: in the select case that receives err from errCh
(the channel used by srv.Start()), after logging via logger.Error("Server
error", slog.Any("error", err")) return or call os.Exit(1) instead of proceeding
to the normal shutdown path; ensure you still run any necessary cleanup (e.g.,
cancel contexts) before exiting and avoid falling through to the quit branch so
supervisors see a non-zero exit on bind/listen failures. Use the existing errCh,
quit, srv.Shutdown and logger symbols to locate and implement the change.

In `@observability-logs-opensearch/Makefile`:
- Around line 3-13: The SPEC variable currently points to the moving main branch
URL which makes openapi-codegen non-reproducible; update the Makefile so SPEC
references a fixed revision (commit SHA or tag) or replace SPEC with a local
vendored file and update the openapi-codegen target accordingly (affecting SPEC
and the openapi-codegen recipe that calls oapi-codegen) so repeated runs
generate stable, reproducible client/server code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 73b91b4f-8ff6-4f5a-9a97-b6d2a7513e5e

📥 Commits

Reviewing files that changed from the base of the PR and between ce1c1c7 and b705362.

⛔ Files ignored due to path filters (1)
  • observability-logs-opensearch/internal/api/gen/server.gen.go is excluded by !**/gen/**
📒 Files selected for processing (11)
  • observability-logs-opensearch/Makefile
  • observability-logs-opensearch/internal/handlers.go
  • observability-logs-opensearch/internal/handlers_test.go
  • observability-logs-opensearch/internal/observer/client_test.go
  • observability-logs-opensearch/internal/opensearch/client.go
  • observability-logs-opensearch/internal/opensearch/client_test.go
  • observability-logs-opensearch/internal/opensearch/queries.go
  • observability-logs-opensearch/internal/opensearch/queries_test.go
  • observability-logs-opensearch/internal/opensearch/types.go
  • observability-logs-opensearch/internal/opensearch/types_test.go
  • observability-logs-opensearch/main.go
✅ Files skipped from review due to trivial changes (5)
  • observability-logs-opensearch/internal/observer/client_test.go
  • observability-logs-opensearch/internal/opensearch/types_test.go
  • observability-logs-opensearch/internal/opensearch/queries_test.go
  • observability-logs-opensearch/internal/opensearch/client_test.go
  • observability-logs-opensearch/internal/opensearch/types.go

Comment on lines +306 to +315
monitorBody, err := h.queryBuilder.BuildLogAlertingRuleMonitorBody(params)
if err != nil {
h.logger.Error("Failed to build monitor body",
slog.String("function", "CreateAlertRule"),
slog.Any("error", err),
)
return gen.CreateAlertRule500JSONResponse{
Title: ptr(gen.InternalServerError),
Message: ptr("internal server error"),
}, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Surface request-validation failures as 400s.

BuildLogAlertingRuleMonitorBody already rejects malformed client input such as invalid interval, window, or operator values. Returning 500 Internal Server Error from both create and update makes bad requests indistinguishable from backend failures and gives callers no actionable response.

Also applies to: 464-473

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/handlers.go` around lines 306 - 315,
The CreateAlertRule and UpdateAlertRule handlers currently treat all errors from
queryBuilder.BuildLogAlertingRuleMonitorBody as 500s; change both handlers to
detect validation errors returned by BuildLogAlertingRuleMonitorBody (use
errors.Is/errors.As against the queryBuilder's validation error type/name, e.g.,
QueryBuilder.ValidationError or queryBuilder.ErrValidation/ErrInvalidInput) and
when a validation error is found return gen.CreateAlertRule400JSONResponse /
gen.UpdateAlertRule400JSONResponse with Title ptr(gen.BadRequest) and Message
set to the validation error message; keep other errors mapping to the existing
500 response.

Comment on lines +462 to +495
params := toAlertingRuleRequest(request.Body)

monitorBody, err := h.queryBuilder.BuildLogAlertingRuleMonitorBody(params)
if err != nil {
h.logger.Error("Failed to build monitor body",
slog.String("function", "UpdateAlertRule"),
slog.Any("error", err),
)
return gen.UpdateAlertRule500JSONResponse{
Title: ptr(gen.InternalServerError),
Message: ptr("internal server error"),
}, nil
}

if _, err := h.osClient.UpdateMonitor(ctx, monitorID, monitorBody); err != nil {
h.logger.Error("Failed to update monitor",
slog.String("function", "UpdateAlertRule"),
slog.String("ruleName", request.RuleName),
slog.Any("error", err),
)
return gen.UpdateAlertRule500JSONResponse{
Title: ptr(gen.InternalServerError),
Message: ptr("internal server error"),
}, nil
}

now := time.Now().UTC().Format(time.RFC3339)
return gen.UpdateAlertRule200JSONResponse{
Action: ptr(gen.Updated),
Status: ptr(gen.Synced),
RuleLogicalId: &request.RuleName,
RuleBackendId: &monitorID,
LastSyncedAt: &now,
}, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't let the request body rename the resource addressed by {ruleName}.

This handler resolves the monitor using request.RuleName, but the replacement body uses request.Body.Metadata.Name. If those differ, the call can rename the monitor while the response still reports the old logical ID, and subsequent get/delete requests by the old name stop working.

🐛 Suggested guard
 	params := toAlertingRuleRequest(request.Body)
+	if params.Metadata.Name != request.RuleName {
+		return gen.UpdateAlertRule400JSONResponse{
+			Title:   ptr(gen.BadRequest),
+			Message: ptr("metadata.name must match ruleName"),
+		}, nil
+	}
 
 	monitorBody, err := h.queryBuilder.BuildLogAlertingRuleMonitorBody(params)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/internal/handlers.go` around lines 462 - 495,
The handler currently allows the incoming body to rename the monitor by using
request.Body.Metadata.Name while the response and subsequent lookups use
request.RuleName; fix this by enforcing consistency: before calling
h.queryBuilder.BuildLogAlertingRuleMonitorBody (or before passing params),
either validate that params.Metadata.Name equals request.RuleName and return a
400 if they differ, or overwrite params.Metadata.Name = request.RuleName so the
built monitorBody cannot rename the resource; update the logic around
toAlertingRuleRequest / BuildLogAlertingRuleMonitorBody and the monitor update
path (UpdateMonitor) to use the canonical request.RuleName.

Comment on lines +7 to +13
oapi-codegen-install:
go install github.com/oapi-codegen/oapi-codegen/v2/cmd/oapi-codegen@$(OAPI_CODEGEN_VERSION)

openapi-codegen: oapi-codegen-install
cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-models.yaml $(SPEC)
cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-server.yaml $(SPEC)
cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-client.yaml $(SPEC)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the Makefile
fd -t f "Makefile" observability-logs-opensearch/

Repository: openchoreo/community-modules

Length of output: 110


🏁 Script executed:

cat observability-logs-opensearch/Makefile

Repository: openchoreo/community-modules

Length of output: 924


🌐 Web query:

Go GOBIN environment variable go install behavior

💡 Result:

The GOBIN environment variable specifies the directory where the 'go install' command installs executable commands (main packages). If GOBIN is set to an absolute path, 'go install' installs executables there. If GOBIN is not set, it defaults to $GOPATH/bin or $HOME/go/bin if GOPATH is not set. Exceptions: Executables in $GOROOT (standard library tools) are installed to $GOROOT/bin or $GOTOOLDIR instead of GOBIN. Cross-compiled binaries go to $GOOS_$GOARCH subdirectories of the install dir. Special case: 'go install' on a single .go file requires GOBIN to be set; it does not fall back to the default. You can set GOBIN persistently with 'go env -w GOBIN=/path/to/bin'. The effective value (including default) can be queried indirectly via 'go list -f {{.Target}} ./...' or 'go env GOPATH' + '/bin', but 'go env GOBIN' shows only the explicit setting (empty if unset). Add the install directory to PATH to run installed binaries conveniently.

Citations:


🏁 Script executed:

# Verify Makefile $(or) syntax is valid by checking if this is standard Makefile syntax
cat > /tmp/test.mk << 'EOF'
TEST_VAR := $(or $(shell echo ""),fallback)
all:
	`@echo` "$(TEST_VAR)"
EOF

make -f /tmp/test.mk

Repository: openchoreo/community-modules

Length of output: 116


Don't hardcode oapi-codegen to GOPATH/bin.

go install ...@version installs to GOBIN when that is set, but this target always executes $(shell go env GOPATH)/bin/oapi-codegen. In environments with a custom GOBIN, the install step succeeds and codegen still fails with "file not found".

🛠️ Suggested fix
+OAPI_CODEGEN_BIN := $(or $(shell go env GOBIN),$(shell go env GOPATH)/bin)/oapi-codegen
+
 oapi-codegen-install:
 	go install github.com/oapi-codegen/oapi-codegen/v2/cmd/oapi-codegen@$(OAPI_CODEGEN_VERSION)
 
 openapi-codegen: oapi-codegen-install
-	cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-models.yaml $(SPEC)
-	cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-server.yaml $(SPEC)
-	cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-client.yaml $(SPEC)
+	cd $(CFG_DIR) && $(OAPI_CODEGEN_BIN) --config cfg-models.yaml $(SPEC)
+	cd $(CFG_DIR) && $(OAPI_CODEGEN_BIN) --config cfg-server.yaml $(SPEC)
+	cd $(CFG_DIR) && $(OAPI_CODEGEN_BIN) --config cfg-client.yaml $(SPEC)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@observability-logs-opensearch/Makefile` around lines 7 - 13, The Makefile
hardcodes the oapi-codegen binary path to $(shell go env GOPATH)/bin which
breaks when GOBIN is set; update the openapi-codegen target to compute a binary
path that prefers GOBIN then falls back to GOPATH/bin (e.g., define a local
variable like OAPI_BIN that uses $(shell go env GOBIN) if non-empty else $(shell
go env GOPATH)/bin) and invoke $(OAPI_BIN)/oapi-codegen for the three codegen
calls; keep existing vars OAPI_CODEGEN_VERSION, CFG_DIR and SPEC unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants