feat: add OpenSearch based logs adapter#70
feat: add OpenSearch based logs adapter#70nilushancosta wants to merge 2 commits intoopenchoreo:mainfrom
Conversation
Signed-off-by: Nilushan Costa <nilushan@wso2.com>
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
📝 WalkthroughWalkthroughAdds a new OpenSearch logs adapter: container build, Go module and Makefile, Helm chart and templates, environment-driven configuration, OpenSearch client and query builder, HTTP handlers for logs/alert rules/webhook, observer forwarding client, server lifecycle, and extensive unit tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Main
participant Config
participant Logger
participant OSClient as OpenSearch Client
participant Observer as Observer Client
participant QB as QueryBuilder
participant Handler
participant Server
Main->>Config: LoadConfig()
Config-->>Main: Config
Main->>Logger: init JSON logger
Main->>OSClient: NewClient(...) / CheckHealth(ctx)
OSClient-->>Main: healthy
Main->>Observer: NewClient(observerUrl)
Main->>QB: NewQueryBuilder(indexPrefix)
Main->>Handler: NewLogsHandler(osClient, qb, observer, logger)
Main->>Server: NewServer(port, handler, logger)
Main->>Server: Start()
sequenceDiagram
participant Client
participant Handler
participant QB as QueryBuilder
participant OSClient as OpenSearch Client
participant Observer as Observer Client
Client->>Handler: QueryLogs(request)
Handler->>QB: Build query & indices
QB-->>Handler: query body, indices
Handler->>OSClient: Search(ctx, indices, query)
OSClient-->>Handler: SearchResponse
Handler->>Client: 200 QueryLogsResponse
Note over Handler,Observer: Webhook path (async)
Client->>Handler: HandleAlertWebhook(body)
Handler->>Observer: ForwardAlert (goroutine)
Handler-->>Client: 200 success
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (14)
observability-logs-opensearch/internal/observer/client.go (1)
27-29: Avoid two independent 10s timeouts for the same outbound call.Line 28 sets
http.Client.Timeoutto 10s, while the caller also wrapsForwardAlertwith a 10s context timeout (observability-logs-opensearch/internal/handlers.go, Lines 500-509). This creates timeout races and inconsistent failure surfaces. Prefer a single timeout owner (typically request context), or make client timeout larger than call-site deadlines.Proposed simplification
func NewClient(baseURL string) *Client { return &Client{ baseURL: strings.TrimRight(baseURL, "/"), - httpClient: &http.Client{ - Timeout: 10 * time.Second, - }, + httpClient: &http.Client{}, } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/observer/client.go` around lines 27 - 29, The http.Client in client.go currently sets a 10s Timeout which duplicates the 10s context deadline applied by the caller of ForwardAlert (handlers.go), causing racey/ambiguous timeouts; remove the client-level Timeout (set it to 0) and rely on the request context passed into ForwardAlert (ensure requests are created with http.NewRequestWithContext) so the context is the single owner of the deadline, or alternatively increase the client Timeout to a value larger than the caller deadline if you must keep it. Ensure changes touch the httpClient instantiation in client.go and the request creation inside ForwardAlert so the ctx controls cancellation.observability-logs-opensearch/internal/config.go (1)
45-57: InvalidLOG_LEVELvalues are silently ignored.If an unrecognized log level is provided (e.g.,
"TRACE"or"verbose"), the code falls through and defaults toLevelInfowithout any warning. Consider logging a warning or returning an error for invalid values.♻️ Proposed fix to warn on invalid log level
logLevel := slog.LevelInfo if level := os.Getenv("LOG_LEVEL"); level != "" { switch strings.ToUpper(level) { case "DEBUG": logLevel = slog.LevelDebug case "INFO": logLevel = slog.LevelInfo case "WARN", "WARNING": logLevel = slog.LevelWarn case "ERROR": logLevel = slog.LevelError + default: + return nil, fmt.Errorf("invalid LOG_LEVEL %q: must be DEBUG, INFO, WARN, or ERROR", level) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/config.go` around lines 45 - 57, The LOG_LEVEL parsing silently ignores unknown values — update the switch over strings.ToUpper(level) that sets logLevel to handle a default/else branch: if the value is unrecognized (e.g., "TRACE", "verbose") emit a warning via the package logger (slog.Warn or similar) mentioning the invalid LOG_LEVEL and the accepted values and keep the previous/default level, or alternatively return an error from the surrounding config loader; reference the existing logLevel variable, the LOG_LEVEL env lookup, and the switch block to add the default-case warning/validation behavior.observability-logs-opensearch/helm/templates/adapter/deployment.yaml (1)
28-60: Consider adding liveness and readiness probes.The container lacks health probes, which limits Kubernetes' ability to detect unhealthy pods and manage traffic routing. This is particularly important since the adapter makes external calls to OpenSearch.
♻️ Proposed fix to add health probes
If the OpenAPI spec includes a health endpoint, or if you add one:
ports: - containerPort: 9098 + livenessProbe: + httpGet: + path: /healthz + port: 9098 + initialDelaySeconds: 5 + periodSeconds: 10 + readinessProbe: + httpGet: + path: /readyz + port: 9098 + initialDelaySeconds: 5 + periodSeconds: 5 envFrom:Alternatively, use a TCP socket probe if no health endpoint exists:
livenessProbe: tcpSocket: port: 9098 initialDelaySeconds: 5 periodSeconds: 10🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/helm/templates/adapter/deployment.yaml` around lines 28 - 60, The container "logs-adapter-opensearch" has no Kubernetes health probes; add both livenessProbe and readinessProbe for port 9098 to allow K8s to detect and restart/unroute unhealthy pods: if the adapter exposes an HTTP health endpoint (e.g., /health or /ready) use httpGet probes pointing at port 9098 with sensible timings (initialDelaySeconds, periodSeconds, timeoutSeconds) for both probes; otherwise add tcpSocket probes against port 9098 for liveness and readiness with similar timing settings; update the container spec near the "ports" and "env" blocks to include these probe definitions.observability-logs-opensearch/Dockerfile (1)
12-12: Pin the Alpine version for reproducible builds.Using
alpine:latestcan lead to inconsistent builds over time as the tag gets updated. Pin to a specific version for reproducibility and to avoid unexpected breaking changes.♻️ Proposed fix
-FROM alpine:latest +FROM alpine:3.21🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/Dockerfile` at line 12, The Dockerfile uses an unpinned base image "FROM alpine:latest" which can cause non-reproducible builds; change that line to pin to a specific, supported Alpine tag (for example "FROM alpine:3.18" or your chosen stable patch version) so builds are deterministic and avoid unexpected breakages—update the Dockerfile's FROM alpine:latest entry accordingly and ensure the chosen version is documented or matches other images in the repo.observability-logs-opensearch/helm/values.yaml (1)
172-186: Add a comment clarifying thatopenSearchSecretNamemust be provided.The empty default for
openSearchSecretNamewill cause the Deployment to fail at runtime when Kubernetes cannot find the referenced Secret. While this follows the existing pattern (seeopenSearchSetup.openSearchSecretNameat line 191), adding documentation helps users understand this is a required override.📝 Suggested documentation
adapter: enabled: true observerUrl: "http://observer-internal.openchoreo-observability-plane:8081" + # Required: Name of the Kubernetes Secret containing OpenSearch credentials + # The secret must have 'username' and 'password' keys openSearchSecretName: ""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/helm/values.yaml` around lines 172 - 186, The Helm values file's adapter.openSearchSecretName is left empty which will cause the Deployment to fail if not overridden; add a comment next to adapter.openSearchSecretName explaining that this value is required and must be set to the name of an existing Kubernetes Secret (e.g., "name-of-opensearch-secret"), and mirror the guidance used for openSearchSetup.openSearchSecretName so users know to override it in their values file before deploying.observability-logs-opensearch/helm/templates/adapter/configmap.yaml (1)
13-15: Consider making OpenSearch connection parameters configurable.
OPENSEARCH_ADDRESSandOPENSEARCH_INDEX_PREFIXare hardcoded, whileOBSERVER_URLis already sourced from values. This limits deployment flexibility for clusters where OpenSearch runs on a different address or uses a different index prefix.♻️ Proposed refactor to make these values configurable
Add to
values.yamlunderadapter:adapter: enabled: true observerUrl: "http://observer-internal.openchoreo-observability-plane:8081" openSearchSecretName: "" + openSearchAddress: "https://opensearch:9200" + openSearchIndexPrefix: "container-logs-"Then update the ConfigMap:
data: SERVER_PORT: "9098" - OPENSEARCH_ADDRESS: "https://opensearch:9200" - OPENSEARCH_INDEX_PREFIX: "container-logs-" + OPENSEARCH_ADDRESS: {{ .Values.adapter.openSearchAddress | quote }} + OPENSEARCH_INDEX_PREFIX: {{ .Values.adapter.openSearchIndexPrefix | quote }} OBSERVER_URL: {{ .Values.adapter.observerUrl | quote }}Note: The same hardcoding exists in
helm/templates/opensearch-setup-logs/job.yaml(line 24). Consider addressing both for consistency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/helm/templates/adapter/configmap.yaml` around lines 13 - 15, Make OPENSEARCH_ADDRESS and OPENSEARCH_INDEX_PREFIX configurable via values by adding entries under adapter (e.g. adapter.opensearch.address and adapter.opensearch.indexPrefix) in values.yaml and update the ConfigMap template in helm/templates/adapter/configmap.yaml to use Helm value interpolation (referencing .Values.adapter.opensearch.address and .Values.adapter.opensearch.indexPrefix, with sensible defaults matching the current literals) instead of the hardcoded "https://opensearch:9200" and "container-logs-"; also apply the same change in helm/templates/opensearch-setup-logs/job.yaml (replace the hardcoded OPENSEARCH_ADDRESS/OPENSEARCH_INDEX_PREFIX there with the same .Values references) so deployments can override OpenSearch endpoint and index prefix consistently while keeping OBSERVER_URL usage unchanged.observability-logs-opensearch/Makefile (1)
3-3: Remote spec URL creates a network dependency for code generation.The
openapi-codegentarget fetches the spec from a remote URL, which means builds will fail without network access. This is acceptable for CI but may hinder local development in offline scenarios. Consider documenting this dependency or providing an option to use a local spec file.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/Makefile` at line 3, The Makefile currently hardcodes SPEC to a remote URL which creates a network dependency for the openapi-codegen target; update the Makefile to allow overriding SPEC from the environment or a make variable (e.g., check if SPEC is set before assigning) and add a short comment above the SPEC definition explaining the remote fetch and advising how to provide a local spec for offline development, and ensure the openapi-codegen target uses the SPEC variable so local files can be used by setting SPEC=/path/to/local.yaml.observability-logs-opensearch/internal/opensearch/queries.go (1)
267-282: Redundant check for end index inclusion.The loop condition
current.Before(end) || current.Equal(end)already ensures the end date is included when iterating. The subsequent check on lines 276-279 is defensive but unnecessary.This is a minor readability nit; the logic is correct.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines 267 - 282, The code builds daily index names into indices by iterating with current (start) until current.Before(end) || current.Equal(end), which already includes the end date, so remove the redundant endIndexName creation and the contains(...) conditional block; update the function that constructs indices (the loop using current, qb.indexPrefix and current.Format("2006-01-02")) to return indices directly after the loop and delete the extra end inclusion logic (and any unused variables like endIndexName) to improve readability.observability-logs-opensearch/internal/handlers.go (3)
174-175: Error fromFromLogsQueryResponseLogs0is ignored.The error return from
logs.FromLogsQueryResponseLogs0(entries)is discarded with_. If this conversion fails, the response will have nil/empty logs without indication of the failure.Consider logging or handling this error, even if unlikely to occur.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/handlers.go` around lines 174 - 175, The call to logs.FromLogsQueryResponseLogs0(entries) currently ignores its error result which can silently produce nil/empty resp.Logs; change this to capture the return values (e.g., logsObj, err := logs.FromLogsQueryResponseLogs0(entries)), then handle err instead of discarding it—either log the error using the package logger or return/propagate an error from the handler—and only set resp.Logs = &logsObj when err == nil; reference the FromLogsQueryResponseLogs0 call, the entries variable, and resp.Logs when implementing this fix.
437-442: Consider returning 404 when alert rule not found during update.Currently returns
400 Bad Requestwhen the monitor doesn't exist. A404 Not Foundwould be more semantically correct and consistent withGetAlertRule.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/handlers.go` around lines 437 - 442, The handler currently returns a gen.UpdateAlertRule400JSONResponse when the alert rule isn't found; change this to return a 404-style response instead: replace the gen.UpdateAlertRule400JSONResponse construction with the appropriate gen.UpdateAlertRule404JSONResponse (or the existing NotFound response type used by GetAlertRule) and set Title to ptr(gen.NotFound) and Message to ptr("alert rule not found") so the UpdateAlertRule handler returns HTTP 404 when the monitor/alert rule is missing.
500-509: Fire-and-forget goroutine may lose alerts on shutdown.The goroutine spawned for
ForwardAlertis detached from the request lifecycle. If the service shuts down while alerts are in-flight, they will be silently dropped. Consider using a worker pool with graceful shutdown, or at minimum track in-flight operations.Additionally, failed forwards are only logged without retry. For critical alerting workflows, consider implementing retry logic or a dead-letter queue.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/handlers.go` around lines 500 - 509, The detached goroutine that calls h.observerClient.ForwardAlert (in the anonymous go func in handlers.go) can drop alerts on shutdown and lacks retries; replace the fire-and-forget pattern by enqueuing the forward request into a managed worker queue or at minimum incrementing a shared sync.WaitGroup (or tracking counter) before spawning the worker and decrementing it after completion so shutdown can wait for in-flight forwards, and add retry logic (e.g., exponential backoff with a few attempts) around h.observerClient.ForwardAlert and fallback to a dead-letter queue or persistent store on persistent failure; ensure the shutdown/path that cancels context waits for the WaitGroup or drains the worker queue so alerts are not lost and use the existing h.logger to record retry attempts and final failures.observability-logs-opensearch/internal/opensearch/types.go (2)
222-226: Silently ignoringjson.Marshalerror may hide issues.If marshaling fails (e.g., due to an unmarshalable type like a channel in the query map), the function returns an empty reader, causing confusing downstream errors in the search request.
♻️ Suggested fix: return error or log warning
-// buildSearchBody converts a query map to an io.Reader for the search request. -func buildSearchBody(query map[string]interface{}) io.Reader { - body, _ := json.Marshal(query) - return strings.NewReader(string(body)) +// buildSearchBody converts a query map to an io.Reader for the search request. +func buildSearchBody(query map[string]interface{}) (io.Reader, error) { + body, err := json.Marshal(query) + if err != nil { + return nil, fmt.Errorf("failed to marshal query: %w", err) + } + return strings.NewReader(string(body)), nil }This would require updating the caller in
client.goto handle the error.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/opensearch/types.go` around lines 222 - 226, The buildSearchBody function currently swallows json.Marshal errors (buildSearchBody) which can mask invalid query payloads; change buildSearchBody to return (io.Reader, error) and propagate the json.Marshal error instead of returning an empty reader, update its callers in client.go to handle the error (either return it up the stack or log and fail the request), and ensure any tests or call sites that expect the old signature are updated accordingly; alternatively, if you prefer not to change signatures, at minimum log the marshal error with context and return nil plus the error so downstream code can fail fast.
293-308: Log level detection is simplistic and may produce false positives.The substring-based matching can misclassify logs. For example, a log containing
"INFORMATION"would match"INFO", and"UNDEFINED"is treated as a valid log level rather than a fallback. Consider using word boundaries or more precise patterns if accuracy is important.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/opensearch/types.go` around lines 293 - 308, The extractLogLevel function currently does substring matching against logLevels which causes false positives (e.g., "INFORMATION" matching "INFO") and treats "UNDEFINED" as a real level; update extractLogLevel to perform whole-word matching using word-boundary regex (e.g., \bLEVEL\b) or tokenization and normalize the log string first, check levels in order of decreasing length to avoid substring collisions, map "WARNING" to "WARN" as before, and remove "UNDEFINED" from the logLevels list so that the function falls back to a final default (e.g., "INFO") only when no whole-word level is found.observability-logs-opensearch/internal/opensearch/client.go (1)
122-127: Potential nil pointer dereference when converting Score.If
h.Scoreis 0 or the field is missing/null in the response, converting it directly tofloat64and taking its address may not behave as expected. The current code always sets a non-nil Score pointer.Consider checking if the score is meaningful before assignment, or document that Score will always be set (possibly to 0).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/opensearch/client.go` around lines 122 - 127, The code unconditionally converts h.Score to float64 and assigns its address to hit.Score, risking a misleading non-nil pointer when h.Score is missing or null; update the logic around the hit.Score assignment (the block using h.Score, hit.Score, and response.Hits.Hits) to check whether h.Score is actually present/valid before creating and assigning a *float64 (e.g., only allocate and set scorePtr when h.Score != nil or when the source indicates a valid score), otherwise leave hit.Score nil (or explicitly set to pointer to zero if you choose to document that behavior) so callers can distinguish absent vs present scores.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@observability-logs-opensearch/internal/config.go`:
- Around line 36-43: The TLS skip-verification flag currently defaults to true
(tlsSkipVerify := true) which is insecure; change the default to false
(tlsSkipVerify := false) and keep the existing parsing of
OPENSEARCH_TLS_SKIP_VERIFY so only an explicit true opt-out will disable
certificate verification; update any related variable/constant names or comments
in the same file (e.g., references to tlsSkipVerify and the environment variable
OPENSEARCH_TLS_SKIP_VERIFY) to reflect the new secure default.
In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 337-342: The handler currently returns
gen.DeleteAlertRule500JSONResponse when the alert rule isn't found; change this
to return gen.DeleteAlertRule404JSONResponse instead so the response follows
REST semantics. Locate the branch checking the boolean found (the block that
returns gen.DeleteAlertRule500JSONResponse) and replace that response with
gen.DeleteAlertRule404JSONResponse, setting Title to ptr(gen.NotFound) and
Message to ptr("alert rule not found") so the returned type and status match a
404 Not Found.
- Around line 698-746: The monitor query doesn't include namespace so
extractQueryMetadata never receives it; update BuildLogAlertingRuleQuery to add
a term filter for params.Metadata.Namespace (same approach used in
BuildComponentLogsQueryV1) and then modify extractQueryMetadata to handle the
namespace term by adding a case for opensearch.OSNamespaceName that assigns
namespace = value; this ensures parseMonitorToAlertRuleResponse receives and
returns the namespace metadata.
In `@observability-logs-opensearch/internal/observer/client.go`:
- Around line 40-79: Add table-driven unit tests for the outbound HTTP path
exercised by Client.ForwardAlert: create subtests for (1) success (200)
returning no error, (2) non-2xx with a response body and assert the error
contains the status code and trimmed body, and (3) transport/context
cancellation or timeout causing an error. Use httptest.Server to capture and
assert the incoming request JSON payload and Content-Type header, and set the
Client.httpClient to the server.Client() (or a client with a short
Transport/Timeout) so ForwardAlert sends to the test server; for the
cancellation case, use a context with cancel/timeout to trigger the request
abort and assert the returned error reflects the cancellation/timeout.
In `@observability-logs-opensearch/internal/opensearch/client.go`:
- Around line 36-40: The TLS config in the http.Transport construction
(Transport: &http.Transport{ TLSClientConfig: &tls.Config{ InsecureSkipVerify:
insecureSkipVerify } }) does not set MinVersion, allowing older TLS versions;
update the tls.Config used by Transport to explicitly set MinVersion to
tls.VersionTLS12 (or tls.VersionTLS13 if supported) to prevent downgrade attacks
while keeping the existing InsecureSkipVerify handling.
- Around line 189-194: The CreateMonitor function is ignoring the passed ctx
because it uses http.NewRequest; change the request creation to use
http.NewRequestWithContext(ctx, "POST", path, bytes.NewReader(body)) so the
context (cancellation/timeout) is propagated, keep the existing error handling
around request creation and ensure req.Header.Add("Content-Type",
"application/json") remains after the new request is created.
- Around line 70-80: The health-check code in client.go consumes res.Body with
io.ReadAll before attempting to json.Decode, leaving an empty reader; change the
flow in the health check (the block that checks res.StatusCode and the
subsequent JSON decode of health) to read the response body once into a byte
slice and then use that byte slice for both the error message and JSON parsing
(e.g., json.Unmarshal into the local health struct) or else buffer and replace
res.Body (using io.NopCloser) so json.NewDecoder can read it; update the error
branch and the decode branch around the health struct to use the single buffered
body instead of reading res.Body twice.
In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 445-458: GetOperatorSymbol currently returns an empty string for
unknown operators which produces malformed painless scripts; change
GetOperatorSymbol to validate input and return an explicit error (e.g., change
signature to GetOperatorSymbol(operator string) (string, error)) or a clear
default like an error indicator, and update its caller
BuildLogAlertingRuleMonitorBody to handle the error (fail validation and return
it rather than building an invalid script). Ensure the new behavior returns a
non-empty valid symbol for "gt","gte","lt","lte" and returns a descriptive error
for unknown operators so BuildLogAlertingRuleMonitorBody can abort or surface
the error to the caller.
In `@observability-logs-opensearch/main.go`:
- Around line 73-78: The goroutine that runs srv.Start currently calls os.Exit
on error which bypasses defers (cancel(), shutdownCancel()) and can race with
signal setup; replace that immediate exit by sending the error to a dedicated
channel (e.g., errCh) so the main goroutine can handle shutdown. Modify the
anonymous goroutine (the one invoking srv.Start) to send the returned error to
errCh instead of calling os.Exit, and update the main routine to select on errCh
and signal context cancellation to run graceful cleanup and logging via
logger.Error when an error arrives.
In `@observability-logs-opensearch/Makefile`:
- Around line 17-19: The unit-test Makefile target currently unconditionally
runs mv on coverage.out which will fail if no coverage.out was produced; update
the unit-test target (the commands under the "unit-test" target that reference
coverage.out and $(MODULE_NAME)-coverage.out) to only move or copy the file when
coverage.out exists (e.g., check for existence with a shell conditional or use a
guarded cp/mv command) so the make target succeeds when no tests/coverage are
generated.
---
Nitpick comments:
In `@observability-logs-opensearch/Dockerfile`:
- Line 12: The Dockerfile uses an unpinned base image "FROM alpine:latest" which
can cause non-reproducible builds; change that line to pin to a specific,
supported Alpine tag (for example "FROM alpine:3.18" or your chosen stable patch
version) so builds are deterministic and avoid unexpected breakages—update the
Dockerfile's FROM alpine:latest entry accordingly and ensure the chosen version
is documented or matches other images in the repo.
In `@observability-logs-opensearch/helm/templates/adapter/configmap.yaml`:
- Around line 13-15: Make OPENSEARCH_ADDRESS and OPENSEARCH_INDEX_PREFIX
configurable via values by adding entries under adapter (e.g.
adapter.opensearch.address and adapter.opensearch.indexPrefix) in values.yaml
and update the ConfigMap template in helm/templates/adapter/configmap.yaml to
use Helm value interpolation (referencing .Values.adapter.opensearch.address and
.Values.adapter.opensearch.indexPrefix, with sensible defaults matching the
current literals) instead of the hardcoded "https://opensearch:9200" and
"container-logs-"; also apply the same change in
helm/templates/opensearch-setup-logs/job.yaml (replace the hardcoded
OPENSEARCH_ADDRESS/OPENSEARCH_INDEX_PREFIX there with the same .Values
references) so deployments can override OpenSearch endpoint and index prefix
consistently while keeping OBSERVER_URL usage unchanged.
In `@observability-logs-opensearch/helm/templates/adapter/deployment.yaml`:
- Around line 28-60: The container "logs-adapter-opensearch" has no Kubernetes
health probes; add both livenessProbe and readinessProbe for port 9098 to allow
K8s to detect and restart/unroute unhealthy pods: if the adapter exposes an HTTP
health endpoint (e.g., /health or /ready) use httpGet probes pointing at port
9098 with sensible timings (initialDelaySeconds, periodSeconds, timeoutSeconds)
for both probes; otherwise add tcpSocket probes against port 9098 for liveness
and readiness with similar timing settings; update the container spec near the
"ports" and "env" blocks to include these probe definitions.
In `@observability-logs-opensearch/helm/values.yaml`:
- Around line 172-186: The Helm values file's adapter.openSearchSecretName is
left empty which will cause the Deployment to fail if not overridden; add a
comment next to adapter.openSearchSecretName explaining that this value is
required and must be set to the name of an existing Kubernetes Secret (e.g.,
"name-of-opensearch-secret"), and mirror the guidance used for
openSearchSetup.openSearchSecretName so users know to override it in their
values file before deploying.
In `@observability-logs-opensearch/internal/config.go`:
- Around line 45-57: The LOG_LEVEL parsing silently ignores unknown values —
update the switch over strings.ToUpper(level) that sets logLevel to handle a
default/else branch: if the value is unrecognized (e.g., "TRACE", "verbose")
emit a warning via the package logger (slog.Warn or similar) mentioning the
invalid LOG_LEVEL and the accepted values and keep the previous/default level,
or alternatively return an error from the surrounding config loader; reference
the existing logLevel variable, the LOG_LEVEL env lookup, and the switch block
to add the default-case warning/validation behavior.
In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 174-175: The call to logs.FromLogsQueryResponseLogs0(entries)
currently ignores its error result which can silently produce nil/empty
resp.Logs; change this to capture the return values (e.g., logsObj, err :=
logs.FromLogsQueryResponseLogs0(entries)), then handle err instead of discarding
it—either log the error using the package logger or return/propagate an error
from the handler—and only set resp.Logs = &logsObj when err == nil; reference
the FromLogsQueryResponseLogs0 call, the entries variable, and resp.Logs when
implementing this fix.
- Around line 437-442: The handler currently returns a
gen.UpdateAlertRule400JSONResponse when the alert rule isn't found; change this
to return a 404-style response instead: replace the
gen.UpdateAlertRule400JSONResponse construction with the appropriate
gen.UpdateAlertRule404JSONResponse (or the existing NotFound response type used
by GetAlertRule) and set Title to ptr(gen.NotFound) and Message to ptr("alert
rule not found") so the UpdateAlertRule handler returns HTTP 404 when the
monitor/alert rule is missing.
- Around line 500-509: The detached goroutine that calls
h.observerClient.ForwardAlert (in the anonymous go func in handlers.go) can drop
alerts on shutdown and lacks retries; replace the fire-and-forget pattern by
enqueuing the forward request into a managed worker queue or at minimum
incrementing a shared sync.WaitGroup (or tracking counter) before spawning the
worker and decrementing it after completion so shutdown can wait for in-flight
forwards, and add retry logic (e.g., exponential backoff with a few attempts)
around h.observerClient.ForwardAlert and fallback to a dead-letter queue or
persistent store on persistent failure; ensure the shutdown/path that cancels
context waits for the WaitGroup or drains the worker queue so alerts are not
lost and use the existing h.logger to record retry attempts and final failures.
In `@observability-logs-opensearch/internal/observer/client.go`:
- Around line 27-29: The http.Client in client.go currently sets a 10s Timeout
which duplicates the 10s context deadline applied by the caller of ForwardAlert
(handlers.go), causing racey/ambiguous timeouts; remove the client-level Timeout
(set it to 0) and rely on the request context passed into ForwardAlert (ensure
requests are created with http.NewRequestWithContext) so the context is the
single owner of the deadline, or alternatively increase the client Timeout to a
value larger than the caller deadline if you must keep it. Ensure changes touch
the httpClient instantiation in client.go and the request creation inside
ForwardAlert so the ctx controls cancellation.
In `@observability-logs-opensearch/internal/opensearch/client.go`:
- Around line 122-127: The code unconditionally converts h.Score to float64 and
assigns its address to hit.Score, risking a misleading non-nil pointer when
h.Score is missing or null; update the logic around the hit.Score assignment
(the block using h.Score, hit.Score, and response.Hits.Hits) to check whether
h.Score is actually present/valid before creating and assigning a *float64
(e.g., only allocate and set scorePtr when h.Score != nil or when the source
indicates a valid score), otherwise leave hit.Score nil (or explicitly set to
pointer to zero if you choose to document that behavior) so callers can
distinguish absent vs present scores.
In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 267-282: The code builds daily index names into indices by
iterating with current (start) until current.Before(end) || current.Equal(end),
which already includes the end date, so remove the redundant endIndexName
creation and the contains(...) conditional block; update the function that
constructs indices (the loop using current, qb.indexPrefix and
current.Format("2006-01-02")) to return indices directly after the loop and
delete the extra end inclusion logic (and any unused variables like
endIndexName) to improve readability.
In `@observability-logs-opensearch/internal/opensearch/types.go`:
- Around line 222-226: The buildSearchBody function currently swallows
json.Marshal errors (buildSearchBody) which can mask invalid query payloads;
change buildSearchBody to return (io.Reader, error) and propagate the
json.Marshal error instead of returning an empty reader, update its callers in
client.go to handle the error (either return it up the stack or log and fail the
request), and ensure any tests or call sites that expect the old signature are
updated accordingly; alternatively, if you prefer not to change signatures, at
minimum log the marshal error with context and return nil plus the error so
downstream code can fail fast.
- Around line 293-308: The extractLogLevel function currently does substring
matching against logLevels which causes false positives (e.g., "INFORMATION"
matching "INFO") and treats "UNDEFINED" as a real level; update extractLogLevel
to perform whole-word matching using word-boundary regex (e.g., \bLEVEL\b) or
tokenization and normalize the log string first, check levels in order of
decreasing length to avoid substring collisions, map "WARNING" to "WARN" as
before, and remove "UNDEFINED" from the logLevels list so that the function
falls back to a final default (e.g., "INFO") only when no whole-word level is
found.
In `@observability-logs-opensearch/Makefile`:
- Line 3: The Makefile currently hardcodes SPEC to a remote URL which creates a
network dependency for the openapi-codegen target; update the Makefile to allow
overriding SPEC from the environment or a make variable (e.g., check if SPEC is
set before assigning) and add a short comment above the SPEC definition
explaining the remote fetch and advising how to provide a local spec for offline
development, and ensure the openapi-codegen target uses the SPEC variable so
local files can be used by setting SPEC=/path/to/local.yaml.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3572b40a-5220-4f10-96c3-ed5cb25d521d
⛔ Files ignored due to path filters (4)
observability-logs-opensearch/go.sumis excluded by!**/*.sumobservability-logs-opensearch/internal/api/gen/client.gen.gois excluded by!**/gen/**observability-logs-opensearch/internal/api/gen/models.gen.gois excluded by!**/gen/**observability-logs-opensearch/internal/api/gen/server.gen.gois excluded by!**/gen/**
📒 Files selected for processing (21)
observability-logs-opensearch/Dockerfileobservability-logs-opensearch/Makefileobservability-logs-opensearch/go.modobservability-logs-opensearch/helm/Chart.yamlobservability-logs-opensearch/helm/templates/adapter/configmap.yamlobservability-logs-opensearch/helm/templates/adapter/deployment.yamlobservability-logs-opensearch/helm/templates/adapter/service.yamlobservability-logs-opensearch/helm/values.yamlobservability-logs-opensearch/internal/api/cfg-client.yamlobservability-logs-opensearch/internal/api/cfg-models.yamlobservability-logs-opensearch/internal/api/cfg-server.yamlobservability-logs-opensearch/internal/config.goobservability-logs-opensearch/internal/handlers.goobservability-logs-opensearch/internal/observer/client.goobservability-logs-opensearch/internal/opensearch/client.goobservability-logs-opensearch/internal/opensearch/labels.goobservability-logs-opensearch/internal/opensearch/queries.goobservability-logs-opensearch/internal/opensearch/types.goobservability-logs-opensearch/internal/server.goobservability-logs-opensearch/main.goobservability-logs-opensearch/module.yaml
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (4)
observability-logs-opensearch/Makefile (1)
3-13: Pin the OpenAPI spec to an immutable revision.
SPECcurrently tracks a branch head, somake openapi-codegencan start producing different generated code or fail without any change in this repo. Please vendor the spec or switch this to a commit SHA so regeneration stays reproducible across CI and developer machines.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/Makefile` around lines 3 - 13, The Makefile's SPEC variable currently points to a branch head URL which is mutable; update SPEC to reference an immutable source (either vendor the spec file into the repo and point SPEC to the local path, or replace the URL with a raw file URL that includes a specific commit SHA) so that the openapi-codegen target (targets: openapi-codegen and oapi-codegen-install) always regenerates from a reproducible spec; ensure the new SPEC still works with the existing oapi-codegen calls in the Makefile (and update any docs or CI that expect the old URL).observability-logs-opensearch/internal/handlers.go (3)
500-509: Consider tracking in-flight webhook forwards for graceful shutdown.The fire-and-forget goroutine with
context.Background()is acceptable for webhook acknowledgment, but in-flight forwards may be lost during server shutdown. For improved reliability, consider using async.WaitGroupor a worker pool that the server can drain during graceful termination.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/handlers.go` around lines 500 - 509, The fire-and-forget goroutine that calls h.observerClient.ForwardAlert uses context.Background() and can be terminated during server shutdown; modify the code to track in-flight forwards using a sync.WaitGroup (or a dedicated worker pool) stored on the handler (e.g., add a field like wg sync.WaitGroup to the handler struct), increment the WG before spawning the goroutine, defer wg.Done() inside it, and use a shutdown signaling context (or the server’s shutdown sequence) to wait on wg.Wait() during graceful termination so all ForwardAlert calls complete; ensure ForwardAlert still receives a bounded context (e.g., context.WithTimeout) derived from the handler’s shutdown-aware context.
772-805: Consider extracting helper functions to reduce nesting depth.The function has 8 levels of nesting due to the OpenSearch query structure. While functional, breaking this into smaller helpers (e.g.,
getInputSearch,getFilterFromBool) would improve readability.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/handlers.go` around lines 772 - 805, The extractWindowFromQuery function is deeply nested; refactor by extracting small helpers to flatten logic: create getInputSearch(monitor map[string]interface{}) (map[string]interface{}, bool) to return the first input's "search" map, getQueryMap(search map[string]interface{}) (map[string]interface{}, bool) to return the inner "query" map, getBoolFilter(queryMap map[string]interface{}) ([]interface{}, bool) to return the "bool" -> "filter" slice, and getTimestampFromFilter(filter map[string]interface{}) (string, bool) to extract the "@timestamp.from" string; then rewrite extractWindowFromQuery to call these helpers in sequence and return the parsed window substring, which will remove the deep nested type assertions and improve readability.
534-544: Handle integer type foralertValue.The switch handles
float64andstring, but JSON integers from some serializers may arrive asintorint64. Currently, these would silently result inalertValue = 0.♻️ Proposed enhancement
if countVal, ok := body["alertValue"]; ok { switch v := countVal.(type) { case float64: alertValue = v + case int: + alertValue = float64(v) + case int64: + alertValue = float64(v) case string: alertValue, err = strconv.ParseFloat(v, 64) if err != nil { return "", "", 0, time.Time{}, fmt.Errorf("failed to parse alertValue %q: %w", v, err) } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/handlers.go` around lines 534 - 544, The switch on countVal in handlers.go only handles float64 and string, causing integer JSON values to be ignored; update the switch in the block that reads body["alertValue"] (the countVal handling that sets alertValue) to also handle integer types (e.g., int, int32, int64, uint, uint64) by converting them to float64 (e.g., via type assertion to the integer type then casting to float64), and consider handling json.Number if used by your decoder by parsing it to float64; keep the existing string parsing/error path intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 437-442: The update handler currently returns a 400 response when
an alert rule isn't found; change the response to a 404 to match REST semantics
and the behavior of DeleteAlertRule. In the block that checks found (the branch
returning gen.UpdateAlertRule400JSONResponse), return the appropriate 404
response type (gen.UpdateAlertRule404JSONResponse) and set Title to
ptr(gen.NotFound) with Message ptr("alert rule not found") so the
UpdateAlertRule handler uses NotFound instead of BadRequest.
- Around line 173-175: The call to logs.FromLogsQueryResponseLogs0(entries) is
currently ignoring its returned error, which can hide serialization issues;
update the code in the handler where logs := gen.LogsQueryResponse_Logs{} is
created to capture the error (err := logs.FromLogsQueryResponseLogs0(entries)),
check it, and handle it appropriately (e.g., log the error via the existing
logger or return an error response) before assigning resp.Logs = &logs so
serialization failures are visible and propagated.
- Around line 270-272: The call to logs.FromLogsQueryResponseLogs1(entries)
ignores its returned error; update the block where gen.LogsQueryResponse_Logs is
constructed (variable logs, method FromLogsQueryResponseLogs1) to capture the
error, log or handle it (e.g., via processLogger/resp logger or return the
error) and only set resp.Logs = &logs on success; ensure the error path records
the failure with context so it isn't silently dropped.
In `@observability-logs-opensearch/internal/opensearch/client.go`:
- Around line 32-44: The OPENSEARCH TLS skip-verify default is currently true;
update the default to false in the configuration so certificate verification is
enabled by default. Specifically, change the default value for
OPENSEARCH_TLS_SKIP_VERIFY in observability-logs-opensearch/internal/config.go
from true to false and ensure any code using NewClient(insecureSkipVerify bool)
continues to pass the configured value (the TLS behavior in NewClient’s
TLSClientConfig.InsecureSkipVerify should reflect this flipped default). Also
update any related environment variable help text/comments to indicate
skip-verify is opt-in for dev/self-signed certs.
In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 360-385: BuildLogAlertingRuleMonitorBody currently allows
sub-minute intervals because it uses intervalDuration.Minutes(); validate
params.Condition.Interval by checking the parsed duration is a positive whole
number of minutes (e.g., ensure intervalDuration > 0 and
intervalDuration%time.Minute == 0) before constructing
MonitorSchedulePeriod.Interval, and return a clear error if the provided
duration is fractional/minutes not whole (reject values like "30s" or "90s");
update the validation near the existing time.ParseDuration call and the
MonitorSchedule construction in BuildLogAlertingRuleMonitorBody to enforce
whole-minute intervals.
- Around line 186-248: BuildWorkflowRunLogsQuery currently lets an empty
WorkflowRunID and zero pagination/sort propagate to OpenSearch; update
BuildWorkflowRunLogsQuery (and use sanitizeWildcardValue,
WorkflowRunQueryParams, addTimeRangeFilter) to validate and guard those
zero-values: if params.WorkflowRunID is empty, produce a safe no-match query
(e.g., use a match_none query or a wildcard that cannot match) instead of using
"*" for pod name; if params.QueryParams.Limit is zero set a sane default limit
(e.g., 100) and if params.QueryParams.SortOrder is empty set a default sort
(e.g., "desc") so the built query mirrors the safe behavior in
BuildComponentLogsQueryV1. Ensure sanitizeWildcardValue is only applied when
WorkflowRunID is present and keep the rest of the filters (namespace, stepName,
time range, must_not) intact.
---
Nitpick comments:
In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 500-509: The fire-and-forget goroutine that calls
h.observerClient.ForwardAlert uses context.Background() and can be terminated
during server shutdown; modify the code to track in-flight forwards using a
sync.WaitGroup (or a dedicated worker pool) stored on the handler (e.g., add a
field like wg sync.WaitGroup to the handler struct), increment the WG before
spawning the goroutine, defer wg.Done() inside it, and use a shutdown signaling
context (or the server’s shutdown sequence) to wait on wg.Wait() during graceful
termination so all ForwardAlert calls complete; ensure ForwardAlert still
receives a bounded context (e.g., context.WithTimeout) derived from the
handler’s shutdown-aware context.
- Around line 772-805: The extractWindowFromQuery function is deeply nested;
refactor by extracting small helpers to flatten logic: create
getInputSearch(monitor map[string]interface{}) (map[string]interface{}, bool) to
return the first input's "search" map, getQueryMap(search
map[string]interface{}) (map[string]interface{}, bool) to return the inner
"query" map, getBoolFilter(queryMap map[string]interface{}) ([]interface{},
bool) to return the "bool" -> "filter" slice, and getTimestampFromFilter(filter
map[string]interface{}) (string, bool) to extract the "@timestamp.from" string;
then rewrite extractWindowFromQuery to call these helpers in sequence and return
the parsed window substring, which will remove the deep nested type assertions
and improve readability.
- Around line 534-544: The switch on countVal in handlers.go only handles
float64 and string, causing integer JSON values to be ignored; update the switch
in the block that reads body["alertValue"] (the countVal handling that sets
alertValue) to also handle integer types (e.g., int, int32, int64, uint, uint64)
by converting them to float64 (e.g., via type assertion to the integer type then
casting to float64), and consider handling json.Number if used by your decoder
by parsing it to float64; keep the existing string parsing/error path intact.
In `@observability-logs-opensearch/Makefile`:
- Around line 3-13: The Makefile's SPEC variable currently points to a branch
head URL which is mutable; update SPEC to reference an immutable source (either
vendor the spec file into the repo and point SPEC to the local path, or replace
the URL with a raw file URL that includes a specific commit SHA) so that the
openapi-codegen target (targets: openapi-codegen and oapi-codegen-install)
always regenerates from a reproducible spec; ensure the new SPEC still works
with the existing oapi-codegen calls in the Makefile (and update any docs or CI
that expect the old URL).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e182dbab-b1d5-4c35-8ec3-e7e8481d3ba6
📒 Files selected for processing (10)
observability-logs-opensearch/Makefileobservability-logs-opensearch/internal/handlers.goobservability-logs-opensearch/internal/handlers_test.goobservability-logs-opensearch/internal/observer/client_test.goobservability-logs-opensearch/internal/opensearch/client.goobservability-logs-opensearch/internal/opensearch/client_test.goobservability-logs-opensearch/internal/opensearch/queries.goobservability-logs-opensearch/internal/opensearch/queries_test.goobservability-logs-opensearch/internal/opensearch/types_test.goobservability-logs-opensearch/main.go
🚧 Files skipped from review as they are similar to previous changes (1)
- observability-logs-opensearch/main.go
ce1c1c7 to
eb87a23
Compare
Signed-off-by: Nilushan Costa <nilushan@wso2.com>
eb87a23 to
b705362
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (2)
observability-logs-opensearch/internal/opensearch/queries.go (2)
299-344:⚠️ Potential issue | 🟠 MajorPersist namespace in the stored alert query filters.
params.Metadata.Namespaceis available here, but it never gets written intofilterConditions.parseMonitorToAlertRuleResponseinobservability-logs-opensearch/internal/handlers.goreconstructs rule metadata from these saved filters, soGetAlertRuleloses the namespace on round-trip unless it is stored here as well.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines 299 - 344, filterConditions is missing the Namespace term so the alert rule metadata round-trips loses Namespace; add a term entry similar to the Component/Environment/Project terms using the same pattern: append a map[string]interface{}{"term": map[string]interface{}{OSNamespaceID: map[string]interface{}{"value": params.Metadata.Namespace, "boost": 1}}} to filterConditions (update the slice near where filterConditions is constructed in queries.go) so parseMonitorToAlertRuleResponse can reconstruct Namespace correctly.
186-248:⚠️ Potential issue | 🟠 MajorStill reject blank workflow run IDs and apply the workflow-only filters here.
WorkflowRunID == ""still turns intokubernetes.pod_name: "*", andSearchPhrase/LogLevelsfromQueryParamsare never added tomustConditions. That means a malformed workflow request can fan out into a broad scan, and valid workflow queries silently ignore keyword/level filtering thatinternal/handlers.goalready accepts and forwards.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines 186 - 248, The BuildWorkflowRunLogsQuery currently builds a wildcard pod_name even when params.WorkflowRunID is blank and never applies QueryParams.SearchPhrase or QueryParams.LogLevels; fix by early-rejecting empty WorkflowRunID (e.g., return nil/empty query) at the top of BuildWorkflowRunLogsQuery if params.WorkflowRunID == "", then only build podNamePattern and add the pod_name wildcard when non-empty (using sanitizeWildcardValue as before), and append SearchPhrase and LogLevels filters from params.QueryParams (use a match_phrase or query_string for QueryParams.SearchPhrase and a terms/term filter for QueryParams.LogLevels) into mustConditions before calling addTimeRangeFilter so keyword/level filtering is applied for workflow-only queries.
🧹 Nitpick comments (1)
observability-logs-opensearch/Makefile (1)
3-13: Pin the OpenAPI spec instead of generating from a movingmainbranch.
SPECcurrently tracksrefs/heads/main, so everymake openapi-codegenrun depends on whatever is in the docs repo at that moment. That makes regeneration non-reproducible and can change this module’s generated API surface without any matching change in this repo. Please vendor the spec or pin the URL to an immutable tag/commit.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/Makefile` around lines 3 - 13, The SPEC variable currently points to a moving ref (refs/heads/main) causing non-reproducible codegen; update SPEC to reference an immutable location (a vendored local file or a commit/tag URL) and adjust the openapi-codegen target accordingly so ./$(CFG_DIR) uses the pinned spec; ensure oapi-codegen-install and targets (oapi-codegen-install, openapi-codegen) continue to work with the new SPEC value and commit the vendored spec file if you choose to vendor it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@observability-logs-opensearch/internal/handlers_test.go`:
- Around line 426-439: Update the assertion in TestDeleteAlertRule_NotFound to
expect the new 404 response type: after calling handler.DeleteAlertRule (the
DeleteAlertRule method on the LogsHandler instance created as handler), replace
the check that currently expects gen.DeleteAlertRule500JSONResponse with a check
for gen.DeleteAlertRule404JSONResponse so the test verifies the new not-found
behavior for missing monitors.
In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 455-459: The handler returns a non-existent response type
gen.UpdateAlertRule404JSONResponse causing a build error; either add a 404
response for the UpdateAlertRule operation to your OpenAPI spec and re-run code
generation to populate internal/api/gen, or change the return to one of the
actual generated response types in the gen package (replace
gen.UpdateAlertRule404JSONResponse in internal/handlers.go with an existing
generated response type for UpdateAlertRule, e.g., the operation's default or
NotFound type) so the package compiles.
In `@observability-logs-opensearch/internal/opensearch/client.go`:
- Around line 172-178: The code currently returns parsed.Hits.Hits[0].ID when
one or more hits are found, which silently chooses the first match; modify the
function that returns (string, bool, error) to detect ambiguous matches by
checking if parsed.Hits.Total.Value > 1 or len(parsed.Hits.Hits) > 1 and, in
that case, return "", false, fmt.Errorf("ambiguous monitor name: %d matches
found", parsed.Hits.Total.Value) (or similar), leaving the existing zero-hit
check and the missing _id check intact so callers must handle ambiguous results
instead of acting on the first hit.
In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 37-49: formatDurationForOpenSearch currently allows zero or
negative durations which leads to malformed ranges in BuildLogAlertingRuleQuery;
update formatDurationForOpenSearch to parse the duration (time.ParseDuration),
then return an error if parsed <= 0 (reject zero and negative values) with a
clear message (e.g., "duration must be a positive whole number of minutes or
hours"), keeping the existing checks for whole minutes/hours and returning
formatted strings when valid so BuildLogAlertingRuleQuery never receives 0m or
negative inputs.
In `@observability-logs-opensearch/main.go`:
- Around line 83-98: When the server start error arrives on errCh the select
branch only logs it and continues normal shutdown, resulting in a zero exit;
change the select handling for case err := <-errCh in the main goroutine to
ensure a non-zero exit path: after logging the error from errCh (use
logger.Error with slog.Any("error", err)), either call os.Exit(1) immediately or
set a flag/return value that causes the program to exit with a non-zero status
after running the graceful shutdown (ensure srv.Shutdown still runs if you want
cleanup but propagate the original start error by exiting with status 1). Update
the select branch that reads from errCh (and any code that follows) so startup
failures from srv.Start trigger os.Exit(1) rather than allowing a 0 exit.
---
Duplicate comments:
In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 299-344: filterConditions is missing the Namespace term so the
alert rule metadata round-trips loses Namespace; add a term entry similar to the
Component/Environment/Project terms using the same pattern: append a
map[string]interface{}{"term": map[string]interface{}{OSNamespaceID:
map[string]interface{}{"value": params.Metadata.Namespace, "boost": 1}}} to
filterConditions (update the slice near where filterConditions is constructed in
queries.go) so parseMonitorToAlertRuleResponse can reconstruct Namespace
correctly.
- Around line 186-248: The BuildWorkflowRunLogsQuery currently builds a wildcard
pod_name even when params.WorkflowRunID is blank and never applies
QueryParams.SearchPhrase or QueryParams.LogLevels; fix by early-rejecting empty
WorkflowRunID (e.g., return nil/empty query) at the top of
BuildWorkflowRunLogsQuery if params.WorkflowRunID == "", then only build
podNamePattern and add the pod_name wildcard when non-empty (using
sanitizeWildcardValue as before), and append SearchPhrase and LogLevels filters
from params.QueryParams (use a match_phrase or query_string for
QueryParams.SearchPhrase and a terms/term filter for QueryParams.LogLevels) into
mustConditions before calling addTimeRangeFilter so keyword/level filtering is
applied for workflow-only queries.
---
Nitpick comments:
In `@observability-logs-opensearch/Makefile`:
- Around line 3-13: The SPEC variable currently points to a moving ref
(refs/heads/main) causing non-reproducible codegen; update SPEC to reference an
immutable location (a vendored local file or a commit/tag URL) and adjust the
openapi-codegen target accordingly so ./$(CFG_DIR) uses the pinned spec; ensure
oapi-codegen-install and targets (oapi-codegen-install, openapi-codegen)
continue to work with the new SPEC value and commit the vendored spec file if
you choose to vendor it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4de7645a-3955-4ad1-829e-ea7fcc5e8a4b
📒 Files selected for processing (11)
observability-logs-opensearch/Makefileobservability-logs-opensearch/internal/handlers.goobservability-logs-opensearch/internal/handlers_test.goobservability-logs-opensearch/internal/observer/client_test.goobservability-logs-opensearch/internal/opensearch/client.goobservability-logs-opensearch/internal/opensearch/client_test.goobservability-logs-opensearch/internal/opensearch/queries.goobservability-logs-opensearch/internal/opensearch/queries_test.goobservability-logs-opensearch/internal/opensearch/types.goobservability-logs-opensearch/internal/opensearch/types_test.goobservability-logs-opensearch/main.go
✅ Files skipped from review due to trivial changes (2)
- observability-logs-opensearch/internal/opensearch/types_test.go
- observability-logs-opensearch/internal/opensearch/client_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- observability-logs-opensearch/internal/observer/client_test.go
| if parsed.Hits.Total.Value == 0 || len(parsed.Hits.Hits) == 0 { | ||
| return "", false, nil | ||
| } | ||
| if parsed.Hits.Hits[0].ID == "" { | ||
| return "", false, fmt.Errorf("monitor search response missing _id field") | ||
| } | ||
| return parsed.Hits.Hits[0].ID, true, nil |
There was a problem hiding this comment.
Don't silently choose the first monitor hit.
When more than one monitor matches name, this method returns parsed.Hits.Hits[0].ID and the alert-rule handlers can operate on the wrong resource. Please fail on ambiguous matches here, or switch the lookup to a unique scoped identifier, instead of treating the first hit as authoritative.
Suggested guard
if parsed.Hits.Total.Value == 0 || len(parsed.Hits.Hits) == 0 {
return "", false, nil
}
+ if parsed.Hits.Total.Value > 1 {
+ return "", false, fmt.Errorf("multiple monitors found for name %q", name)
+ }
if parsed.Hits.Hits[0].ID == "" {
return "", false, fmt.Errorf("monitor search response missing _id field")
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if parsed.Hits.Total.Value == 0 || len(parsed.Hits.Hits) == 0 { | |
| return "", false, nil | |
| } | |
| if parsed.Hits.Hits[0].ID == "" { | |
| return "", false, fmt.Errorf("monitor search response missing _id field") | |
| } | |
| return parsed.Hits.Hits[0].ID, true, nil | |
| if parsed.Hits.Total.Value == 0 || len(parsed.Hits.Hits) == 0 { | |
| return "", false, nil | |
| } | |
| if parsed.Hits.Total.Value > 1 { | |
| return "", false, fmt.Errorf("multiple monitors found for name %q", name) | |
| } | |
| if parsed.Hits.Hits[0].ID == "" { | |
| return "", false, fmt.Errorf("monitor search response missing _id field") | |
| } | |
| return parsed.Hits.Hits[0].ID, true, nil |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@observability-logs-opensearch/internal/opensearch/client.go` around lines 172
- 178, The code currently returns parsed.Hits.Hits[0].ID when one or more hits
are found, which silently chooses the first match; modify the function that
returns (string, bool, error) to detect ambiguous matches by checking if
parsed.Hits.Total.Value > 1 or len(parsed.Hits.Hits) > 1 and, in that case,
return "", false, fmt.Errorf("ambiguous monitor name: %d matches found",
parsed.Hits.Total.Value) (or similar), leaving the existing zero-hit check and
the missing _id check intact so callers must handle ambiguous results instead of
acting on the first hit.
| func formatDurationForOpenSearch(d string) (string, error) { | ||
| parsed, err := time.ParseDuration(d) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
|
|
||
| switch { | ||
| case parsed%time.Hour == 0: | ||
| return fmt.Sprintf("%dh", parsed/time.Hour), nil | ||
| case parsed%time.Minute == 0: | ||
| return fmt.Sprintf("%dm", parsed/time.Minute), nil | ||
| } | ||
| return "", fmt.Errorf("duration must be a whole number of minutes or hours; seconds are not supported: %s", d) |
There was a problem hiding this comment.
Reject zero and negative alert windows.
formatDurationForOpenSearch currently accepts values like 0m and -5m. BuildLogAlertingRuleQuery then emits ranges like {{period_end}}||-0h or {{period_end}}||--5m, which produces nonsensical monitor queries instead of a validation error.
Suggested guard
func formatDurationForOpenSearch(d string) (string, error) {
parsed, err := time.ParseDuration(d)
if err != nil {
return "", err
}
+ if parsed <= 0 {
+ return "", fmt.Errorf("duration must be greater than zero: %s", d)
+ }
switch {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines 37
- 49, formatDurationForOpenSearch currently allows zero or negative durations
which leads to malformed ranges in BuildLogAlertingRuleQuery; update
formatDurationForOpenSearch to parse the duration (time.ParseDuration), then
return an error if parsed <= 0 (reject zero and negative values) with a clear
message (e.g., "duration must be a positive whole number of minutes or hours"),
keeping the existing checks for whole minutes/hours and returning formatted
strings when valid so BuildLogAlertingRuleQuery never receives 0m or negative
inputs.
| select { | ||
| case err := <-errCh: | ||
| logger.Error("Server error", slog.Any("error", err)) | ||
| case <-quit: | ||
| logger.Info("Shutting down gracefully") | ||
| } | ||
|
|
||
| shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second) | ||
| defer shutdownCancel() | ||
|
|
||
| if err := srv.Shutdown(shutdownCtx); err != nil { | ||
| logger.Error("Error during shutdown", slog.Any("error", err)) | ||
| os.Exit(1) | ||
| } | ||
|
|
||
| logger.Info("Server stopped") |
There was a problem hiding this comment.
Startup failures currently exit successfully.
When srv.Start() sends an error on errCh, the code logs it but then falls through the normal shutdown path and can still return exit code 0. A bind/listen failure should propagate a non-zero exit status so supervisors do not treat the process as a clean stop.
Suggested fix
+ startErr := false
select {
case err := <-errCh:
logger.Error("Server error", slog.Any("error", err))
+ startErr = true
case <-quit:
logger.Info("Shutting down gracefully")
}
@@
if err := srv.Shutdown(shutdownCtx); err != nil {
logger.Error("Error during shutdown", slog.Any("error", err))
os.Exit(1)
}
+ if startErr {
+ os.Exit(1)
+ }
logger.Info("Server stopped")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| select { | |
| case err := <-errCh: | |
| logger.Error("Server error", slog.Any("error", err)) | |
| case <-quit: | |
| logger.Info("Shutting down gracefully") | |
| } | |
| shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second) | |
| defer shutdownCancel() | |
| if err := srv.Shutdown(shutdownCtx); err != nil { | |
| logger.Error("Error during shutdown", slog.Any("error", err)) | |
| os.Exit(1) | |
| } | |
| logger.Info("Server stopped") | |
| startErr := false | |
| select { | |
| case err := <-errCh: | |
| logger.Error("Server error", slog.Any("error", err)) | |
| startErr = true | |
| case <-quit: | |
| logger.Info("Shutting down gracefully") | |
| } | |
| shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second) | |
| defer shutdownCancel() | |
| if err := srv.Shutdown(shutdownCtx); err != nil { | |
| logger.Error("Error during shutdown", slog.Any("error", err)) | |
| os.Exit(1) | |
| } | |
| if startErr { | |
| os.Exit(1) | |
| } | |
| logger.Info("Server stopped") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@observability-logs-opensearch/main.go` around lines 83 - 98, When the server
start error arrives on errCh the select branch only logs it and continues normal
shutdown, resulting in a zero exit; change the select handling for case err :=
<-errCh in the main goroutine to ensure a non-zero exit path: after logging the
error from errCh (use logger.Error with slog.Any("error", err)), either call
os.Exit(1) immediately or set a flag/return value that causes the program to
exit with a non-zero status after running the graceful shutdown (ensure
srv.Shutdown still runs if you want cleanup but propagate the original start
error by exiting with status 1). Update the select branch that reads from errCh
(and any code that follows) so startup failures from srv.Start trigger
os.Exit(1) rather than allowing a 0 exit.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (5)
observability-logs-opensearch/main.go (1)
83-98:⚠️ Potential issue | 🟠 MajorExit non-zero when the server fails to start.
When
srv.Start()reports an error onerrCh, main logs it but then follows the normal shutdown path and returns0. Supervisors will treat bind/listen failures as clean exits.🐛 Suggested fix
+ startErr := false select { case err := <-errCh: logger.Error("Server error", slog.Any("error", err)) + startErr = true case <-quit: logger.Info("Shutting down gracefully") } @@ if err := srv.Shutdown(shutdownCtx); err != nil { logger.Error("Error during shutdown", slog.Any("error", err)) os.Exit(1) } + if startErr { + os.Exit(1) + } logger.Info("Server stopped")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/main.go` around lines 83 - 98, The select handling startup errors should cause a non-zero exit when srv fails to start: in the select case that receives err from errCh (the channel used by srv.Start()), after logging via logger.Error("Server error", slog.Any("error", err")) return or call os.Exit(1) instead of proceeding to the normal shutdown path; ensure you still run any necessary cleanup (e.g., cancel contexts) before exiting and avoid falling through to the quit branch so supervisors see a non-zero exit on bind/listen failures. Use the existing errCh, quit, srv.Shutdown and logger symbols to locate and implement the change.observability-logs-opensearch/internal/opensearch/queries.go (2)
37-49:⚠️ Potential issue | 🟠 MajorReject zero and negative alert windows before formatting them.
0mand negative durations still satisfy the whole-minute/hour branches here, soBuildLogAlertingRuleQuerycan emit ranges like{{period_end}}||-0hor{{period_end}}||--5m.🐛 Suggested fix
func formatDurationForOpenSearch(d string) (string, error) { parsed, err := time.ParseDuration(d) if err != nil { return "", err } + if parsed <= 0 { + return "", fmt.Errorf("duration must be greater than zero: %s", d) + } switch {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines 37 - 49, The function formatDurationForOpenSearch currently allows zero and negative durations because it only checks divisibility; update formatDurationForOpenSearch to reject parsed durations <= 0 before the switch by returning an error (e.g., "duration must be a positive whole number of minutes or hours") so callers like BuildLogAlertingRuleQuery cannot generate "-0h" or "--5m"; keep the existing hour/minute branch logic and error behavior for non-whole-minute/hour values.
186-248:⚠️ Potential issue | 🟠 MajorWorkflow log queries can broaden unexpectedly and ignore requested filters.
When
WorkflowRunIDis empty, this buildskubernetes.pod_name: "*". It also passes through zeroLimit/emptySortOrder, and unlikeBuildComponentLogsQueryV1it never appliesQueryParams.SearchPhraseorQueryParams.LogLevelseven thoughobservability-logs-opensearch/internal/handlers.goLines 206-215 populate those fields before calling this builder.The call site at `observability-logs-opensearch/internal/handlers.go` Line 227 would then need to handle the returned error.🐛 Suggested direction
-func (qb *QueryBuilder) BuildWorkflowRunLogsQuery(params WorkflowRunQueryParams) map[string]interface{} { +func (qb *QueryBuilder) BuildWorkflowRunLogsQuery(params WorkflowRunQueryParams) (map[string]interface{}, error) { + if strings.TrimSpace(params.WorkflowRunID) == "" { + return nil, fmt.Errorf("workflow run id is required") + } + + limit := params.QueryParams.Limit + if limit <= 0 { + limit = 100 + } + + sortOrder := params.QueryParams.SortOrder + if sortOrder == "" { + sortOrder = "desc" + } + sanitizedWorkflowRunID := sanitizeWildcardValue(params.WorkflowRunID) podNamePattern := sanitizedWorkflowRunID + "*" @@ } + mustConditions = addSearchPhraseFilter(mustConditions, params.QueryParams.SearchPhrase) + mustConditions = addLogLevelFilter(mustConditions, params.QueryParams.LogLevels) @@ - "size": params.QueryParams.Limit, + "size": limit, @@ - "order": params.QueryParams.SortOrder, + "order": sortOrder, @@ - return query + return query, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/opensearch/queries.go` around lines 186 - 248, BuildWorkflowRunLogsQuery currently constructs a broad wildcard when WorkflowRunID is empty and ignores QueryParams.SearchPhrase, QueryParams.LogLevels and zero/empty Limit/SortOrder; change BuildWorkflowRunLogsQuery (and its caller) to return (map[string]interface{}, error) and return an error when params.WorkflowRunID is empty (avoid building kubernetes.pod_name:"*"); enforce sane defaults for params.QueryParams.Limit and params.QueryParams.SortOrder when zero/empty (matching behavior in BuildComponentLogsQueryV1), and incorporate QueryParams.SearchPhrase and QueryParams.LogLevels into the bool.must (and/or must_not) clauses the same way BuildComponentLogsQueryV1 does; keep use of sanitizeWildcardValue, addTimeRangeFilter and the existing namespace/container filters, and update the call site to handle the returned error.observability-logs-opensearch/internal/opensearch/client.go (1)
172-178:⚠️ Potential issue | 🟠 MajorDon't silently pick the first matching monitor.
The alert-rule handlers use this ID for get/update/delete. If two monitors share a name,
parsed.Hits.Hits[0].IDmakes those operations target an arbitrary rule instead of failing on ambiguity.🐛 Suggested guard
if parsed.Hits.Total.Value == 0 || len(parsed.Hits.Hits) == 0 { return "", false, nil } + if parsed.Hits.Total.Value > 1 || len(parsed.Hits.Hits) > 1 { + return "", false, fmt.Errorf("ambiguous monitor name %q", name) + } if parsed.Hits.Hits[0].ID == "" { return "", false, fmt.Errorf("monitor search response missing _id field") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/internal/opensearch/client.go` around lines 172 - 178, The current code silently returns parsed.Hits.Hits[0].ID which can target an arbitrary monitor when multiple monitors match; update the logic in the search handling (the block that inspects parsed.Hits.Total.Value and parsed.Hits.Hits) to detect ambiguous results and return an explicit error instead of the first hit when more than one match is found (e.g., when parsed.Hits.Total.Value > 1 or len(parsed.Hits.Hits) > 1), including a clear message that the monitor name is ambiguous and the match count; keep the existing nil/empty handling for zero hits and preserve the existing check for a missing ID on the single-hit path.observability-logs-opensearch/Makefile (1)
3-13:⚠️ Potential issue | 🟠 MajorPin the OpenAPI spec revision used for codegen.
SPECtracksopenchoreo.github.io's movingmainbranch, so rerunningopenapi-codegenlater can regenerate different server/client code without any change in this module. That makes local builds and CI non-reproducible; pin a commit/tag or vendor the spec instead.♻️ Suggested direction
-SPEC := https://raw.githubusercontent.com/openchoreo/openchoreo.github.io/refs/heads/main/static/api-specs/observability-logs-adapter-api.yaml +SPEC_REF ?= <pinned-commit-or-tag> +SPEC := https://raw.githubusercontent.com/openchoreo/openchoreo.github.io/$(SPEC_REF)/static/api-specs/observability-logs-adapter-api.yaml🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-opensearch/Makefile` around lines 3 - 13, The SPEC variable currently points to the moving main branch URL which makes openapi-codegen non-reproducible; update the Makefile so SPEC references a fixed revision (commit SHA or tag) or replace SPEC with a local vendored file and update the openapi-codegen target accordingly (affecting SPEC and the openapi-codegen recipe that calls oapi-codegen) so repeated runs generate stable, reproducible client/server code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@observability-logs-opensearch/internal/handlers.go`:
- Around line 306-315: The CreateAlertRule and UpdateAlertRule handlers
currently treat all errors from queryBuilder.BuildLogAlertingRuleMonitorBody as
500s; change both handlers to detect validation errors returned by
BuildLogAlertingRuleMonitorBody (use errors.Is/errors.As against the
queryBuilder's validation error type/name, e.g., QueryBuilder.ValidationError or
queryBuilder.ErrValidation/ErrInvalidInput) and when a validation error is found
return gen.CreateAlertRule400JSONResponse / gen.UpdateAlertRule400JSONResponse
with Title ptr(gen.BadRequest) and Message set to the validation error message;
keep other errors mapping to the existing 500 response.
- Around line 462-495: The handler currently allows the incoming body to rename
the monitor by using request.Body.Metadata.Name while the response and
subsequent lookups use request.RuleName; fix this by enforcing consistency:
before calling h.queryBuilder.BuildLogAlertingRuleMonitorBody (or before passing
params), either validate that params.Metadata.Name equals request.RuleName and
return a 400 if they differ, or overwrite params.Metadata.Name =
request.RuleName so the built monitorBody cannot rename the resource; update the
logic around toAlertingRuleRequest / BuildLogAlertingRuleMonitorBody and the
monitor update path (UpdateMonitor) to use the canonical request.RuleName.
In `@observability-logs-opensearch/Makefile`:
- Around line 7-13: The Makefile hardcodes the oapi-codegen binary path to
$(shell go env GOPATH)/bin which breaks when GOBIN is set; update the
openapi-codegen target to compute a binary path that prefers GOBIN then falls
back to GOPATH/bin (e.g., define a local variable like OAPI_BIN that uses
$(shell go env GOBIN) if non-empty else $(shell go env GOPATH)/bin) and invoke
$(OAPI_BIN)/oapi-codegen for the three codegen calls; keep existing vars
OAPI_CODEGEN_VERSION, CFG_DIR and SPEC unchanged.
---
Duplicate comments:
In `@observability-logs-opensearch/internal/opensearch/client.go`:
- Around line 172-178: The current code silently returns parsed.Hits.Hits[0].ID
which can target an arbitrary monitor when multiple monitors match; update the
logic in the search handling (the block that inspects parsed.Hits.Total.Value
and parsed.Hits.Hits) to detect ambiguous results and return an explicit error
instead of the first hit when more than one match is found (e.g., when
parsed.Hits.Total.Value > 1 or len(parsed.Hits.Hits) > 1), including a clear
message that the monitor name is ambiguous and the match count; keep the
existing nil/empty handling for zero hits and preserve the existing check for a
missing ID on the single-hit path.
In `@observability-logs-opensearch/internal/opensearch/queries.go`:
- Around line 37-49: The function formatDurationForOpenSearch currently allows
zero and negative durations because it only checks divisibility; update
formatDurationForOpenSearch to reject parsed durations <= 0 before the switch by
returning an error (e.g., "duration must be a positive whole number of minutes
or hours") so callers like BuildLogAlertingRuleQuery cannot generate "-0h" or
"--5m"; keep the existing hour/minute branch logic and error behavior for
non-whole-minute/hour values.
- Around line 186-248: BuildWorkflowRunLogsQuery currently constructs a broad
wildcard when WorkflowRunID is empty and ignores QueryParams.SearchPhrase,
QueryParams.LogLevels and zero/empty Limit/SortOrder; change
BuildWorkflowRunLogsQuery (and its caller) to return (map[string]interface{},
error) and return an error when params.WorkflowRunID is empty (avoid building
kubernetes.pod_name:"*"); enforce sane defaults for params.QueryParams.Limit and
params.QueryParams.SortOrder when zero/empty (matching behavior in
BuildComponentLogsQueryV1), and incorporate QueryParams.SearchPhrase and
QueryParams.LogLevels into the bool.must (and/or must_not) clauses the same way
BuildComponentLogsQueryV1 does; keep use of sanitizeWildcardValue,
addTimeRangeFilter and the existing namespace/container filters, and update the
call site to handle the returned error.
In `@observability-logs-opensearch/main.go`:
- Around line 83-98: The select handling startup errors should cause a non-zero
exit when srv fails to start: in the select case that receives err from errCh
(the channel used by srv.Start()), after logging via logger.Error("Server
error", slog.Any("error", err")) return or call os.Exit(1) instead of proceeding
to the normal shutdown path; ensure you still run any necessary cleanup (e.g.,
cancel contexts) before exiting and avoid falling through to the quit branch so
supervisors see a non-zero exit on bind/listen failures. Use the existing errCh,
quit, srv.Shutdown and logger symbols to locate and implement the change.
In `@observability-logs-opensearch/Makefile`:
- Around line 3-13: The SPEC variable currently points to the moving main branch
URL which makes openapi-codegen non-reproducible; update the Makefile so SPEC
references a fixed revision (commit SHA or tag) or replace SPEC with a local
vendored file and update the openapi-codegen target accordingly (affecting SPEC
and the openapi-codegen recipe that calls oapi-codegen) so repeated runs
generate stable, reproducible client/server code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 73b91b4f-8ff6-4f5a-9a97-b6d2a7513e5e
⛔ Files ignored due to path filters (1)
observability-logs-opensearch/internal/api/gen/server.gen.gois excluded by!**/gen/**
📒 Files selected for processing (11)
observability-logs-opensearch/Makefileobservability-logs-opensearch/internal/handlers.goobservability-logs-opensearch/internal/handlers_test.goobservability-logs-opensearch/internal/observer/client_test.goobservability-logs-opensearch/internal/opensearch/client.goobservability-logs-opensearch/internal/opensearch/client_test.goobservability-logs-opensearch/internal/opensearch/queries.goobservability-logs-opensearch/internal/opensearch/queries_test.goobservability-logs-opensearch/internal/opensearch/types.goobservability-logs-opensearch/internal/opensearch/types_test.goobservability-logs-opensearch/main.go
✅ Files skipped from review due to trivial changes (5)
- observability-logs-opensearch/internal/observer/client_test.go
- observability-logs-opensearch/internal/opensearch/types_test.go
- observability-logs-opensearch/internal/opensearch/queries_test.go
- observability-logs-opensearch/internal/opensearch/client_test.go
- observability-logs-opensearch/internal/opensearch/types.go
| monitorBody, err := h.queryBuilder.BuildLogAlertingRuleMonitorBody(params) | ||
| if err != nil { | ||
| h.logger.Error("Failed to build monitor body", | ||
| slog.String("function", "CreateAlertRule"), | ||
| slog.Any("error", err), | ||
| ) | ||
| return gen.CreateAlertRule500JSONResponse{ | ||
| Title: ptr(gen.InternalServerError), | ||
| Message: ptr("internal server error"), | ||
| }, nil |
There was a problem hiding this comment.
Surface request-validation failures as 400s.
BuildLogAlertingRuleMonitorBody already rejects malformed client input such as invalid interval, window, or operator values. Returning 500 Internal Server Error from both create and update makes bad requests indistinguishable from backend failures and gives callers no actionable response.
Also applies to: 464-473
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@observability-logs-opensearch/internal/handlers.go` around lines 306 - 315,
The CreateAlertRule and UpdateAlertRule handlers currently treat all errors from
queryBuilder.BuildLogAlertingRuleMonitorBody as 500s; change both handlers to
detect validation errors returned by BuildLogAlertingRuleMonitorBody (use
errors.Is/errors.As against the queryBuilder's validation error type/name, e.g.,
QueryBuilder.ValidationError or queryBuilder.ErrValidation/ErrInvalidInput) and
when a validation error is found return gen.CreateAlertRule400JSONResponse /
gen.UpdateAlertRule400JSONResponse with Title ptr(gen.BadRequest) and Message
set to the validation error message; keep other errors mapping to the existing
500 response.
| params := toAlertingRuleRequest(request.Body) | ||
|
|
||
| monitorBody, err := h.queryBuilder.BuildLogAlertingRuleMonitorBody(params) | ||
| if err != nil { | ||
| h.logger.Error("Failed to build monitor body", | ||
| slog.String("function", "UpdateAlertRule"), | ||
| slog.Any("error", err), | ||
| ) | ||
| return gen.UpdateAlertRule500JSONResponse{ | ||
| Title: ptr(gen.InternalServerError), | ||
| Message: ptr("internal server error"), | ||
| }, nil | ||
| } | ||
|
|
||
| if _, err := h.osClient.UpdateMonitor(ctx, monitorID, monitorBody); err != nil { | ||
| h.logger.Error("Failed to update monitor", | ||
| slog.String("function", "UpdateAlertRule"), | ||
| slog.String("ruleName", request.RuleName), | ||
| slog.Any("error", err), | ||
| ) | ||
| return gen.UpdateAlertRule500JSONResponse{ | ||
| Title: ptr(gen.InternalServerError), | ||
| Message: ptr("internal server error"), | ||
| }, nil | ||
| } | ||
|
|
||
| now := time.Now().UTC().Format(time.RFC3339) | ||
| return gen.UpdateAlertRule200JSONResponse{ | ||
| Action: ptr(gen.Updated), | ||
| Status: ptr(gen.Synced), | ||
| RuleLogicalId: &request.RuleName, | ||
| RuleBackendId: &monitorID, | ||
| LastSyncedAt: &now, | ||
| }, nil |
There was a problem hiding this comment.
Don't let the request body rename the resource addressed by {ruleName}.
This handler resolves the monitor using request.RuleName, but the replacement body uses request.Body.Metadata.Name. If those differ, the call can rename the monitor while the response still reports the old logical ID, and subsequent get/delete requests by the old name stop working.
🐛 Suggested guard
params := toAlertingRuleRequest(request.Body)
+ if params.Metadata.Name != request.RuleName {
+ return gen.UpdateAlertRule400JSONResponse{
+ Title: ptr(gen.BadRequest),
+ Message: ptr("metadata.name must match ruleName"),
+ }, nil
+ }
monitorBody, err := h.queryBuilder.BuildLogAlertingRuleMonitorBody(params)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@observability-logs-opensearch/internal/handlers.go` around lines 462 - 495,
The handler currently allows the incoming body to rename the monitor by using
request.Body.Metadata.Name while the response and subsequent lookups use
request.RuleName; fix this by enforcing consistency: before calling
h.queryBuilder.BuildLogAlertingRuleMonitorBody (or before passing params),
either validate that params.Metadata.Name equals request.RuleName and return a
400 if they differ, or overwrite params.Metadata.Name = request.RuleName so the
built monitorBody cannot rename the resource; update the logic around
toAlertingRuleRequest / BuildLogAlertingRuleMonitorBody and the monitor update
path (UpdateMonitor) to use the canonical request.RuleName.
| oapi-codegen-install: | ||
| go install github.com/oapi-codegen/oapi-codegen/v2/cmd/oapi-codegen@$(OAPI_CODEGEN_VERSION) | ||
|
|
||
| openapi-codegen: oapi-codegen-install | ||
| cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-models.yaml $(SPEC) | ||
| cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-server.yaml $(SPEC) | ||
| cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-client.yaml $(SPEC) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find and examine the Makefile
fd -t f "Makefile" observability-logs-opensearch/Repository: openchoreo/community-modules
Length of output: 110
🏁 Script executed:
cat observability-logs-opensearch/MakefileRepository: openchoreo/community-modules
Length of output: 924
🌐 Web query:
Go GOBIN environment variable go install behavior
💡 Result:
The GOBIN environment variable specifies the directory where the 'go install' command installs executable commands (main packages). If GOBIN is set to an absolute path, 'go install' installs executables there. If GOBIN is not set, it defaults to $GOPATH/bin or $HOME/go/bin if GOPATH is not set. Exceptions: Executables in $GOROOT (standard library tools) are installed to $GOROOT/bin or $GOTOOLDIR instead of GOBIN. Cross-compiled binaries go to $GOOS_$GOARCH subdirectories of the install dir. Special case: 'go install' on a single .go file requires GOBIN to be set; it does not fall back to the default. You can set GOBIN persistently with 'go env -w GOBIN=/path/to/bin'. The effective value (including default) can be queried indirectly via 'go list -f {{.Target}} ./...' or 'go env GOPATH' + '/bin', but 'go env GOBIN' shows only the explicit setting (empty if unset). Add the install directory to PATH to run installed binaries conveniently.
Citations:
- 1: https://pkg.go.dev/cmd/go@go1.24.2
- 2: https://go.dev/doc/tutorial/compile-install.html
- 3: https://go.dev/wiki/GOPATH
- 4: https://stackoverflow.com/questions/25216765/gobin-not-set-cannot-run-go-install
- 5: https://go.dev/issue/28122
- 6: https://go.dev/doc/godebug
🏁 Script executed:
# Verify Makefile $(or) syntax is valid by checking if this is standard Makefile syntax
cat > /tmp/test.mk << 'EOF'
TEST_VAR := $(or $(shell echo ""),fallback)
all:
`@echo` "$(TEST_VAR)"
EOF
make -f /tmp/test.mkRepository: openchoreo/community-modules
Length of output: 116
Don't hardcode oapi-codegen to GOPATH/bin.
go install ...@version installs to GOBIN when that is set, but this target always executes $(shell go env GOPATH)/bin/oapi-codegen. In environments with a custom GOBIN, the install step succeeds and codegen still fails with "file not found".
🛠️ Suggested fix
+OAPI_CODEGEN_BIN := $(or $(shell go env GOBIN),$(shell go env GOPATH)/bin)/oapi-codegen
+
oapi-codegen-install:
go install github.com/oapi-codegen/oapi-codegen/v2/cmd/oapi-codegen@$(OAPI_CODEGEN_VERSION)
openapi-codegen: oapi-codegen-install
- cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-models.yaml $(SPEC)
- cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-server.yaml $(SPEC)
- cd $(CFG_DIR) && $(shell go env GOPATH)/bin/oapi-codegen --config cfg-client.yaml $(SPEC)
+ cd $(CFG_DIR) && $(OAPI_CODEGEN_BIN) --config cfg-models.yaml $(SPEC)
+ cd $(CFG_DIR) && $(OAPI_CODEGEN_BIN) --config cfg-server.yaml $(SPEC)
+ cd $(CFG_DIR) && $(OAPI_CODEGEN_BIN) --config cfg-client.yaml $(SPEC)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@observability-logs-opensearch/Makefile` around lines 7 - 13, The Makefile
hardcodes the oapi-codegen binary path to $(shell go env GOPATH)/bin which
breaks when GOBIN is set; update the openapi-codegen target to compute a binary
path that prefers GOBIN then falls back to GOPATH/bin (e.g., define a local
variable like OAPI_BIN that uses $(shell go env GOBIN) if non-empty else $(shell
go env GOPATH)/bin) and invoke $(OAPI_BIN)/oapi-codegen for the three codegen
calls; keep existing vars OAPI_CODEGEN_VERSION, CFG_DIR and SPEC unchanged.
Purpose
Adds an OpenSearch-based logs adapter for the observability module, enabling log querying and retrieval from an OpenSearch backend.
This PR moves OpenSearch related log querying code out of https://github.com/openchoreo/openchoreo
Approach
Compared to the OpenSearch log querying code in https://github.com/openchoreo/openchoreo, this PR
Related Issues
openchoreo/openchoreo#2964
Checklist
Remarks
The version of the module has not been bumped in the README. This is intentional. It will be bumped when everything is ready
Summary by CodeRabbit
New Features
Tests