Add language detection to auto-monitor to inject only the relevant SDK #380

Open
Miqueasher wants to merge 1 commit into aws:main from Miqueasher:feature/auto-monitor-language-detection

Summary

When monitorAllServices: true is set, auto-monitor currently injects all 4 language SDK init containers (Java, Python, Node.js, .NET) into every pod regardless of its runtime, which causes liveness/readiness probe failures, restart loops, and deployment instability.

This PR adds a registry-based language detector that inspects the container image config (ENV, CMD, ENTRYPOINT) via google/go-containerregistry without pulling layers (~100-500ms, 5s timeout).
When registry detection fails, it falls back gracefully through image name patterns → pod-spec env vars → pod-spec commands → all languages (current behavior), so there is no regression.

Detection Layers

  1. Registry image config — fetch ENV/CMD/ENTRYPOINT from the image config referenced by the manifest (public registries + ECR with IRSA)
  2. Image name patterns — match language keywords in image reference
  3. Pod-spec env vars — check for JAVA_HOME, PYTHONPATH, NODE_ENV, DOTNET_ROOT, etc.
  4. Pod-spec commands — match runtime binaries (java, python, node, dotnet)
  5. Fallback — all configured languages (no regression)

Dependencies added

| Package | Version | Purpose |
| --- | --- | --- |
| github.com/google/go-containerregistry | v0.20.0 | Fetch image config from registry without pulling layers |
| github.com/aws/aws-sdk-go (existing) | v1.45.25 | ECR auth via custom keychain (no new AWS dependency) |

Test Plan

  • Unit tests: 92/92 passing (65 new + 27 existing, zero regressions)
  • Live EKS validation (us-east-1): Custom operator build deployed with monitorAllServices=true, no manual annotations
  • Java, Python, Node.js, .NET public images → 1 init container each (detected via registry config)
  • Alpine, Nginx, Busybox → 4 init containers (correct fallback, no false positives)
  • Private ECR image with language keyword → detected via image name pattern
  • Pod-spec JAVA_HOME on Alpine → detected via env var fallback
  • Multi-container (Python + Nginx sidecar) → correctly detected Python only
  • Failure mode: When all detection fails, behavior is identical to current production
  • Operator image digest confirmed matching between running pod and ECR (sha256:5f74a89f76b3...), zero pod restarts observed

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
