[BUG] Monitor-based SLOs require pre-existing Datadog monitor IDs #2702

@drewwells

Pre-submission Checklist

  • I have searched existing issues and this is not a duplicate
  • This is a Datadog Operator issue (CRDs, reconciliation, etc.), not a Datadog Agent or Datadog service problem (dashboards, monitors, etc.)

Operator version

v1.23.1

Operator Helm chart version

n/a

Bug Report

What happened:

The DatadogSLO spec accepts monitorIDs as []int64, that is, raw Datadog API IDs. When both monitors and SLOs are managed declaratively through the operator, this creates a chicken-and-egg problem that forces a manual, multi-step workflow:

  1. Deploy a DatadogMonitor CR
  2. Wait for the operator to reconcile it and populate status.id
  3. Copy that numeric ID into the DatadogSLO CR's monitorIDs field
  4. Deploy the SLO

This cannot be done in a single Helm release or ArgoCD sync. There is no way to express "this SLO references the monitor defined by this other CR" declaratively.
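
The four steps above can be sketched as the two manifests below. This is only an illustration: the monitor's query, type, and message, and the ID 12345678, are placeholder values rather than a real setup.

```yaml
# Step 1: deploy the monitor (query, type, and message are placeholders).
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: my-service-error-rate
  namespace: my-service
spec:
  name: my-service-error-rate
  type: "metric alert"
  query: "avg(last_5m):avg:my_service.errors.rate{*} > 0.01"
  message: "Error rate is above threshold"
# Step 2: once reconciled, the operator reports the Datadog-assigned ID
# in the monitor's status, e.g. status.id: 12345678 (placeholder value).
---
# Steps 3 and 4: the ID has to be copied by hand before the SLO can be deployed.
apiVersion: datadoghq.com/v1alpha1
kind: DatadogSLO
metadata:
  name: my-service-availability
  namespace: my-service
spec:
  name: my-service-availability
  type: monitor
  monitorIDs:
    - 12345678   # copied manually from the monitor's status.id
  targetThreshold: "99.9"
  timeframe: 30d
```

The second document depends on a value that exists only after the first has been reconciled, so the two cannot ship in the same release.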

What I expected:
A monitorRefs field that references DatadogMonitor CRs by name and namespace, with the SLO controller automatically resolving each ID from the referenced monitor's status.id:

apiVersion: datadoghq.com/v1alpha1
kind: DatadogSLO
metadata:
  name: my-service-availability
spec:
  name: my-service-availability
  type: monitor
  monitorRefs:
    - name: my-service-error-rate      # DatadogMonitor CR name
      namespace: my-service            # optional, defaults to same namespace
  targetThreshold: "99.9"
  timeframe: 30d

The SLO controller would watch the referenced DatadogMonitor CR, wait for status.id to be populated, then create or update the SLO with the resolved monitor ID. If the monitor hasn't been reconciled yet, the SLO would requeue.
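
With that behavior, both resources could be applied together in one sync. A rough sketch, assuming the monitorRefs field proposed in this issue (it is not part of the current DatadogSLO CRD) and placeholder monitor values:

```yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: my-service-error-rate
  namespace: my-service
spec:
  name: my-service-error-rate
  type: "metric alert"                                         # placeholder
  query: "avg(last_5m):avg:my_service.errors.rate{*} > 0.01"   # placeholder
  message: "Error rate is above threshold"                     # placeholder
---
apiVersion: datadoghq.com/v1alpha1
kind: DatadogSLO
metadata:
  name: my-service-availability
  namespace: my-service
spec:
  name: my-service-availability
  type: monitor
  monitorRefs:                      # hypothetical field, proposed in this issue
    - name: my-service-error-rate   # resolved to the monitor's status.id by the controller
  targetThreshold: "99.9"
  timeframe: 30d
# Expected sequence under this proposal:
#   1. Both objects are created in the same apply.
#   2. The SLO requeues while the monitor's status.id is still empty.
#   3. Once status.id is set, the SLO is created in Datadog with that ID.
```

No numeric ID appears in the manifests, so they remain valid across clusters and Datadog accounts.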

This is the same pattern that Kubernetes uses for secretRef, configMapRef, serviceAccountName, etc. — name-based references with the controller resolving the binding.

Steps to Reproduce

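  1. Create a monitor-based DatadogSLO CR without setting monitorIDs.
  2. Try to link the monitor by referencing its DatadogMonitor CR instead of a numeric ID. The spec offers no field for this.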

Environment

Kubernetes version: n/a
Helm version: n/a
Cloud provider: n/a

Additional Context

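By convention, Kubernetes custom resources are declarative: a spec should not require values that exist only in another resource's status or in an external API after reconciliation. Because the DatadogSLO spec requires such values, managing SLOs through the operator is extremely complicated. We will likely need to write our own operator to bridge this gap.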
