## Summary
Enable Kubernetes system traces for the kube-apiserver and kubelet to get control plane observability in SigNoz. This closes the last visibility gap in our current observability stack: we have application-level traces (OTel SDKs) and service-mesh traces (Linkerd), but no insight into Kubernetes internals.
## What we'd see in SigNoz
- API server spans: request lifecycle (authn → authz → admission webhooks → etcd), webhook latency (Kyverno, Linkerd), etcd read/write performance
- Kubelet spans: CRI calls to containerd, pod sync routines, garbage collection, gRPC to container runtime
- Distributed context: API server propagates W3C trace context to webhooks, so Kyverno policy evaluation shows as child spans
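The W3C trace context the API server propagates is just a `traceparent` header. A small sketch (random placeholder IDs, purely illustrative) for generating one by hand, so a known trace ID can be attached to a request and then searched for in SigNoz:

```shell
# W3C Trace Context header: 00-<32 hex trace-id>-<16 hex span-id>-<2 hex flags>
# flags=01 marks the trace as sampled. IDs here are random placeholders.
TRACE_ID=$(head -c16 /dev/urandom | od -An -tx1 | tr -d ' \n')
SPAN_ID=$(head -c8 /dev/urandom | od -An -tx1 | tr -d ' \n')
TRACEPARENT="00-${TRACE_ID}-${SPAN_ID}-01"
echo "traceparent: ${TRACEPARENT}"
```

A client that sends this header on an API server request will see the server-side spans (authn, authz, admission, etcd) parented under `TRACE_ID`, subject to the server's sampling rate.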
## Pre-requisites (already met)

| Requirement | Status |
|---|---|
| otelAgent DaemonSet on all nodes (incl. control plane) | 4/4 nodes |
| `hostPort: 4317` bound (OTLP gRPC on localhost) | Confirmed |
| OTLP gRPC receiver configured in otelAgent | `otlp.protocols.grpc` |
| Traces pipeline wired to SigNoz | otlp → k8sattributes → batch → signoz-otel-collector |
| Tolerations for control plane | `operator: Exists` |
| K8s version supports stable kubelet tracing | v1.35.0 (stable since v1.34) |
No changes needed on the SigNoz/collector side.
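The `hostPort: 4317` binding from the table can be spot-checked from any node. A sketch using the shell's `/dev/tcp` redirection (a bash feature, so no `nc` dependency; falls through to "closed" on shells without it):

```shell
# Probe localhost:4317 (the OTLP gRPC hostPort); reports open or closed
if (exec 3<>/dev/tcp/localhost/4317) 2>/dev/null; then
  OTLP_PORT=open
else
  OTLP_PORT=closed
fi
echo "localhost:4317 is ${OTLP_PORT}"
```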
## Implementation
1. Create tracing config file on each server node

```yaml
# /etc/rancher/k3s/tracing.yaml
apiVersion: apiserver.config.k8s.io/v1beta1
kind: TracingConfiguration
endpoint: localhost:4317
samplingRatePerMillion: 1000  # 0.1% — conservative starting point
```

2. Update k3s server config on each server node
```yaml
# /etc/rancher/k3s/config.yaml (append to existing config)
kube-apiserver-arg:
  - "tracing-config-file=/etc/rancher/k3s/tracing.yaml"
kubelet-arg:
  - "config-dir=/etc/rancher/k3s/kubelet.conf.d"
```

Note: the kubelet has no `tracing-config-file` flag; kubelet tracing lives in the `tracing` field of its KubeletConfiguration. A drop-in config directory (stable since K8s v1.32, drop-ins must end in `.conf`) should keep k3s's generated kubelet config intact:

```yaml
# /etc/rancher/k3s/kubelet.conf.d/10-tracing.conf
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
tracing:
  endpoint: localhost:4317
  samplingRatePerMillion: 1000
```

3. Rolling restart of k3s
Restart one server node at a time to avoid quorum loss:

```shell
# On each server node (node-1, node-2, node-3), one at a time:
sudo systemctl restart k3s
# Wait for the node to be Ready before proceeding to the next
```

node-4 (worker) only needs the kubelet tracing config + restart if desired.
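The rolling restart can be sketched as a small helper, assuming passwordless SSH to each server node and kubectl access from the operator's machine (node names are placeholders):

```shell
# Restart k3s on one node, then block until it reports Ready again.
restart_and_wait() {
  node="$1"
  ssh "$node" sudo systemctl restart k3s
  # kubectl wait exits non-zero if the node is not Ready within the timeout
  kubectl wait --for=condition=Ready "node/$node" --timeout=300s
}
# Usage, one server at a time to preserve etcd quorum:
#   for n in node-1 node-2 node-3; do restart_and_wait "$n"; done
```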
4. Verify
```shell
# Check traces are flowing
kubectl logs -n signoz -l app.kubernetes.io/component=otel-agent --tail=20 | grep -i trace

# Look for apiserver/kubelet spans in the SigNoz UI
# Services should appear as "kube-apiserver" and "kubelet"
```

## Follow-up
- Monitor otelAgent resource usage after enabling — the 0.1% sampling rate should have negligible impact
- Consider bumping `samplingRatePerMillion` if more coverage is needed for debugging
- Optionally add a SigNoz dashboard for control plane latency metrics derived from traces
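For reference when bumping the rate: `samplingRatePerMillion` is parts per million, so the percentage of sampled requests is rate / 10,000. A one-line sanity check:

```shell
# samplingRatePerMillion → percentage: rate / 1,000,000 * 100 = rate / 10000
rate=1000
pct=$(awk -v r="$rate" 'BEGIN { printf "%.2f", r / 10000 }')
echo "samplingRatePerMillion=${rate} samples ${pct}% of requests"
```

So the starting value of 1000 samples 0.10% of requests; 10000 would sample 1%.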