Skip to content

Slumlord is a kube operator tuning your costly resources

License

Notifications You must be signed in to change notification settings

cschockaert/slumlord

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

83 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Slumlord

Kubernetes operator for cost optimization -- automatically scales down workloads during off-hours and detects idle resources.

Features

Sleep Schedules -- scale down workloads on a time-based schedule:

  • Scales Deployments and StatefulSets to zero replicas
  • Scales Prometheus Operator CRDs (ThanosRuler, Alertmanager, Prometheus) to zero replicas
  • Suspends MariaDB Operator CRDs (MariaDB, MaxScale) via spec.suspend
  • Suspends CronJobs, FluxCD HelmReleases and Kustomizations
  • Hibernates CNPG PostgreSQL clusters
  • Timezone-aware scheduling with day-of-week filters
  • Overnight schedule support (e.g., 22:00-06:00)
  • Label selectors and name-based matching (with wildcards)
  • State preservation -- original replica counts, suspend states, and hibernation annotations are stored for restoration

Idle Detection -- detect and optionally scale down underutilized workloads:

  • Monitors CPU and memory usage against configurable thresholds
  • Configurable idle duration before action
  • Three modes: alert (report only), scale (auto-scale to zero), or resize (in-place pod right-sizing via K8s 1.33)
  • Supports Deployments, StatefulSets, and CronJobs
  • MatchNames wildcard selector (e.g., prod-*)
  • Finalizer ensures workloads are restored on detector deletion
  • Tracks original state for safe restoration

Note: The idle detector requires metrics-server in the cluster. Without it, the operator runs in degraded mode (always returns not-idle).

Installation

Helm (recommended)

helm install slumlord oci://ghcr.io/cschockaert/charts/slumlord --version 2.13.0

From source

make install  # Install CRDs
make run      # Run locally against current kubeconfig

Usage

Basic example -- nightly sleep on weekdays

apiVersion: slumlord.io/v1alpha1
kind: SlumlordSleepSchedule
metadata:
  name: nightly-sleep
spec:
  selector:
    matchLabels:
      slumlord.io/managed: "true"
    types:
      - Deployment
      - StatefulSet
  schedule:
    start: "22:00"
    end: "06:00"
    timezone: Europe/Paris
    days: [1, 2, 3, 4, 5]

Weekend shutdown

apiVersion: slumlord.io/v1alpha1
kind: SlumlordSleepSchedule
metadata:
  name: weekend-stop
spec:
  selector:
    matchLabels:
      slumlord.io/managed: "true"
    types:
      - Deployment
      - StatefulSet
      - CronJob
  schedule:
    start: "00:00"
    end: "23:59"
    timezone: Europe/Paris
    days: [0, 6]

CNPG PostgreSQL hibernation

apiVersion: slumlord.io/v1alpha1
kind: SlumlordSleepSchedule
metadata:
  name: pg-hibernate
spec:
  selector:
    matchLabels:
      slumlord.io/managed: "true"
    types:
      - Cluster
  schedule:
    start: "20:00"
    end: "07:00"
    timezone: Europe/Paris
    days: [1, 2, 3, 4, 5]

FluxCD reconciliation suspend

apiVersion: slumlord.io/v1alpha1
kind: SlumlordSleepSchedule
metadata:
  name: flux-suspend
spec:
  selector:
    matchLabels:
      slumlord.io/managed: "true"
    types:
      - HelmRelease
      - Kustomization
  schedule:
    start: "21:55"
    end: "06:05"
    timezone: Europe/Paris
    days: [1, 2, 3, 4, 5]

Important: When managing FluxCD resources alongside Deployments/StatefulSets, use a wider sleep window for the FluxCD schedule. Suspend Flux reconciliation before scaling workloads, and resume it after restoring them. This prevents Flux from restoring scaled-down workloads during the sleep window.

Prometheus Operator CRDs

apiVersion: slumlord.io/v1alpha1
kind: SlumlordSleepSchedule
metadata:
  name: monitoring-sleep
spec:
  selector:
    matchLabels:
      slumlord.io/managed: "true"
    types:
      - Prometheus
      - Alertmanager
      - ThanosRuler
  schedule:
    start: "20:00"
    end: "07:00"
    timezone: Europe/Paris
    days: [1, 2, 3, 4, 5]

Note: The Prometheus Operator itself should NOT be scaled down -- only its managed CRs. The operator must be running to reconcile the CRs back up on wake.

MariaDB Operator suspend

apiVersion: slumlord.io/v1alpha1
kind: SlumlordSleepSchedule
metadata:
  name: mariadb-suspend
spec:
  selector:
    matchLabels:
      slumlord.io/managed: "true"
    types:
      - MariaDB
      - MaxScale
  schedule:
    start: "21:55"
    end: "06:05"
    timezone: Europe/Paris
    days: [1, 2, 3, 4, 5]

Important: Like FluxCD, the MariaDB Operator's spec.suspend pauses its reconciliation loop. Suspend the operator before scaling down underlying workloads, and resume it after restoring them. This prevents the operator from recreating resources during the sleep window.

Idle detection -- alert mode

apiVersion: slumlord.io/v1alpha1
kind: SlumlordIdleDetector
metadata:
  name: idle-alert
spec:
  selector:
    matchLabels:
      slumlord.io/managed: "true"
    types:
      - Deployment
      - StatefulSet
  thresholds:
    cpuPercent: 5
    memoryPercent: 10
  idleDuration: "1h"
  action: alert

Idle detection -- auto-scale mode

apiVersion: slumlord.io/v1alpha1
kind: SlumlordIdleDetector
metadata:
  name: idle-scaler
spec:
  selector:
    matchNames:
      - "dev-*"
      - "staging-*"
    types:
      - Deployment
  thresholds:
    cpuPercent: 3
    memoryPercent: 5
  idleDuration: "2h"
  action: scale

Idle detection -- in-place resize mode

apiVersion: slumlord.io/v1alpha1
kind: SlumlordIdleDetector
metadata:
  name: idle-resizer
spec:
  selector:
    matchLabels:
      slumlord.io/managed: "true"
    types:
      - Deployment
      - StatefulSet
  thresholds:
    cpuPercent: 10
    memoryPercent: 15
  idleDuration: "1h"
  action: resize
  reconcileInterval: 10m
  resize:
    bufferPercent: 30
    minRequests:
      cpu: "50m"
      memory: "64Mi"

Note: The resize action requires Kubernetes 1.33+ with InPlacePodVerticalScaling feature gate enabled. It patches pod resource requests in-place without restarting pods.

CRD Reference

SlumlordSleepSchedule

Field Type Required Description
spec.selector.matchLabels map[string]string No Label selector for target workloads
spec.selector.matchNames []string No Name patterns (supports wildcards)
spec.selector.types []string No Workload types to manage. Valid: Deployment, StatefulSet, CronJob, Cluster, HelmRelease, Kustomization, ThanosRuler, Alertmanager, Prometheus, MariaDB, MaxScale. Default: all types
spec.schedule.start string Yes Sleep start time in HH:MM format
spec.schedule.end string Yes Wake time in HH:MM format
spec.schedule.timezone string No IANA timezone (e.g., Europe/Paris). Default: UTC
spec.schedule.days []int No Days of week (0=Sunday, 6=Saturday). Default: every day
spec.suspend bool No Pause the schedule. Sleeping workloads are woken up. Default: false
spec.reconcileInterval duration No Override reconcile interval (e.g., 2m, 10m). Default: 5m

SlumlordIdleDetector

Field Type Required Description
spec.selector.matchLabels map[string]string No Label selector for target workloads
spec.selector.matchNames []string No Name patterns with wildcard support (e.g., prod-*)
spec.selector.types []string No Workload types: Deployment, StatefulSet, CronJob. Default: all
spec.thresholds.cpuPercent int32 No CPU usage % threshold (0-100). Below = idle
spec.thresholds.memoryPercent int32 No Memory usage % threshold (0-100). Below = idle
spec.idleDuration string Yes How long a workload must be idle before action (e.g., 30m, 1h)
spec.action string Yes alert (report only), scale (auto-scale to zero), or resize (in-place right-sizing)
spec.reconcileInterval duration No Override reconcile interval (e.g., 5m, 10m). Default: 5m
spec.resize.bufferPercent int32 No Headroom % above actual usage for resize. Default: 25
spec.resize.minRequests.cpu quantity No Minimum CPU request floor. Default: 50m
spec.resize.minRequests.memory quantity No Minimum memory request floor. Default: 64Mi

Status

kubectl get slumlordsleepschedules -A
NAMESPACE   NAME            SLEEPING   START   END     DAYS      AGE
default     nightly-sleep   false      22:00   06:00   Mon-Fri   19h
kubectl get slumlordidledetectors -A
NAMESPACE   NAME          ACTION   IDLE DURATION   LAST CHECK            AGE
default     idle-alert    alert    1h              2026-02-08T12:00:00Z  1d

How it works

Sleep Schedules

The operator runs a reconciliation loop (default: every 5 minutes, configurable) for each SlumlordSleepSchedule resource:

  1. Checks if the current time (in the configured timezone) falls within the sleep window
  2. On sleep: scales Deployments/StatefulSets to 0, scales Prometheus Operator CRDs (ThanosRuler, Alertmanager, Prometheus) to 0, suspends CronJobs, hibernates CNPG clusters, suspends FluxCD HelmReleases/Kustomizations, and suspends MariaDB Operator CRDs (MariaDB, MaxScale)
  3. On wake: restores all workloads to their original state
  4. Original state (replica counts, suspend flags, hibernation annotations) is stored in status.managedWorkloads to survive operator restarts

Idle Detection

The idle detector reconciles periodically (default: every 5 minutes, configurable) for each SlumlordIdleDetector resource:

  1. Lists workloads matching the selector (labels and/or name patterns)
  2. Checks resource usage against configured thresholds
  3. Tracks how long each workload has been continuously idle
  4. In alert mode: reports idle workloads in status.idleWorkloads
  5. In scale mode: scales down workloads idle longer than idleDuration, stores original state in status.scaledWorkloads
  6. On detector deletion: restores all scaled workloads via finalizer

Performance considerations

The BinPacker reconciler performs cluster-wide list operations each cycle (Nodes, Pods, ReplicaSets, PDBs, Deployments, StatefulSets). These are served from the controller-runtime informer cache, not direct API server calls. On very large clusters (thousands of Deployments/Pods), consider scoping with nodeSelector and namespaces to reduce the working set.

graph LR
    A[SlumlordSleepSchedule] --> B[Sleep Controller]
    B --> C[Deployments]
    B --> D[StatefulSets]
    B --> E[CronJobs]
    B --> F[CNPG Clusters]
    B --> G[HelmReleases]
    B --> H[Kustomizations]
    B --> K[ThanosRulers]
    B --> L[Alertmanagers]
    B --> M[Prometheuses]
    B --> N[MariaDBs]
    B --> O[MaxScales]

    I[SlumlordIdleDetector] --> J[Idle Controller]
    J --> C
    J --> D
    J --> E
Loading

Performance Tuning

Reconciliation intervals

Each controller has a default reconcile interval that can be overridden globally via CLI flags or per-resource via spec.reconcileInterval:

Controller Default CLI Flag Per-resource field
SleepSchedule 5m --sleep-reconcile-interval spec.reconcileInterval
IdleDetector 5m30s --idle-reconcile-interval spec.reconcileInterval
BinPacker 6m --binpacker-reconcile-interval spec.reconcileInterval
NodeDrainPolicy 6m30s --nodedrain-reconcile-interval spec.reconcileInterval

Per-resource overrides take priority over global CLI flags, which take priority over built-in defaults.

Helm values:

reconcileIntervals:
  sleepSchedule: "10m"   # Reduce API server load in large clusters
  idleDetector: "10m"
  binPacker: "10m"
  nodeDrain: "10m"

Recommendations:

  • Dev/staging: use longer intervals (10m+) to reduce load
  • Production: use shorter intervals (2-5m) for faster response
  • Large clusters: combine longer intervals with scoped selectors (nodeSelector, namespaces)
  • Short intervals (< 1m) increase API server pressure with minimal benefit

Uninstallation

Important: Remove Slumlord resources before uninstalling to restore workloads.

# Restore all workloads by deleting resources first
kubectl delete slumlordidledetectors --all -A
kubectl delete slumlordsleepschedules --all -A

# Then uninstall
helm uninstall slumlord

Development

make generate    # Regenerate DeepCopy methods
make manifests   # Regenerate CRD manifests
make test        # Run tests
make lint        # Lint code
make build       # Build binary

License

Apache 2.0

About

Slumlord is a kube operator tuning your costly resources

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors 3

  •  
  •  
  •  

Languages