@Uzziee Uzziee commented Nov 18, 2025

This proposal adds hot reload functionality, enabling the app to reload changes to the virtual cluster config without needing a restart.

Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com>

@SamBarker SamBarker left a comment


Design PR#83 Feedback - Configuration Reload Design

Date: 2026-01-28
Reviewer: Sam Barker
Design PR: #83

Executive Summary

Thank you for putting together this design proposal! Configuration reload is a critical operational feature that many users have been asking for, and your design work provides a solid foundation for moving this forward.

The current proposal focuses on file watch as the primary mechanism. This feedback suggests an alternative HTTP-first approach with 2-phase validation and discusses enhancements that will make either approach production-ready. The feedback builds on analysis of the POC implementation (PR#3176).

Your POC demonstrates the core reload mechanism works well - the questions here are primarily about the trigger mechanism and operator integration patterns. The groundwork you've laid out makes these decisions much clearer.

Proposed Change to Design: HTTP Endpoints as Primary Interface

Current Design Proposal

The design PR currently proposes file watch as the primary mechanism for configuration reload (Part 1), with potential HTTP endpoints as future work.

Recommended Alternative: HTTP-First Approach

I recommend inverting this: make HTTP endpoints the primary interface, with file watching as an optional convenience layer.

Rationale for HTTP-first:

Universal: Works on bare metal, Kubernetes, and any deployment model
Operator-friendly: Natural integration point for Kubernetes operator (operator detects ConfigMap changes → POST /admin/config/reload)
Testable: Easy to test programmatically (integration tests can POST directly)
Observable: Clear success/failure responses (200 OK vs 400 Bad Request with error details)
Composable: File watching can be implemented as a layer that calls the HTTP endpoint internally
Kubernetes-native: Aligns with how operators interact with workloads (API calls, not filesystem)

File watching challenges:

  • ❌ Read-only filesystem (Kubernetes security best practice blocks file writes)
  • ❌ ConfigMap mounting complexity (..data symlinks, atomic updates)
  • ❌ No feedback mechanism (how does operator know reload succeeded/failed?)
  • ❌ Race conditions (file watch triggers before ConfigMap fully mounted)

Proposed architecture:

Core: HTTP Management Endpoints

Proxy exposes on localhost:9190 (management port):
    ↓
POST /admin/config/validate (validate without applying)
POST /admin/config/reload (apply changes)
GET /admin/config/status (current config version, last operation status)
GET /admin/health (proxy health for liveness/readiness, already exists)
    ↓
Core reload mechanism (shared by all trigger mechanisms)

2-Phase Workflow:

  1. Validate: Build models, initialize filters, check internal consistency (no port binding)
  2. Reload: If validation passes, apply changes (bind ports, register gateways)
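
To make the 2-phase workflow concrete, here is a minimal sketch of a client driving both phases with the JDK HTTP client. The endpoint paths and port follow the proposal above; the class name, config path, and error handling are illustrative only.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class TwoPhaseReloadClient {
    public static void main(String[] args) throws Exception {
        Path config = Path.of("/etc/kroxylicious/config.yaml");
        HttpClient client = HttpClient.newHttpClient();

        // Phase 1: validate without applying any changes.
        HttpResponse<String> validation = client.send(
                post("http://localhost:9190/admin/config/validate", config),
                HttpResponse.BodyHandlers.ofString());
        if (validation.statusCode() != 200) {
            System.err.println("Validation failed, not reloading: " + validation.body());
            return;
        }

        // Phase 2: apply the already-validated configuration.
        HttpResponse<String> reload = client.send(
                post("http://localhost:9190/admin/config/reload", config),
                HttpResponse.BodyHandlers.ofString());
        System.out.println("Reload response (" + reload.statusCode() + "): " + reload.body());
    }

    private static HttpRequest post(String uri, Path body) throws java.io.FileNotFoundException {
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Content-Type", "application/yaml")
                .POST(HttpRequest.BodyPublishers.ofFile(body))
                .build();
    }
}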

Security:

  • Default bind: localhost:9190 (local access only)
  • For Kubernetes: Bind to 0.0.0.0:9190 (pod IP accessible to operator)
  • Authentication: Optional (TLS client certificates, bearer tokens)
  • Recommendations:
    • Bare metal: Keep localhost binding, use local access controls
    • Kubernetes: Use NetworkPolicy to restrict operator→proxy traffic
    • Production: Consider mTLS for operator↔proxy communication

Trigger Mechanisms (How to Call HTTP Endpoints)

Option 1: Direct HTTP (Kubernetes Operator)

Operator detects ConfigMap change
    ↓
POST /admin/config/validate to management Service
    ↓
POST /admin/config/reload to all pod IPs

✅ Native Kubernetes integration
✅ Immediate feedback via HTTP responses
✅ No filesystem coupling

Option 2: File Watcher (Bare Metal)

Sidecar process watches config file
    ↓
On file change → POST to localhost:9190/admin/config/validate
    ↓
If valid → POST to localhost:9190/admin/config/reload

Sidecar options:

  • Shell script: Simple inotifywait wrapper
    # -m keeps watching after the first event; --fail makes curl exit non-zero on HTTP 4xx/5xx
    inotifywait -m -e modify /etc/kroxylicious/config.yaml | while read; do
      if curl --fail -s -X POST -H "Content-Type: application/yaml" --data-binary @/etc/kroxylicious/config.yaml http://localhost:9190/admin/config/validate; then
        curl -X POST -H "Content-Type: application/yaml" --data-binary @/etc/kroxylicious/config.yaml http://localhost:9190/admin/config/reload
      fi
    done
  • Go binary: More robust error handling, retry logic
  • In-process Java: WatchService (if proxy can write to filesystem for persistence); see the sketch below

✅ Familiar workflow for bare metal users
✅ Decoupled from proxy (sidecar can be restarted independently)
✅ Uses same HTTP endpoints as Kubernetes
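
As a sketch of the in-process Java WatchService option (the config path is illustrative, and the HTTP calls would be the same validate→reload pair shown elsewhere in this feedback):

import java.nio.file.*;

// Minimal sketch of a Java file watcher. It watches the config directory because
// ConfigMap and editor updates often replace the file rather than modify it in place.
public class ConfigFileWatcher {
    public static void main(String[] args) throws Exception {
        Path configFile = Path.of("/etc/kroxylicious/config.yaml");
        WatchService watchService = FileSystems.getDefault().newWatchService();
        configFile.getParent().register(watchService,
                StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);

        while (true) {
            WatchKey key = watchService.take(); // blocks until an event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                if (configFile.getFileName().equals(event.context())) {
                    // POST /admin/config/validate, then /admin/config/reload if valid
                    // (same HTTP calls as the two-phase client sketch above).
                }
            }
            key.reset();
        }
    }
}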

This means:

  • HTTP endpoints are the primitive (required)
  • File watching is optional convenience (can be added later)
  • Both deployment models use same tested, validated endpoints
  • Validation catches config errors before any cluster goes down

Note: This is a significant change from the current design proposal, which focuses on file watch without a validation phase. If the community prefers file watch as the primary mechanism, we should address the challenges listed above (read-only filesystem, feedback mechanisms, etc.) in the design.

Cluster Modification Semantics

The design's remove→add pattern is architecturally necessary:

The proxy's channel state machine has a fundamental constraint: each frontend channel (client→proxy) has a 1:1 relationship with a backend channel (proxy→broker). There's no mechanism to redirect an existing backend connection without closing the frontend connection.

This means:

  • Any cluster modification requires draining connections (1-30 seconds downtime per cluster)
  • "Atomic swap" approaches don't eliminate downtime—they would require hot-swapping filters in the Netty pipeline, which introduces filter state management complexity
  • The remove→add pattern is the correct architectural choice, not a limitation to be overcome

Implication for design: Document that cluster modifications incur brief downtime (1-30s) and this is by design, not a quality issue.

Rollback Strategy (Needs Discussion)

Current POC behavior: Rollback ALL clusters on ANY failure (all-or-nothing semantics)

This is a critical design decision that requires community consensus. The choice affects operational complexity, user experience, and downtime characteristics. See "Questions for Design Discussion" below for detailed analysis of trade-offs.

Key question: When cluster-a succeeds but cluster-b fails, should we:

  • Option A: Rollback cluster-a (all-or-nothing) → simpler operations, more downtime
  • Option B: Keep cluster-a on new config (partial success) → less downtime, more complexity

Recommendation for design: Dedicate a section to this decision, present both options fairly, and explicitly request community feedback before proceeding.

Core Design: HTTP Endpoints with 2-Phase Commit

Validation Endpoint (Core Component)

API:

POST /admin/config/validate
Content-Type: application/yaml

{new configuration YAML}

Response (200 OK):
{
  "valid": true,
  "configVersion": "a3f5b2c19e4d"  // SHA-256 hash of config
}

Response (400 Bad Request):
{
  "valid": false,
  "errors": [
    "Filter 'record-encryption' initialization failed: KMS URL required",
    "Port conflict: 9293 used by cluster-a and cluster-b"
  ]
}

What it validates:

  • ✅ YAML syntax and structure
  • ✅ Filter types exist (registered via SPI)
  • ✅ FilterFactory.initialize() succeeds (filter config valid)
  • ✅ Port ranges internally consistent (no duplicate ports in config)

What it doesn't validate (runtime concerns):

  • ❌ Ports actually available on the OS (might be in use)
  • ❌ External dependencies reachable (KMS might be down during reload)
  • ❌ Upstream Kafka cluster healthy

Why this split is acceptable:

Validation is about catching configuration errors (syntax, invalid filter config). Runtime failures (port conflicts, KMS down at reload time) are handled by rollback. We can't guarantee "config valid at 10:00am" means "will succeed at 10:02am" for external dependencies.

Implementation note: Validation should build models and initialize filters without binding ports or registering gateways. This makes validation:

  • Fast (no network operations)
  • Deterministic (same result on all pods)
  • Resource-light (no double-memory usage)
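
A rough sketch of that validation pass, using stand-in types (ProxyConfiguration, VirtualClusterDefinition, ValidationResult) rather than the real Kroxylicious model classes:

import java.util.ArrayList;
import java.util.List;

// Hypothetical types standing in for the proxy's internal configuration model.
final class ConfigValidator {

    record ValidationResult(boolean valid, List<String> errors) {}

    ValidationResult validate(ProxyConfiguration candidate) {
        List<String> errors = new ArrayList<>();
        for (VirtualClusterDefinition cluster : candidate.virtualClusters()) {
            try {
                // Build the in-memory model: resolves filter types via SPI and calls
                // FilterFactory.initialize(), but never binds a port or registers a
                // gateway, so it is safe to run on every pod.
                buildModel(cluster);
            }
            catch (Exception e) {
                errors.add("Cluster '" + cluster.name() + "': " + e.getMessage());
            }
        }
        errors.addAll(checkForDuplicatePorts(candidate));
        return new ValidationResult(errors.isEmpty(), errors);
    }

    // Placeholders for the real model-building and consistency checks.
    private void buildModel(VirtualClusterDefinition cluster) { /* ... */ }
    private List<String> checkForDuplicatePorts(ProxyConfiguration candidate) { return List.of(); }

    interface ProxyConfiguration { List<VirtualClusterDefinition> virtualClusters(); }
    interface VirtualClusterDefinition { String name(); }
}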

Reload Endpoint (Core Component)

API:

POST /admin/config/reload
Content-Type: application/yaml

{new configuration YAML}

Response (200 OK):
{
  "success": true,
  "configVersion": "a3f5b2c19e4d",
  "clustersModified": ["cluster-a", "cluster-b"]
}

Response (500 Internal Server Error):
{
  "success": false,
  "error": "Failed to modify cluster-b: filter initialization failed",
  "configVersion": "abc123"  // Rolled back to previous version
}

What it does:

  1. Applies configuration changes (remove→add clusters as needed)
  2. If any operation fails → rollback all changes
  3. Returns success/failure with current config version
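
A sketch of the all-or-nothing apply/rollback loop, with hypothetical helper methods; it shows the shape of the logic, not the actual Kroxylicious implementation:

import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical reload loop: apply cluster changes one by one; on any failure,
// undo the changes already applied and restore the previous configuration.
final class ReloadOperation {

    void reload(ProxyConfig oldConfig, ProxyConfig newConfig) throws ReloadFailedException {
        Deque<String> applied = new ArrayDeque<>();
        try {
            for (String cluster : changedClusters(oldConfig, newConfig)) {
                removeCluster(cluster);          // drain connections, unbind ports
                addCluster(cluster, newConfig);  // bind ports, register gateways
                applied.push(cluster);
            }
        }
        catch (Exception failure) {
            // Roll back in reverse order so the proxy returns to the old config.
            // A failure inside this loop is the ROLLBACK_PARTIAL_FAILURE case
            // tracked by the status endpoint.
            while (!applied.isEmpty()) {
                String cluster = applied.pop();
                removeCluster(cluster);
                addCluster(cluster, oldConfig);
            }
            throw new ReloadFailedException("Rolled back to previous config", failure);
        }
    }

    // Placeholders for the real operations.
    private Iterable<String> changedClusters(ProxyConfig oldConfig, ProxyConfig newConfig) { return java.util.List.of(); }
    private void removeCluster(String name) { }
    private void addCluster(String name, ProxyConfig config) { }

    interface ProxyConfig { }
    static final class ReloadFailedException extends Exception {
        ReloadFailedException(String msg, Throwable cause) { super(msg, cause); }
    }
}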

Configuration Options

Management endpoint binding:

# proxy-config.yaml
admin:
  host: "localhost"  # Default: localhost only (bare metal)
  # host: "0.0.0.0"  # Kubernetes: bind to pod IP
  port: 9190
  tls:  # Optional: mTLS for operator communication
    keyStore: /path/to/keystore.jks
    trustStore: /path/to/truststore.jks

Benefits of this architecture:

  • Catches 90% of errors before any cluster goes down (validation phase)
  • Clear error messages before disruption
  • Same HTTP endpoints for Kubernetes and bare metal
  • File watching is optional, can be added as sidecar later
  • Security: localhost by default, configurable for Kubernetes

Kubernetes Integration Patterns

Management Service

Problem: Operator creates Services for Kafka traffic (ports 9292+) but not for the management port (9190).

Proposed: Create dedicated management Service for operator↔proxy communication:

apiVersion: v1
kind: Service
metadata:
  name: my-proxy-management
spec:
  type: ClusterIP  # Internal only
  selector:
    app.kubernetes.io/instance: minimal
    app.kubernetes.io/component: proxy
  ports:
  - name: management
    port: 9190
    targetPort: 9190

Benefits:

  • ✅ Automatic pod readiness handling (the Service only routes traffic to ready pods)
  • ✅ Stable DNS endpoint (my-proxy-management.ns.svc.cluster.local)
  • ✅ Survives pod restarts/rescheduling
  • ✅ Follows Kubernetes best practices (Services for stable endpoints)

Usage:

  • Validation: POST http://my-proxy-management:9190/admin/config/validate (one pod via Service)
  • Reload: Iterate over pods, POST directly to pod IPs (all pods must succeed)
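
A sketch of the operator-side fan-out implied by this usage; the Service name, port, and pod-IP discovery are assumptions taken from the proposal above, not existing operator code:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Hypothetical operator-side fan-out: validate once via the management Service,
// then reload every pod individually. Pod IP discovery is left abstract.
final class ReloadFanOut {
    private final HttpClient http = HttpClient.newHttpClient();

    boolean validateAndReload(String yaml, List<String> podIps) throws Exception {
        // 1. Validate once via the stable Service DNS name (any ready pod will do,
        //    because validation is deterministic).
        if (post("http://my-proxy-management:9190/admin/config/validate", yaml) != 200) {
            return false;
        }
        // 2. Apply to every pod; all of them must accept the new config.
        for (String podIp : podIps) {
            if (post("http://" + podIp + ":9190/admin/config/reload", yaml) != 200) {
                return false; // surface the failure; per-pod rollback already happened proxy-side
            }
        }
        return true;
    }

    private int post(String uri, String yaml) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(uri))
                .header("Content-Type", "application/yaml")
                .POST(HttpRequest.BodyPublishers.ofString(yaml))
                .build();
        return http.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}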

Recommendation: Add management Service pattern to Kubernetes deployment section of design.

Read-Only Filesystem Support

Problem: Kubernetes deployments use securityContext.readOnlyRootFilesystem: true as a security best practice. The current design persists config to disk after a successful reload, which fails on a read-only filesystem.

Proposed: Make config file persistence optional:

Deployment models:

  • Bare metal: Config file on disk, persist on successful reload
  • Kubernetes: Config in ConfigMap (operator-managed), no disk persistence

Recommendation: Document read-only filesystem support as a requirement for Kubernetes deployments.

Checksum-Based Change Detection

Problem: Operator needs to detect "config actually changed" vs "CRD reconciliation loop with no real change."

Proposed: Store SHA-256 hash of config YAML in KafkaProxy annotation:

apiVersion: kroxylicious.io/v1alpha1
kind: KafkaProxy
metadata:
  name: minimal
  annotations:
    kroxylicious.io/config-checksum: "a3f5b2c19e4d"  # SHA-256 hash
spec:
  # ... config ...

Operator logic:

String newChecksum = sha256(generateYaml(kafkaProxy));
String oldChecksum = kafkaProxy.getMetadata().getAnnotations().get("kroxylicious.io/config-checksum");

if (newChecksum.equals(oldChecksum)) {
    LOGGER.debug("Config unchanged, skipping reload");
    return;  // No-op, avoid unnecessary reload
}

// Config changed, trigger 2-phase reload
ValidationResult validation = validateViaManagementService(yaml);
if (validation.valid()) {
    reloadAllPods(yaml);
    kafkaProxy.getMetadata().getAnnotations().put("kroxylicious.io/config-checksum", newChecksum);
}
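
For completeness, the sha256(...) helper used above could be a thin wrapper over the JDK's MessageDigest:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

static String sha256(String yaml) throws Exception {
    // Hash the rendered YAML so identical configs always produce the same version string.
    byte[] digest = MessageDigest.getInstance("SHA-256").digest(yaml.getBytes(StandardCharsets.UTF_8));
    return HexFormat.of().formatHex(digest);
}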

Benefits:

  • ✅ Automatic no-op detection (reconciliation loop doesn't trigger unnecessary reloads)
  • ✅ Rollback detection (reverting config doesn't reload if already at that state)
  • ✅ O(1) comparison vs deep config diff

Recommendation: Add checksum-based change detection to operator integration section.

Additional Design Components

Configurable Drain Timeout

Problem: Hard-coded 30-second drain timeout is too short for Kafka consumers with long poll timeouts (default 5 minutes).

Proposed:

# proxy-config.yaml
admin:
  drainTimeoutSeconds: 300  # 5 minutes for graceful connection drain

Trade-off: Longer timeouts mean longer reload times, but fewer disrupted clients.
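
A rough sketch of what a configurable drain could look like, assuming (purely for illustration) that the proxy tracks each virtual cluster's frontend channels in a Netty ChannelGroup:

import io.netty.channel.group.ChannelGroup;
import java.time.Duration;
import java.time.Instant;

final class ConnectionDrainer {
    // Waits up to drainTimeout for all tracked channels to close on their own,
    // then force-closes whatever is left.
    static void drain(ChannelGroup clusterChannels, Duration drainTimeout) throws InterruptedException {
        Instant deadline = Instant.now().plus(drainTimeout);
        while (!clusterChannels.isEmpty() && Instant.now().isBefore(deadline)) {
            Thread.sleep(500); // poll; a production version would use channel close listeners
        }
        // Anything still open after the timeout is closed forcibly.
        clusterChannels.close().awaitUninterruptibly();
    }
}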

Recommendation: Add configurable drain timeout to design.

Observability and Status Reporting

Configuration Status Endpoint:

Separate configuration status from health checks (health is for liveness/readiness):

GET /admin/config/status
{
  "currentConfigVersion": "sha256:a3f5b2c19e4d...",
  "appliedAt": "2026-01-28T10:15:30Z",
  "lastReloadAttempt": {
    "timestamp": "2026-01-28T10:15:30Z",
    "status": "SUCCESS",
    "requestedVersion": "sha256:a3f5b2c19e4d...",
    "durationMs": 1234,
    "clustersModified": ["cluster-a"]
  },
  "lastValidationAttempt": {
    "timestamp": "2026-01-28T10:15:25Z",
    "status": "SUCCESS",
    "requestedVersion": "sha256:a3f5b2c19e4d..."
  }
}

// After reload failure with rollback failure:
{
  "currentConfigVersion": "sha256:abc123...",  // Previous version still running
  "appliedAt": "2026-01-28T09:00:00Z",
  "lastReloadAttempt": {
    "timestamp": "2026-01-28T10:20:00Z",
    "status": "ROLLBACK_PARTIAL_FAILURE",
    "requestedVersion": "sha256:newversion...",
    "rollbackState": {
      "successful": ["cluster-a"],
      "failed": {
        "cluster-b": "Failed to re-register gateway: port 9293 in use"
      }
    }
  }
}

Health endpoint stays focused on proxy health:

GET /admin/health
{
  "status": "UP",
  "checks": {
    "netty": "UP",
    "virtualClusters": "UP"
  }
}

Benefit: Clean separation - operators query /admin/config/status for reload state, /admin/health for liveness/readiness.

Recommendation: Add dedicated config status endpoint to design.

Metrics:

kroxylicious_config_reload_total{result="success|failure"} counter
kroxylicious_config_reload_duration_seconds histogram
kroxylicious_config_version_info{version="a3f5b2c19e4d"} gauge

Use cases:

  • Alerting on reload failures
  • Tracking reload duration trends
  • Capacity planning (reload frequency)
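
A sketch of how these metrics could be registered, assuming Micrometer is the metrics facade; the meter names mirror the list above:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

// Hypothetical metric registration mirroring the names proposed above.
final class ReloadMetrics {
    private final Counter reloadSuccess;
    private final Counter reloadFailure;
    private final Timer reloadDuration;

    ReloadMetrics(MeterRegistry registry, String configVersion) {
        reloadSuccess = Counter.builder("kroxylicious_config_reload_total")
                .tag("result", "success").register(registry);
        reloadFailure = Counter.builder("kroxylicious_config_reload_total")
                .tag("result", "failure").register(registry);
        reloadDuration = Timer.builder("kroxylicious_config_reload_duration_seconds")
                .register(registry);
        // Gauge carrying the current config version as a label; the value is a constant 1.
        Gauge.builder("kroxylicious_config_version_info", () -> 1)
                .tag("version", configVersion).register(registry);
    }

    void recordSuccess(long durationMs) {
        reloadSuccess.increment();
        reloadDuration.record(java.time.Duration.ofMillis(durationMs));
    }

    void recordFailure(long durationMs) {
        reloadFailure.increment();
        reloadDuration.record(java.time.Duration.ofMillis(durationMs));
    }
}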

Recommendation: Add metrics to observability section.

Error Handling and Recovery

Rollback Failure Handling:

Current design: Log "CRITICAL: system may be in inconsistent state"

Proposed: Track rollback state and expose it via the config status endpoint (see above).

Recovery path:

  1. Query /admin/config/status to see which clusters failed rollback
  2. Manual intervention:
    • Verify cluster state (is port bound? filter initialized?)
    • Either retry reload or manually fix state
  3. Operator automation (future):
    • Detect rollback failure from the config status endpoint
    • Attempt recovery (remove failed cluster, re-add from old config)

Recommendation: Document rollback failure recovery procedures.

Concurrent Reload Prevention:

  • Only one reload at a time (enforced via lock)
  • Concurrent requests fail fast with 409 Conflict
POST /admin/config/reload
{new config}

Response (409 Conflict):
{
  "error": "Reload already in progress",
  "inProgressSince": "2026-01-28T10:15:30Z"
}
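
A minimal sketch of the single-reload lock; how a false return maps to the 409 response depends on the management endpoint framework, so the types here are illustrative:

import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical guard ensuring only one reload runs at a time; a second request
// fails fast instead of queuing behind the first.
final class ReloadGuard {
    private final AtomicReference<Instant> inProgressSince = new AtomicReference<>();

    /** Returns true if the caller acquired the reload slot, false if a reload is already running. */
    boolean tryBegin() {
        return inProgressSince.compareAndSet(null, Instant.now());
    }

    void end() {
        inProgressSince.set(null);
    }

    /** Populated while a reload is running; used to build the 409 response body. */
    Instant inProgressSince() {
        return inProgressSince.get();
    }
}

The endpoint handler would call tryBegin() before starting a reload, map a false return to the 409 Conflict body shown above, and call end() in a finally block.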

Recommendation: Document concurrency model in API specification.

Design Document Structure

Suggest organizing the design document as follows. Note: This structure assumes the HTTP-first approach described above. If the community prefers the file watch approach, the structure would need to adjust accordingly (swap "HTTP Endpoints" with "File Watch" as primary, etc.).

1. Goals and Non-Goals

Goals:

  • Zero-restart configuration updates
  • Universal deployment model (bare metal, Kubernetes)
  • Operator-friendly integration
  • Clear error handling and rollback

Non-Goals:

  • Zero-downtime modification (brief downtime per cluster is acceptable)
  • Hot-swapping filters in active connections
  • Partial success / continue-on-failure

2. Architecture

2.1 Core: HTTP Management Endpoints

Required endpoints:

  • POST /admin/config/validate - Validate config without applying
  • POST /admin/config/reload - Apply validated config
  • GET /admin/config/status - Current config version, last operation status
  • GET /admin/health - Proxy health (liveness/readiness)

Security:

  • Default bind: localhost:9190 (bare metal)
  • Kubernetes bind: 0.0.0.0:9190 (pod IP)
  • Optional TLS/mTLS for authentication
  • NetworkPolicy to restrict access in Kubernetes

2.2 Trigger Mechanisms (Optional)

Direct HTTP (Kubernetes):

  • Operator calls endpoints directly
  • No file watching needed

File Watcher Sidecar (Bare Metal):

  • Separate process watches config file
  • Calls HTTP endpoints on change
  • Options: shell script, Go binary, Java WatchService
  • Decoupled from proxy process

2.3 Reload Mechanism

  • Remove→add pattern (architecturally necessary)
  • Sequential processing (simplicity > parallelism)
  • All-or-nothing rollback (operational simplicity - needs discussion)

2.4 Validation Strategy

  • Build models + initialize filters without port binding
  • Deterministic (same result on all pods)
  • Catches config errors, not runtime failures

3. Deployment Patterns

3.1 Bare Metal

  • HTTP endpoints on localhost:9190
  • Config file on disk (optional)
  • Persist config to disk on success (if writable filesystem)

3.2 Kubernetes with Operator

  • HTTP endpoints on 0.0.0.0:9190
  • Config in ConfigMap (operator-managed)
  • Management Service for validation (exposes port 9190)
  • Checksum-based change detection (avoid no-op reloads)
  • 2-phase commit (validate via Service → reload all pods)
  • Read-only filesystem support (no disk persistence)
  • Sidecar file watcher (optional) → calls HTTP endpoints

4. Failure Modes and Recovery

  • Filter initialization failure → rollback
  • Port binding failure → rollback
  • Rollback failure → tracked state, manual recovery
  • Concurrent reload → fail fast with 409

5. Observability

  • Logging throughout reload process
  • Metrics for reload operations

6. Future Enhancements

  • Granular endpoints (/reload/cluster/{name})
  • Canary rollout strategies
  • Blue-green at pod level (operator)

Questions for Design Discussion

  1. Should FilterFactory.initialize() be documented as validation-safe?

    • Must be idempotent (can be called multiple times)?
    • Should avoid side effects (don't connect to external services)?
    • Or allow filter authors to decide (validation calls real KMS if they want)?
  2. Rollback Strategy: All-or-Nothing vs Partial Success (Critical Design Decision)

    This requires community consensus before proceeding.

    Scenario: Config change affects cluster-a, cluster-b, cluster-c

    • cluster-a: modify succeeds ✅ (downtime: 2s)
    • cluster-b: modify fails ❌ (downtime: 30s)
    • cluster-c: modify succeeds ✅ (downtime: 2s)

    Option A: All-or-Nothing (Current POC)

    Result: Rollback cluster-a and cluster-c
    Final state: All clusters on OLD config
    Total downtime: cluster-a (4s), cluster-b (30s), cluster-c (4s)
    

    Pros:

    • ✅ Single source of truth (config file intent OR previous state, never mixed)
    • ✅ Predictable retry path (fix issue → retry → all move together)
    • ✅ No configuration drift (never "cluster-a on v2, cluster-b on v1")
    • ✅ Simple status model (one config version for entire proxy)
    • ✅ Follows declarative configuration philosophy (Kubernetes/GitOps)

    Cons:

    • ❌ Unnecessary downtime for successful clusters during rollback
    • ❌ Wastes successful work (cluster-a, cluster-c succeeded but rolled back)

    Option B: Partial Success / Continue-on-Failure

    Result: Keep cluster-a and cluster-c on new config
    Final state: cluster-a (NEW), cluster-b (OLD), cluster-c (NEW)
    Total downtime: cluster-a (2s), cluster-b (30s), cluster-c (2s)
    

    Pros:

    • ✅ Less total downtime (no rollback for successful clusters)
    • ✅ Preserves successful work

    Cons:

    • ❌ Configuration drift (reality doesn't match declared intent)
    • ❌ Complex status model (per-cluster versions: {a: "v2", b: "v1", c: "v2"})
    • ❌ Unclear retry path (should cluster-a reload again? How does operator know?)
    • ❌ Reconciliation complexity (which clusters already on target version?)
    • ❌ Requires granular reload endpoints (/reload/cluster/{name})
    • ❌ Confusing user experience ("Reload failed" but some clusters succeeded?)

    Operational Comparison:

    Aspect                       | All-or-Nothing                   | Partial Success
    -----------------------------|----------------------------------|------------------------------
    Source of truth              | Config OR previous state (clear) | Mixed state (confusing)
    Retry after fixing cluster-b | Simple (reload all)              | Complex (skip a,c or reload?)
    Status API                   | One version                      | Per-cluster versions
    Downtime on failure          | Higher (rollback)                | Lower (no rollback)
    Operator logic               | Simple                           | Complex reconciliation
    User understanding           | Clear                            | Confusing

    User Experience Example:

    All-or-Nothing:

    $ kubectl apply -f new-config.yaml
    Error: Config reload failed on cluster-b (filter init error)
    Status: All clusters on version abc123 (previous config)
    Action: Fix cluster-b config, retry apply
    

    Partial Success:

    $ kubectl apply -f new-config.yaml
    Error: Config reload failed on cluster-b (filter init error)
    Status: cluster-a (def456), cluster-b (abc123), cluster-c (def456)
    Question: Should I retry? Will cluster-a reload again?
    

    Questions for the community:

    • Which operational model do users prefer?
    • Is configuration drift acceptable as a trade-off for less downtime?
    • Should this be configurable, or should we pick one approach?
    • If configurable:
      admin:
        rollbackStrategy: ALL  # Default? Or FAILED_ONLY?
    • Do we need granular reload endpoints regardless of rollback strategy?

    Claude's recommendation: Start with all-or-nothing (simpler, and it matches the declarative config philosophy), gather operational feedback, and add partial success later if users request it. But this needs community buy-in, not just a maintainer decision.

  3. Should we define granular reload endpoints now or defer?

    • POST /admin/config/reload (full config, current)
    • POST /admin/config/reload/cluster/{name} (single cluster, future?)
  4. What should config version format be?

    • SHA-256 hash (deterministic, no clock dependency)
    • Timestamp-based (easier for humans to understand)
    • Operator-provided (e.g., ConfigMap resourceVersion)

Summary

The configuration reload design addresses a critical operational need. This feedback proposes HTTP endpoints with 2-phase commit (validate → reload) as the primary interface (alternative to the current file watch proposal) for the following reasons:

Why HTTP-first with validation:

  • Better Kubernetes integration (operator-friendly, read-only filesystem compatible)
  • Clear observability (HTTP responses vs file watch with no feedback)
  • Testability (programmatic testing vs file system manipulation)
  • Validation catches config errors before any cluster goes down
  • File watching can still be supported as a convenience layer that calls HTTP internally

Core components proposed:

  1. POST /admin/config/validate - Validates config without applying (deterministic, fast)
  2. POST /admin/config/reload - Applies validated config (with rollback on failure)
  3. Management Service - Kubernetes Service exposing port 9190 for operator access
  4. Checksum-based change detection - Avoid unnecessary reloads on no-op reconciliation
  5. Read-only filesystem support - Make disk persistence optional for Kubernetes

Key takeaway: The architectural constraints (channel state machine, draining requirement) mean the design correctly accepts brief downtime per cluster modification. This is not a limitation—it's the right trade-off for operational simplicity and safety.

Recommended next steps:

  1. Discuss HTTP vs file watch as primary mechanism - This is a fundamental design choice that needs community input
  2. Discuss rollback strategy - All-or-nothing vs partial success requires consensus
  3. Add validation endpoint and 2-phase commit to design
  4. Add Kubernetes integration patterns (management Service, checksum-based change detection)
  5. Document failure modes and recovery procedures
  6. Refine POC implementation (PR#3176) based on finalized design

Excellent work on the POC—it provides a solid foundation for whichever trigger mechanism the community prefers!
