
[RFD]: Adoption of Event Sourcing and CQRS Architecture #118

@atifsyedali

Description


Decision Goal

Determine if OpenCHAMI should adopt Event Sourcing (ES) and Command Query Responsibility Segregation (CQRS) architectural patterns to address current limitations in system auditability, observability, and integration.

Category

Architecture

Stakeholders / Affected Areas

development teams

Decision Needed By

No response

Problem Statement

1. Problem Statement

Current management approaches in OpenCHAMI rely heavily on:

  • Mutable state stores that overwrite historical information
  • Polling-based integration that limits real-time responsiveness
  • Ad-hoc event handling without systematic audit trails
  • Lack of causation and correlation data that can help identify and troubleshoot issues

1.1. Component Overview

OpenCHAMI consists of several key components working together to provide HPC system management:

1.1.1. SMD (State Management Database)

  • Central repository for hardware state and inventory
  • RESTful API for component management
  • Discovery coordination and state tracking
  • Group and partition management

1.1.2. Magellan

  • Redfish-based BMC discovery tool
  • Network scanning and component enumeration
  • Integration with SMD for inventory updates
  • Flexible discovery methods for heterogeneous systems

1.1.3. ochami CLI

  • Command-line interface for OpenCHAMI services
  • Primary user interface for system administration
  • Integration with SMD and BSS APIs
  • Configuration and operational management

1.1.4. coresmd

  • CoreDHCP plugins for DHCP integration
  • SMD-based DHCP lease management
  • Bootloop plugin for unknown device handling
  • Integration between network and inventory services

1.1.5. BSS (Boot Script Service)

  • Boot parameter and configuration management
  • Static image and Level 2 boot services
  • Integration with SMD for node information

1.2. Current Data Flows and Integration Patterns

1.2.1. Discovery Workflow

1. Magellan scans networks and discovers BMCs
2. Magellan queries Redfish endpoints for inventory
3. Magellan updates SMD with discovered components
4. SMD processes updates and maintains state
5. Other services query SMD for current state

1.2.2. State Management

  • SMD maintains mutable state in PostgreSQL database
  • Component state changes tracked with limited history
  • State Change Notifications (SCN) for basic event distribution
  • Polling-based integration for most consumers

1.2.3. Boot Process

1. coresmd provides DHCP leases based on SMD data
2. BSS provides boot parameters based on SMD state
3. ochami CLI provides administrative interface

1.3. Current Capabilities and Limitations Analysis

This section analyzes OpenCHAMI's current event-like patterns and identifies specific gaps that ES/CQRS could address, with concrete references to the existing codebase.

1.3.1. Auditability: Hardware History Events

Current Capabilities:
OpenCHAMI's SMD already tracks hardware history through the hwinv_hist table with these event types:

// From smd/pkg/sm/hwinvhist.go
const (
    HWInvHistEventTypeAdded    = "Added"
    HWInvHistEventTypeRemoved  = "Removed"
    HWInvHistEventTypeScanned  = "Scanned"
    HWInvHistEventTypeDetected = "Detected"
)

The database schema captures basic historical data:

-- From smd/migrations/postgres/9_create_version7.up.sql
create table if not exists hwinv_hist (
    "id"         VARCHAR(63),        -- Component xname
    "fru_id"     VARCHAR(128),       -- FRU identifier
    "event_type" VARCHAR(128),       -- Event type
    "timestamp"  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Limitations and Gaps:

  • No actor/causation tracking information: No record of who or what service caused the change
  • Missing correlation IDs: Cannot link related events across services
  • Limited event types: Only 4 basic FRU lifecycle events, missing operational events (e.g. maintenance)
  • Pruning issues: The database requires periodic pruning of "duplicate" events, and it is not yet clear why these duplicates arise
  • No temporal queries: Cannot reconstruct system state at arbitrary points in time

1.3.2. State Management: State Change Notifications (SCN)

Current Capabilities:
SMD publishes state change notifications through HMNFD with this payload structure:

// From pkg/sm/subscriptions.go
type SCNPayload struct {
    Components     []string `json:"Components"`
    Enabled        *bool    `json:"Enabled,omitempty"`
    Flag           string   `json:"Flag,omitempty"`
    Role           string   `json:"Role,omitempty"`
    SubRole        string   `json:"SubRole,omitempty"`
    SoftwareStatus string   `json:"SoftwareStatus,omitempty"`
    State          string   `json:"State,omitempty"`
}

State change events include these categories:

// From pkg/sm/events.go
const (
    NodeStateChange       SMEventType = "NodeStateChange"
    StateChange           SMEventType = "StateChange"
    RedfishEndpointChange SMEventType = "RedfishEndpointChange"
    HWInventoryChange     SMEventType = "HWInventoryChange"
)

Limitations and Gaps:

  • No previous state: SCN payloads don't include the previous state, so a transition cannot be reconstructed from the notification alone
  • No event ordering: No sequence numbers or causation chains
  • Limited context: No information about what caused the state change
  • Fire-and-forget delivery: No guaranteed delivery or replay capabilities
  • Inconsistent subscribers: Services may miss events during downtime, leading to state inconsistencies

1.3.3. Integration: Current Polling Patterns

Current Capabilities:
Despite having SCN infrastructure, most OpenCHAMI services still use polling patterns:

  1. Magellan → SMD: Discovery service polls Redfish endpoints and pushes inventory updates
  2. BSS → SMD: Boot Script Service polls SMD for current component state
  3. ochami CLI → SMD: CLI tools make synchronous API calls for current state

Limitations and Gaps:

  • High latency: Polling intervals create 30-60 second delays for state propagation
  • Resource overhead: Constant polling creates unnecessary load on SMD database
  • Race conditions: Polling-based integration can miss rapid state changes
  • Complex error handling: Each service must implement retry logic and handle stale data

1.3.4. Analytics: Limited Operational Insights

Current Capabilities:
The existing event infrastructure provides basic building blocks:

  • Hardware history events for FRU tracking
  • State change notifications for real-time updates

Limitations and Gaps:

  • No event replay: Cannot analyze historical patterns or debug past issues
  • Limited aggregation: No built-in support for trend analysis or pattern detection
  • Missing business events: Only low-level hardware events, missing higher-level operational events
  • No cross-service correlation: Cannot trace workflows that span multiple services

Proposed Solution

2. Event Sourcing: Immutable State Through Events

Event Sourcing is an architectural pattern where all changes to application state are stored as a sequence of immutable events. Instead of overwriting data, each state transition is recorded, enabling full auditability and the ability to reconstruct past states at any point in time.

2.1. Core Concepts

Traditional State Storage:
Component X1234 → Status: "Online", Last_Update: 2024-08-25T10:30:00Z

Event Sourcing:
Event 1: ComponentDiscovered(X1234, timestamp: 2024-08-25T09:00:00Z)
Event 2: ComponentPoweredOn(X1234, timestamp: 2024-08-25T09:15:00Z)  
Event 3: ComponentStatusChanged(X1234, status: "Online", timestamp: 2024-08-25T10:30:00Z)

Current State = Apply(Event1) → Apply(Event2) → Apply(Event3)

2.2. Event Store Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Event Store                              │
├─────────────────────────────────────────────────────────────────┤
│ Stream: component-x1234                                         │
│ ┌─────┬─────────────────────┬─────────────┬───────────────────┐ │
│ │ Seq │ Event Type          │ Timestamp   │ Data              │ │
│ ├─────┼─────────────────────┼─────────────┼───────────────────┤ │
│ │  1  │ ComponentDiscovered │ 09:00:00Z   │ {id: X1234, ...}  │ │
│ │  2  │ ComponentPoweredOn  │ 09:15:00Z   │ {id: X1234, ...}  │ │
│ │  3  │ StatusChanged       │ 10:30:00Z   │ {status: "Online"}│ │
│ └─────┴─────────────────────┴─────────────┴───────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

2.3. Benefits

  1. Complete Audit Trail: Every state change is recorded with full context, answering "what happened, when, and why"
  2. Temporal Queries: Reconstruct system state at any point: "What was the cluster state at 2PM yesterday?"
  3. Debugging: Replay events to understand how the system reached a particular state
  4. Integration: Other services subscribe to event streams for real-time updates
  5. Analytics: Event streams provide rich data for trend analysis and predictive modeling

3. CQRS: Optimized Read and Write Models

Command Query Responsibility Segregation separates the responsibilities of reading and writing data. Commands modify state through business logic and emit events, while queries retrieve data from optimized read models (projections).

CQRS Architecture Diagram:

                    CQRS System Architecture
                           
Command Side                    Event Store                    Query Side
┌──────────────┐               ┌─────────────┐               ┌─────────────┐
│   Commands   │──────────────▶│   Events    │──────────────▶│ Projections │
│              │               │   Stream    │               │             │
│ - PowerOn    │               │             │               │ - Component │
│ - Discover   │               │ Event1 ────▶│               │   Status    │
│ - Update     │               │ Event2 ────▶│               │ - Inventory │
│              │               │ Event3 ────▶│               │ - History   │
└──────────────┘               └─────────────┘               └─────────────┘
       │                              │                             │
       │                              │                             │
       ▼                              ▼                             ▼
┌──────────────┐               ┌─────────────┐               ┌─────────────┐
│   Business   │               │   Append    │               │    Query    │
│    Logic     │               │    Only     │               │   Handlers  │
│              │               │             │               │             │
│ - Validation │               │ Immutable   │               │ - Fast      │
│ - Rules      │               │ Ordered     │               │ - Cached    │
│ - Events     │               │ Versioned   │               │ - Indexed   │
└──────────────┘               └─────────────┘               └─────────────┘
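To make the command side concrete, here is a hypothetical handler for the `PowerOn` command from the diagram. It rebuilds just enough write-side state from past events to apply business rules, then returns new events instead of updating rows; query handlers never see this code. The rules (a component must be discovered, and powering on an already-on component emits nothing) and all type names are assumptions for the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a simplified, hypothetical event record.
type Event struct {
	Type  string
	ID    string
	Actor string
}

// PowerOn is a command: a request to change state that may be rejected.
type PowerOn struct {
	ID    string
	Actor string
}

// component is write-side state, rebuilt from events for validation only.
type component struct{ discovered, poweredOn bool }

func rebuild(history []Event) component {
	var c component
	for _, e := range history {
		switch e.Type {
		case "ComponentDiscovered":
			c.discovered = true
		case "ComponentPoweredOn":
			c.poweredOn = true
		case "ComponentPoweredOff":
			c.poweredOn = false
		}
	}
	return c
}

// HandlePowerOn validates the command against business rules and returns
// the events to append. It never touches read models; projections are
// updated later from the appended events.
func HandlePowerOn(cmd PowerOn, history []Event) ([]Event, error) {
	c := rebuild(history)
	if !c.discovered {
		return nil, errors.New("cannot power on undiscovered component " + cmd.ID)
	}
	if c.poweredOn {
		return nil, nil // already on: nothing new happened, so no event
	}
	return []Event{{Type: "ComponentPoweredOn", ID: cmd.ID, Actor: cmd.Actor}}, nil
}

func main() {
	history := []Event{{Type: "ComponentDiscovered", ID: "X1234"}}
	events, err := HandlePowerOn(PowerOn{ID: "X1234", Actor: "admin"}, history)
	fmt.Println(events, err)
	_, err = HandlePowerOn(PowerOn{ID: "X9999", Actor: "admin"}, nil)
	fmt.Println(err)
}
```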

3.1. How Benefits Are Achieved

  1. Independent Scaling: Write operations can be optimized for consistency and validation, while read operations scale independently
  2. Query Optimization: Multiple read models can be created for different access patterns (e.g., status dashboard, historical reports, search)
  3. Technology Flexibility: Different storage technologies can be used for command vs. query sides
  4. Operational Resilience: Read side outages don't affect write operations and vice versa

3.2. Projections: Materialized Views from Events

Projections are materialized views built by processing event streams. They transform the event history into optimized data structures for specific query patterns.

Projection Processing:

Event Stream                 Projection Builder              Materialized View
┌───────────────────┐       ┌──────────────────┐           ┌─────────────────┐
│ComponentPoweredOn │       │                  │           │   Component     │
│  id: X1234        │──────▶│  Event Handler   │──────────▶│   Status View   │
│  timestamp: ...   │       │                  │           │                 │
└───────────────────┘       │ 1. Read Event    │           │ X1234: Online   │
                            │ 2. Update View   │           │ X1235: Offline  │
┌───────────────────┐       │ 3. Store Result  │           │ X1236: Maint    │
│StatusChanged      │       │                  │           │                 │
│  id: X1234        │──────▶│                  │           │ Last Updated:   │
│  status: Online   │       │                  │           │ 10:30:00Z       │
└───────────────────┘       └──────────────────┘           └─────────────────┘
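The handler's three steps can be sketched as an in-memory status projection. Assumptions, all hypothetical: each event carries a global, increasing `Position` assigned by the event store, `ComponentPoweredOn` maps to an `On` status, and the view is kept in a Go map rather than a database. Tracking the last applied position lets the view skip redelivered events, which helps keep rebuilds and at-least-once delivery safe.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a simplified event; Position is a global order assumed to be
// assigned by the event store.
type Event struct {
	Position  uint64
	Type      string
	ID        string
	Status    string
	Timestamp time.Time
}

// StatusView is a read model optimised for "what is X's status?" queries.
type StatusView struct {
	Status      map[string]string
	LastUpdated time.Time
	Position    uint64 // last event position applied to this view
}

func NewStatusView() *StatusView {
	return &StatusView{Status: make(map[string]string)}
}

// Handle reads the event, updates the view, and records how far the view
// has progressed. Already-applied positions are skipped, so redelivered
// events do not corrupt the view.
func (v *StatusView) Handle(e Event) {
	if e.Position <= v.Position {
		return
	}
	switch e.Type {
	case "ComponentPoweredOn":
		v.Status[e.ID] = "On"
	case "StatusChanged":
		v.Status[e.ID] = e.Status
	}
	v.Position = e.Position
	v.LastUpdated = e.Timestamp
}

func main() {
	v := NewStatusView()
	t := time.Date(2024, 8, 25, 10, 30, 0, 0, time.UTC)
	events := []Event{
		{Position: 1, Type: "ComponentPoweredOn", ID: "X1234", Timestamp: t.Add(-75 * time.Minute)},
		{Position: 2, Type: "StatusChanged", ID: "X1234", Status: "Online", Timestamp: t},
		{Position: 2, Type: "StatusChanged", ID: "X1234", Status: "Online", Timestamp: t}, // duplicate delivery
	}
	for _, e := range events {
		v.Handle(e)
	}
	fmt.Println(v.Status["X1234"], v.Position, v.LastUpdated.Format(time.RFC3339))
}
```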

3.3. Types of Projections

  1. Current State: Latest status of all components for operational dashboards
  2. Historical Views: Time-series data for trend analysis and reporting
  3. Aggregated Views: Summaries like "cluster health percentage" or "nodes by status"
  4. Search Indexes: Full-text search across component metadata and events
  5. Analytics: Complex derived data for machine learning and predictions
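An aggregated view such as "nodes by status" must move a node between buckets when its status changes, so it needs to remember each node's last status. A minimal sketch, assuming that a status of `Online` counts as healthy for the "cluster health percentage" (an illustrative definition, not one given in this RFD); all names are hypothetical.

```go
package main

import "fmt"

// StatusCounts is an aggregated projection: nodes by status, plus enough
// per-node state to move a node between buckets when its status changes.
type StatusCounts struct {
	current  map[string]string // node ID -> last known status
	ByStatus map[string]int    // status -> number of nodes
}

func NewStatusCounts() *StatusCounts {
	return &StatusCounts{current: make(map[string]string), ByStatus: make(map[string]int)}
}

// OnStatusChanged removes the node from its old bucket and adds it to the new one.
func (a *StatusCounts) OnStatusChanged(id, status string) {
	if old, ok := a.current[id]; ok {
		a.ByStatus[old]--
		if a.ByStatus[old] == 0 {
			delete(a.ByStatus, old)
		}
	}
	a.current[id] = status
	a.ByStatus[status]++
}

// HealthPercent is the share of known nodes whose status is "Online".
func (a *StatusCounts) HealthPercent() float64 {
	if len(a.current) == 0 {
		return 0
	}
	return 100 * float64(a.ByStatus["Online"]) / float64(len(a.current))
}

func main() {
	agg := NewStatusCounts()
	agg.OnStatusChanged("X1234", "Online")
	agg.OnStatusChanged("X1235", "Offline")
	agg.OnStatusChanged("X1236", "Maint")
	agg.OnStatusChanged("X1235", "Online") // X1235 moves from Offline to Online
	fmt.Println(agg.ByStatus, agg.HealthPercent())
}
```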

3.4. Challenges and Drawbacks

3.4.1. Performance Challenges

  1. High-Volume Event Processing

    • Problem: If this architecture is applied to sources that generate large event volumes, such as telemetry, storage and projection processing become scaling bottlenecks.
    • Impact: Projection lag, memory pressure, CPU bottlenecks
    • Mitigation: Batch event processing and spread load across partitioned consumers, using event-streaming platforms such as Apache Kafka
  2. Projection Rebuild Performance

    • Problem: Rebuilding projections from a long event history can take hours or days
    • Impact: Downtime, resource consumption, operational complexity
    • Mitigation: Use snapshotting and incremental updates to reduce processing load
  3. Query Performance

    • Problem: Complex temporal queries still require event replay
    • Impact: Slow response times for historical analysis
    • Mitigation: Load event data into OLAP databases like DuckDB.

3.4.2. Consistency Challenges

  1. Eventual Consistency

    • Problem: Read models lag behind write operations
    • Impact: Stale data in dashboards, potential race conditions
    • Mitigation: Clear UI indicators of data freshness, retry logic for read-after-write cases, and agreed staleness SLAs
  2. Projection Synchronization

    • Problem: Multiple projections may be at different event positions
    • Impact: Inconsistent views across different screens/reports
    • Mitigation: Record the last-applied event position in every projection so clients can detect, or wait for, a common position

3.4.3. Operational Challenges

  1. Storage Growth

    • Problem: Event streams grow indefinitely
    • Impact: Storage costs, backup/recovery time, query performance
    • Mitigation: Implement data retention policies, where older events are rolled up into base snapshots
  2. Schema Evolution

    • Problem: Event schemas need to evolve over time
    • Impact: Backward compatibility, projection rebuilds
    • Mitigation: Use versioned event schemas, converting older versions to the current one when events are read

3.5. Mitigation Strategies

3.5.1. Snapshots for Performance

Event Stream with Snapshots:

Events: 1 ───▶ 2 ───▶ 3 ───▶ ... ───▶ 10000 ───▶ 10001 ───▶ 10002
                             │        └─ Snapshot @10000
                             └─ Snapshot @100

To rebuild latest projection:

  1. Load latest snapshot (state at event 10000)
  2. Replay only events 10001-10002 (2 events vs 10002 events)
  3. Result: 99.98% fewer events to replay (2 instead of 10,002), at the cost of loading one snapshot

How often you need to take snapshots depends on your system's event volume and performance requirements. A common strategy is to snapshot after a fixed number of events (e.g., every 1000 events) or at regular time intervals (e.g., every hour).

3.5.2. Projection Performance Optimization

  1. Caching: Keep the latest snapshots in an in-memory cache
  2. Partitioning: Shard projections by cluster/rack/time ranges
  3. Async Processing: Background workers with queue management

4. ES/CQRS Application Strategy for OpenCHAMI

4.1. Overall Architecture Vision

The proposed ES/CQRS transformation for OpenCHAMI introduces a centralized event store that captures all system changes, enabling rich operational insights while maintaining backward compatibility with existing APIs.

4.1.1. High-Level Architecture

                OpenCHAMI ES/CQRS Architecture
                              
     Command Side                Event Store               Query Side
┌─────────────────────┐      ┌─────────────────┐     ┌─────────────────────┐
│                     │      │                 │     │                     │
│   Existing APIs     │      │   Event Streams │     │    Projections      │
│                     │      │                 │     │                     │
│ ┌─────────────────┐ │      │ Component Events│────▶│ ┌─────────────────┐ │
│ │ SMD REST API    │ │─────▶│ Discovery Events│     │ │ Current State   │ │
│ │ (Commands)      │ │      │ Power Events    │────▶│ │ Component View  │ │
│ └─────────────────┘ │      │ Config Events   │     │ └─────────────────┘ │
│                     │      │ Job Events      │     │                     │
│ ┌─────────────────┐ │      │ Error Events    │────▶│ ┌─────────────────┐ │
│ │ Magellan        │ │─────▶│                 │     │ │ Historical      │ │
│ │ Discovery       │ │      │                 │     │ │ Analytics View  │ │
│ └─────────────────┘ │      │                 │────▶│ └─────────────────┘ │
│                     │      │                 │     │                     │
│ ┌─────────────────┐ │      │                 │     │ ┌─────────────────┐ │
│ │ BSS Boot        │ │─────▶│                 │────▶│ │ Operational     │ │
│ │ Configuration   │ │      │                 │     │ │ Dashboard View  │ │
│ └─────────────────┘ │      │                 │     │ └─────────────────┘ │
└─────────────────────┘      └─────────────────┘     └─────────────────────┘
           │                            │                         │
           │                            │                         │
           ▼                            ▼                         ▼
┌─────────────────────┐      ┌─────────────────┐     ┌─────────────────────┐
│ Business Logic      │      │ Event Messaging │     │ Query Handlers      │
│ - Validation        │      │ - Durable       │     │ - Relational DB     │
│ - State Machines    │      │ - Ordered       │     │ - Time-series DB    │
│ - Event Generation  │      │ - Replicated    │     │ - Cache Layer       │
└─────────────────────┘      └─────────────────┘     └─────────────────────┘

4.1.2. Key Design Principles

  1. Backward Compatibility: Existing REST APIs remain unchanged; ES/CQRS operates transparently behind them
  2. Incremental Adoption: Services can adopt ES/CQRS patterns individually without coordinated changes
  3. Event-First Design: All state changes generate events first, then update projections
  4. Operational Observability: Complete audit trails and real-time monitoring through event streams

4.2. Service-by-Service Transformation Strategy

4.2.1. SMD (State Management Database) - Foundation Service

Current State:

  • Central mutable database with limited history
  • Basic SCN (State Change Notification) infrastructure
  • Polling-based integration with other services

ES/CQRS Transformation:

SMD Component Events:
┌─────────────────────────────────────────────────────────────────┐
│ Event Stream: smd.components                                    │
├─────────────────────────────────────────────────────────────────┤
│ ● ComponentDiscovered(xname, type, location, endpoints)         │
│ ● ComponentStateChanged(xname, from, to, reason, actor)         │
│ ● ComponentEnabled(xname, actor, timestamp)                     │
│ ● ComponentDisabled(xname, reason, actor, timestamp)            │
│ ● ComponentRoleChanged(xname, from, to, actor)                  │
│ ● ComponentGroupAssigned(xname, group, actor)                   │
│ ● RedfishEndpointUpdated(xname, endpoint, credentials)          │
│ ● InventoryUpdated(xname, hwinfo, fru_id, actor)                │
└─────────────────────────────────────────────────────────────────┘

Implementation Strategy:

  1. Event Store Setup: Deploy a durable event messaging system as the event backbone
  2. Event Generation: Modify SMD operations to emit events before database updates
  3. Projection Building: Create current-state projections from event streams
  4. API Compatibility: Existing REST endpoints query projections instead of the database
  5. Enhanced SCN: Replace basic SCN with rich event streams via HMNFD

Benefits Achieved:

  • Complete component lifecycle audit trails
  • Real-time state change notifications to all subscribers
  • Temporal queries: "Show cluster state at any point in time"
  • Cross-correlation with discovery, power, and job events

4.2.2. Magellan - Discovery and Inventory Service

Current State:

  • Network scanning and BMC discovery
  • Direct SMD API calls for inventory updates
  • Limited coordination with other discovery methods

ES/CQRS Transformation:

Magellan Discovery Events:
┌─────────────────────────────────────────────────────────────────┐
│ Event Stream: magellan.discovery                                │
├─────────────────────────────────────────────────────────────────┤
│ ● DiscoveryScanStarted(scan_id, networks, method, actor)        │
│ ● BMCEndpointFound(scan_id, ip, mac, manufacturer)              │
│ ● RedfishValidated(scan_id, endpoint, version, capabilities)    │
│ ● ComponentInventoried(scan_id, xname, hwinfo, fru_data)        │
│ ● DiscoveryCompleted(scan_id, found_count, errors, duration)    │
│ ● InventoryConflictDetected(xname, existing, discovered)        │
└─────────────────────────────────────────────────────────────────┘

Integration Benefits:

  • Event-Driven Updates: Magellan emits discovery events; SMD subscribes and updates components
  • Conflict Resolution: Detect when multiple discovery methods find conflicting information
  • Discovery Analytics: Track discovery performance, success rates, and network coverage
  • Coordinated Discovery: Multiple discovery services can coordinate through event streams
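As an illustration of event-driven updates with conflict detection, the hypothetical SMD-side subscriber below consumes `ComponentInventoried` events. It assumes that a different FRU ID reported for an already-known xname should be flagged for reconciliation as an `InventoryConflictDetected` event rather than applied automatically; that rule and the simplified event fields are assumptions for the sketch.

```go
package main

import "fmt"

// ComponentInventoried is a simplified Magellan discovery event.
type ComponentInventoried struct {
	ScanID string
	Xname  string
	FRUID  string
}

// InventoryConflictDetected is emitted instead of overwriting inventory.
type InventoryConflictDetected struct {
	Xname      string
	Existing   string
	Discovered string
}

// InventoryConsumer is a hypothetical SMD subscriber to magellan.discovery.
type InventoryConsumer struct {
	fruByXname map[string]string
}

func NewInventoryConsumer() *InventoryConsumer {
	return &InventoryConsumer{fruByXname: make(map[string]string)}
}

// Handle records a first sighting, ignores a repeat of the same FRU, and
// returns a conflict event when a different FRU is reported for a known xname.
func (c *InventoryConsumer) Handle(e ComponentInventoried) *InventoryConflictDetected {
	existing, known := c.fruByXname[e.Xname]
	switch {
	case !known:
		c.fruByXname[e.Xname] = e.FRUID
		return nil
	case existing == e.FRUID:
		return nil
	default:
		return &InventoryConflictDetected{Xname: e.Xname, Existing: existing, Discovered: e.FRUID}
	}
}

func main() {
	c := NewInventoryConsumer()
	for _, e := range []ComponentInventoried{
		{ScanID: "scan-1", Xname: "x1000c0s0b0n0", FRUID: "FRU-A"},
		{ScanID: "scan-2", Xname: "x1000c0s0b0n0", FRUID: "FRU-A"},
		{ScanID: "scan-3", Xname: "x1000c0s0b0n0", FRUID: "FRU-B"},
	} {
		if conflict := c.Handle(e); conflict != nil {
			fmt.Printf("%s: conflict %+v\n", e.ScanID, *conflict)
		}
	}
}
```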

4.2.3. BSS - Boot Script Service

Current State:

  • Boot parameter and configuration management
  • Polling SMD for node information
  • Static boot configurations

ES/CQRS Transformation:

BSS Boot Events:
┌─────────────────────────────────────────────────────────────────┐
│ Event Stream: bss.boot                                          │
├─────────────────────────────────────────────────────────────────┤
│ ● BootConfigurationUpdated(xname, config, version, actor)       │
│ ● BootParametersRequested(xname, boot_id, dhcp_request)         │
│ ● BootParametersProvided(xname, boot_id, params, config)        │
│ ● BootSequenceStarted(xname, boot_id, timestamp)                │
│ ● BootSequenceCompleted(xname, boot_id, result, duration)       │
│ ● BootConfigurationValidated(config_id, nodes, status)          │
└─────────────────────────────────────────────────────────────────┘

Integration Benefits:

  • Boot Orchestration: Coordinate boot sequences with power and discovery events
  • Configuration Management: Track boot configuration changes and rollback capabilities
  • Boot Analytics: Monitor boot success rates, timing, and failure patterns
  • Dynamic Configuration: Event-driven boot configuration based on discovered hardware

5. Open Source Technology Options for ES/CQRS

5.1. Event Store Technologies

NATS JetStream

  • Pros: Lightweight, subject-based routing that maps naturally onto component hierarchies, excellent Go ecosystem integration, built-in clustering, simple operations
  • Cons: Newer ecosystem, fewer third-party tools, limited complex query capabilities
  • Best For: Go-based microservices, cloud-native deployments, operational simplicity

Apache Kafka

  • Pros: Mature ecosystem, proven scalability, extensive tooling, strong stream processing integration
  • Cons: Complex operations, resource-intensive, Java-centric ecosystem, steep learning curve
  • Best For: Large-scale deployments, complex stream processing, polyglot environments

KurrentDB (formerly EventStore)

  • Pros: Purpose-built for Event Sourcing, native projections, excellent temporal query support, strong consistency
  • Cons: Smaller ecosystem, C#-centric, higher operational complexity, commercial licensing for clustering
  • Best For: Complex event sourcing scenarios, .NET environments, advanced temporal requirements

Apache Pulsar

  • Pros: Multi-tenancy, geo-replication, unified messaging model, excellent performance
  • Cons: Complex architecture, newer ecosystem, higher resource requirements
  • Best For: Multi-tenant environments, geo-distributed systems, unified pub-sub and queuing

5.2. Database Technologies for Projections

PostgreSQL

  • Pros: ACID compliance, rich indexing, JSON support, mature ecosystem, excellent performance
  • Cons: Single-node write scalability limits, complex sharding setup
  • Best For: Current state projections, complex queries, strong consistency requirements

Redis

  • Pros: Extremely fast, simple data structures, built-in persistence, clustering support
  • Cons: Memory-only primary storage, limited query capabilities, eventual consistency
  • Best For: Hot data caching, session storage, real-time dashboards

KurrentDB (formerly EventStore)

  • See Section 5.1 (Event Store Technologies).

Apache Cassandra

  • Pros: Linear scalability, multi-datacenter replication, high availability, tunable consistency
  • Cons: Complex data modeling, eventual consistency, operational complexity
  • Best For: Large-scale distributed systems, high availability requirements, write-heavy workloads

5.3. Analytics and OLAP Platforms

DuckDB + Parquet

  • Pros: Embedded analytics engine, excellent columnar performance, SQL compatibility, no infrastructure overhead
  • Cons: Single-node limitations, newer ecosystem, limited real-time capabilities
  • Best For: Analytical workloads, data science, embedded analytics, cost-effective OLAP

Apache Druid

  • Pros: Real-time analytics, automatic indexing, horizontal scaling, sub-second queries
  • Cons: Complex architecture, limited flexibility, steep learning curve
  • Best For: Real-time dashboards, high-cardinality data, interactive analytics

ClickHouse

  • Pros: Extremely fast analytical queries, excellent compression, SQL support, cost-effective
  • Cons: Limited transaction support, complex operations for multi-node, eventual consistency
  • Best For: Analytical workloads, log analysis, business intelligence, cost-sensitive deployments

Alternatives Considered

Keep the current architecture: a mutable database plus the existing State Change Notification (SCN) infrastructure.

Other Considerations

No response

Related Docs / PRs

No response

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
