[RFD]: Hardware Inventory API and Data Model Proposal #112

@bmcdonald3

Decision Goal

To agree on the core data models and API endpoints for a new Hardware Inventory contract, and to identify key areas that require further discussion before implementation.

Category

Architecture

Stakeholders / Affected Areas

Sys admins, magellan, SMD

Decision Needed By

No response

Problem Statement

OpenCHAMI currently lacks a means of tracking Field Replaceable Units (FRUs) and other hardware. The existing process focuses on functional gathering (e.g., information required for power control) and relies on manual workflows and CLI tools that are not easily auditable or consumable by other services. This proposal addresses that gap by defining a core inventory contract and a corresponding set of APIs that establish a baseline, programmatic source of truth for all hardware in the system.

Core Data Models

The Device Model

The Device model represents any physical hardware component that is uniquely trackable within the system, covering everything from top-level systems like servers and switches down to individual FRUs such as GPUs, NICs, or DIMMs. The primary qualification for an item to be a Device is that it can be uniquely identified, typically through a combination of its manufacturer, part number, and serial number discovered via an out-of-band management interface like Redfish. The rationale here is that all items that can be individually replaced are tracked as their own distinct entities.

When storing cables, information about connections (such as a node connected to a switch) may be stored in the properties field. Best practice is to model each connection as a single object with two endpoints, endpointA and endpointB, each of which lists a deviceID and a port.

Core fields

  • id (UUID): The permanent, unique identifier for the hardware.
  • deviceType (Enum): The type of hardware (e.g., "Node", "GPU", "Rack").
  • manufacturer (String): The manufacturer name.
  • partNumber (String): The part number.
  • serialNumber (String): The serial number.
  • parentID (UUID): The parent device of this device. If null, this is a top-level device (e.g., a rack). DIMMs, for example, are children of nodes.
  • childrenDeviceIds (Array of UUIDs): A read-only list of devices contained within this one. Calculated on-request, not stored (to avoid frequent updates).
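
The relationship between parentID and the calculated childrenDeviceIds can be illustrated with a minimal sketch. It assumes a hypothetical in-memory store keyed by device id; the helper name children_device_ids and the placeholder ids are illustrative only and are not part of the proposal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Device:
    """Core fields from the proposal; metadata and properties are omitted."""
    id: str
    deviceType: str
    manufacturer: str
    partNumber: str
    serialNumber: str
    parentID: Optional[str] = None  # None marks a top-level device


def children_device_ids(devices: Dict[str, Device], device_id: str) -> List[str]:
    """Derive childrenDeviceIds on request by scanning parentID links.

    Nothing is stored on the parent, so adding or moving a child only
    touches the child's own record.
    """
    return sorted(d.id for d in devices.values() if d.parentID == device_id)


# Hypothetical store (placeholder ids instead of real UUIDs, for readability).
store = {
    d.id: d
    for d in [
        Device("rack-1", "Rack", "Vendor Inc.", "RK-1", "SN-R1"),
        Device("node-1", "Node", "Vendor Inc.", "VN-123-456", "SN987654321", "rack-1"),
        Device("gpu-1", "GPU", "Vendor Inc.", "GP-1", "SN-G1", "node-1"),
        Device("gpu-2", "GPU", "Vendor Inc.", "GP-1", "SN-G2", "node-1"),
    ]
}

node_children = children_device_ids(store, "node-1")
```

A real service would more likely answer this with an indexed query on parentID than with a linear scan; the scan is used here only to keep the sketch self-contained.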

Arbitrary key-value store

  • properties (Map of strings to JSON values): An arbitrary key-value map for storing additional, non-standard attributes.
Properties information

The properties Field for Custom Attributes

To resolve the open question regarding custom attributes, the Device model includes a properties field. This field stores arbitrary key-value data that is not covered by the core model fields.

The properties field is a map where keys are strings and values can be any valid JSON type (string, number, boolean, null, array, or object). To ensure consistency and usability, the following constraints and guidelines apply.

Constraints on Keys

  • all keys must be in lowercase snake_case.
  • keys may only contain lowercase alphanumeric characters (a-z, 0-9), underscores (_), and dots (.).
  • the dot character (.) is used exclusively as a namespace separator to group related attributes (e.g., bios.release_date).
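
One way to enforce the key constraints above is a single regular expression. This is a sketch under stated assumptions: the function name is hypothetical, and the pattern is slightly stricter than the written rules because it also rejects empty namespace segments, leading or trailing underscores, and doubled underscores, none of which the proposal explicitly addresses.

```python
import re

# One or more snake_case segments, separated by dots used as namespace separators.
_SEGMENT = r"[a-z0-9]+(?:_[a-z0-9]+)*"
_KEY_PATTERN = re.compile(rf"{_SEGMENT}(?:\.{_SEGMENT})*")


def is_valid_property_key(key: str) -> bool:
    """Return True if key is lowercase snake_case with optional dot namespaces."""
    return _KEY_PATTERN.fullmatch(key) is not None
```

For example, is_valid_property_key("bios.release_date") is accepted, while "biosBootMode" and "Wake-up Type" are rejected.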

Key Transformation Examples

| HPCM Key | OpenCHAMI Key |
| --- | --- |
| biosBootMode | bios_boot_mode |
| operationalStatus | operational_status |
| rootFs | root_fs |
| CONSERVER_LOGGING | conserver_logging |
| dns_domain | dns_domain |
| Wake-up Type | wake_up_type |
| SKU Number | sku_number |
| bios.Release Date | bios.release_date |
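
The transformations listed above could be automated along the following lines. This is a sketch rather than a defined part of the contract: the function name to_openchami_key is hypothetical, and the rules (split camelCase boundaries, replace other separators with underscores, lowercase, keep dots as namespace separators) are inferred from the examples only.

```python
import re


def to_openchami_key(source_key: str) -> str:
    """Convert an HPCM-style key to lowercase snake_case, keeping dot namespaces."""
    segments = []
    for segment in source_key.split("."):
        # Insert "_" at camelCase boundaries, e.g. "rootFs" -> "root_Fs".
        segment = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", segment)
        # Collapse spaces, hyphens, and other separators into a single "_".
        segment = re.sub(r"[^A-Za-z0-9]+", "_", segment).strip("_")
        segments.append(segment.lower())
    return ".".join(segments)


# The example keys from the list above, paired with their expected results.
EXAMPLES = {
    "biosBootMode": "bios_boot_mode",
    "operationalStatus": "operational_status",
    "rootFs": "root_fs",
    "CONSERVER_LOGGING": "conserver_logging",
    "dns_domain": "dns_domain",
    "Wake-up Type": "wake_up_type",
    "SKU Number": "sku_number",
    "bios.Release Date": "bios.release_date",
}
converted = {key: to_openchami_key(key) for key in EXAMPLES}
```

Keys with acronyms embedded in camelCase (for example a hypothetical "biosUUID") are not covered by the examples, so their handling would need to be decided separately.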

Other constraints

  • all values stored in the properties field must be valid JSON
  • whenever possible, use simple JSON types (String, Number, Boolean)
  • only use JSON Objects or Arrays when data is inherently structured as a group or list

Example of a JSON Object value:

"aliases": {
  "a2000": "ProLiant_XL225n_Gen10_Plus_n1",
  "brazos": "brazos1",
  "product": "XL225n_Gen10_Plus_n1"
}

Example of a JSON array value:

"protocol": ["Hpe", "NO_DCMI", "NO_DCMI_NM", "None", "ipmi", "redfish"]

Metadata

  • apiVersion (String): The API group version (e.g., "inventory/v1").
  • kind (String): The resource type (e.g., "Device").
  • schemaVersion (String): The version of this resource's schema.
  • createdAt (Timestamp): Timestamp of when the device was created.
  • updatedAt (Timestamp): Timestamp of the last update.
  • deletedAt (Timestamp): Timestamp for soft deletes.

Device JSON

{
  "apiVersion": "inventory/v1",
  "kind": "Device",
  "schemaVersion": "v1",
  "id": "d5e6f7a8-b9c0-d1e2-f3g4-h5i6j7k8l9m0",
  "deviceType": "Node",
  "manufacturer": "Vendor Inc.",
  "partNumber": "VN-123-456",
  "serialNumber": "SN987654321",
  "parentID": "f7g8h9i0-j1k2-l3m4-n5o6-p7q8r9s0t1u2",
  "childrenDeviceIds": [
    "gpu-uuid-01",
    "dimm-uuid-01",
    "dimm-uuid-02"
  ],
  "properties": {
    "device_type_slug": "ProLiant-BL460c-Gen10",
    "protocol": ["Hpe", "NO_DCMI", "NO_DCMI_NM", "None", "ipmi", "redfish"]
  },
  "createdAt": "2025-09-20T10:00:00Z",
  "updatedAt": "2025-09-29T14:30:00Z",
  "deletedAt": null
}

{
  "apiVersion": "inventory/v1",
  "kind": "Device",
  "schemaVersion": "v1",
  "id": "d5e6f7a8-b9c0-d1e2-f3g4-h5i6j7k8l9m0",
  "deviceType": "Node",
  "manufacturer": "Vendor Inc.",
  "partNumber": "VN-123-456",
  "serialNumber": "SN987654321",
  "parentID": "chassis-uuid-b01",
  "childrenDeviceIds": [
    "gpu-uuid-01",
    "dimm-uuid-01",
    "dimm-uuid-02"
  ],
  "properties": {
    "device_type_slug": "ProLiant-BL460c-Gen10",
    "protocol": ["Hpe", "NO_DCMI", "NO_DCMI_NM", "None", "ipmi", "redfish"],
    "connections": [
      {
        "endpointA": { "deviceID": "switch-uuid-tor1", "port": "eth0" },
        "endpointB": { "deviceID": "node1-uuid", "port": "gbe-1/0/5" }
      }
    ]
  },
  "createdAt": "2025-09-20T10:00:00Z",
  "updatedAt": "2025-09-29T14:30:00Z",
  "deletedAt": null
}

API Specification

The hardware inventory contract is exposed through three logical API groups. Clients can request a specific schema version of a resource using the Accept header.

Inventory API (/apis/inventory/v1)

Provides a RESTful interface for managing the current state of the inventory via CRUD operations.

  • Device Management
    • GET /devices: List and filter all devices.
    • POST /devices: Create a new device.
    • GET /devices/{deviceId}: Get a single device.
    • PATCH /devices/{deviceId}: Partially update a device.
    • DELETE /devices/{deviceId}: Delete a device.
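
As an illustration of how a client might address these endpoints, the sketch below builds (but does not send) requests with the Python standard library. The host name, the helper name device_request, and the use of plain application/json for both Content-Type and Accept are assumptions; the proposal does not yet define the media type used to select a schema version or the PATCH body format.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical deployment URL; only the /apis/inventory/v1 path comes from the RFD.
BASE_URL = "https://inventory.example.com/apis/inventory/v1"


def device_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build a request for the Inventory API without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Accept": "application/json"}  # assumed media type
    if data is not None:
        headers["Content-Type"] = "application/json"  # assumed media type
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Partially update one device: only the supplied fields are sent.
patch = device_request("PATCH", "/devices/node-1", {"properties": {"root_fs": "nfs"}})

# Fetch a single device.
get = device_request("GET", "/devices/node-1")
```

Passing either request to urllib.request.urlopen would send it; that step is left out so the sketch runs without a server.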

Other APIs

To fully support FRU tracking snapshots and history, there will be two supporting APIs (inventory history and inventory collection), but their details are outside the scope of this RFD.

The Collection API will have a scan operation that gathers inventory and displays what has changed since the last approved system state, and the History API will store individual machine states as snapshots to provide a record of machine hardware changes.

Proposed API Groups

The proposed solution is a unified hardware inventory contract exposed via three logical API groups that work together to provide a complete inventory solution.

  • Inventory API: The system of record. Its sole responsibility is to provide a RESTful interface for managing the current state of hardware inventory.
  • History API: Manages the long-term storage of historical inventory snapshots, provides data retention capabilities, and handles comparison (diff) operations.
  • Collection API: Actively discovers hardware information, compares it to the last known state to generate a diff report, and submits approved changes to the Inventory API.

Alternatives Considered

  • Extend Magellan: Implement inventory gathering using the existing magellan CLI tool for manual management rather than building a new set of APIs. Rejected, as the goal is to provide stable APIs that other services can consume programmatically.

Other Considerations

Ongoing Design Questions

  • COMPLETE (see below) Endpoint-to-Location Mapping: What is the most effective and flexible mechanism for administrators to provide the mapping between discoverable hardware endpoints (e.g., BMC IPs) and their physical locations?
  • COMPLETE (see below) Custom Attributes: Should the API support storing arbitrary key-value data for devices and locations?

Changes based on discussion

  • Moved deviceTypeSlug out of top-level design. This is gathered/created fundamentally differently than hardware-specific things, such as manufacturer, part number, etc., so we decided that it should belong in the properties field, if desired.
  • Added a properties field for storing arbitrary string->JSON values.
  • Added a description of what the point is for having locations separate from devices.
  • Added a description of what all counts as a device.
  • Renamed componentType to deviceType.
  • Added geolocation to location to represent the actual physical location, which cannot be discovered by out-of-band discovery.
  • Workflow updated to reflect new, optional-geolocation model.
  • Pulled out less-relevant parts into summary sections.
  • Broke down fields into metadata and core fields to improve readability.
  • Restructured the document to improve readability.
  • Removed location and connection resources.
  • Added parentID to device.
  • Removed locationID from device.
  • Added a section about modeling cable connections in the properties field.

Related Docs / PRs

Details
System workflow overview and Events

System Workflow Overview

The new workflow is discovery-first, automatically generating the inventory structure based on what is physically present. Specific geolocation details are treated as optional data that can be added by an administrator after the hardware has been discovered.

  1. Discovery is Initiated: A scan is triggered through the Collection API using a list of discoverable hardware management endpoints (e.g., a range of BMC IP addresses).
  2. Hardware and Hierarchy are Discovered: The collection logic connects to each endpoint and discovers all hardware components and their parent-child relationships (e.g., a server containing GPUs and DIMMs).
  3. A Diff Report is Generated: The discovered hardware state is compared against the last known state in the Inventory API. For new hardware, the system proposes the creation of corresponding Device and Location resources, automatically building the location hierarchy based on the discovered parent-child relationships.
  4. Changes are Approved: An administrator reviews the structured diff report and approves the changes through the Collection API.
  5. System State is Updated: Upon approval, the Inventory API creates or updates the Device resources and their associated Location resources. A top-level component's location will have a null parentLocationId, establishing it as the root. Events are then emitted to trigger the creation of a new historical snapshot.
  6. Physical Geolocation is Optionally Added: After the hardware exists in the system, an administrator can enrich the data by updating the Location resources with specific physical details (e.g., rack, U-position) via the new geolocation field. This is a manual enrichment step, not a prerequisite for discovery.
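
The diff step of the workflow above can be sketched as a set comparison. This is an illustration only: it assumes devices are matched on the (manufacturer, partNumber, serialNumber) combination that the Device model uses for identification, and the function names and report keys ("added", "removed", "unchanged") are hypothetical rather than a defined report format.

```python
from typing import Dict, List, Tuple

Identity = Tuple[str, str, str]


def identity(device: dict) -> Identity:
    """Identify a device by manufacturer, part number, and serial number."""
    return (device["manufacturer"], device["partNumber"], device["serialNumber"])


def diff_inventory(known: List[dict], discovered: List[dict]) -> Dict[str, List[Identity]]:
    """Compare the last known state with a fresh scan."""
    known_ids = {identity(d) for d in known}
    found_ids = {identity(d) for d in discovered}
    return {
        "added": sorted(found_ids - known_ids),      # new hardware to propose
        "removed": sorted(known_ids - found_ids),    # hardware no longer seen
        "unchanged": sorted(known_ids & found_ids),
    }


def _dev(serial: str) -> dict:
    # Helper for the sample data below; values are illustrative.
    return {"manufacturer": "Vendor Inc.", "partNumber": "VN-123-456", "serialNumber": serial}


report = diff_inventory(
    known=[_dev("SN1"), _dev("SN2")],
    discovered=[_dev("SN2"), _dev("SN3")],
)
```

A fuller report would also need to detect changed attributes and moved devices (a different parentID for the same identity), which this sketch leaves out.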

Event Sourcing with CloudEvents

Inventory operations will generate events. The long-term vision is to use CloudEvents, but the broader decision on event sourcing in OpenCHAMI is not part of this RFD and will be proposed separately once consensus is reached on the general question of events in OpenCHAMI.

The following is an initial draft of the other APIs, included to provide context. These are subject to change and will be presented as separate RFDs.

History API (/apis/history/v1)

Provides endpoints for accessing and managing historical inventory snapshots.

  • GET /snapshots: List all available snapshots.
  • GET /snapshots/{snapshotId}: Get a specific historical snapshot of the inventory.
  • GET /snapshots/diff: Compare two snapshots. (e.g., ?from={snapshotId1}&to={snapshotId2})
  • GET /events?subject={deviceId}: Get the complete change history of events for one specific device.
  • Data Retention
    • GET /policy: Get the current data retention policy.
    • PUT /policy: Set the data retention policy.
    • POST /snapshots/{snapshotId}/pin: Pin a snapshot to prevent automatic deletion.
    • DELETE /snapshots/{snapshotId}/pin: Unpin a snapshot.

Collection API (/apis/collection/v1)

Provides endpoints for initiating and managing the asynchronous hardware discovery workflow.

  • POST /scans: Trigger a new discovery scan. Returns an Operation resource.
  • POST /scans/{scanId}/approve: Approve the changes from a completed scan. Returns an Operation resource.
  • GET /scans/{scanId}/diff: Get the diff report for a completed scan. The scanId is found in the result of a successful scan Operation.
  • GET /operations/{operationId}: Get the status of a long-running operation.

The Operation Model

The Operation resource is used to track the status of any long-running asynchronous task, such as a hardware scan or the process of applying approved changes.

  • name (String): The unique, server-assigned name of the operation, which also serves as its ID (e.g., operations/scan-a4b1c2d3e4f5).
  • done (Boolean): A flag indicating if the operation is complete. false while in progress, true when finished.
  • metadata (Object): A flexible object containing progress information specific to the operation.
  • result (Object): A field that contains the final outcome of the operation once done is true. It will contain either a response or an error.

Example of an in-progress Operation:

{
  "name": "operations/scan-a4b1c2d3e4f5",
  "done": false,
  "metadata": {
    "@type": "type.googleapis.com/openchami.collection.v1.ScanMetadata",
    "startTime": "2025-10-01T16:30:00Z",
    "progressPercent": 45,
    "lastUpdateTime": "2025-10-01T16:35:10Z"
  }
}
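
A client would typically poll GET /operations/{operationId} until done is true. The sketch below shows that loop with an injected fetch function so it runs without a server. The function name, its parameters, and the shape of result (a response object carrying scanId) are assumptions made for illustration; the RFD only states that result holds either a response or an error.

```python
import time
from typing import Callable


def wait_for_operation(
    fetch: Callable[[str], dict],
    name: str,
    interval_s: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll an Operation until its done flag is true, then return it."""
    for _ in range(max_polls):
        operation = fetch(name)
        if operation.get("done"):
            return operation
        sleep(interval_s)
    raise TimeoutError(f"operation {name} not done after {max_polls} polls")


# Stand-in fetcher (no network): two in-progress responses, then a finished one.
_responses = iter([
    {"name": "operations/scan-a4b1c2d3e4f5", "done": False},
    {"name": "operations/scan-a4b1c2d3e4f5", "done": False},
    {
        "name": "operations/scan-a4b1c2d3e4f5",
        "done": True,
        "result": {"response": {"scanId": "scan-a4b1c2d3e4f5"}},  # assumed shape
    },
])
final = wait_for_operation(
    lambda name: next(_responses),
    "operations/scan-a4b1c2d3e4f5",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```

In a real client, fetch would issue the HTTP request and decode the JSON body, and the caller would inspect result for an error before using the response.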
