Description
Decision Goal
To agree on the core data models and API endpoints for a new Hardware Inventory contract, and to identify key areas that require further discussion before implementation.
Category
Architecture
Stakeholders / Affected Areas
Sys admins, magellan, SMD
Decision Needed By
No response
Problem Statement
OpenCHAMI currently lacks a means of tracking Field Replaceable Units (FRUs) and other hardware. The existing process focuses on functional gathering (e.g., information required for power control) and relies on manual workflows and CLI tools that are not easily auditable or consumable by other services. This proposal aims to solve that problem by defining a core inventory contract and a corresponding set of APIs to establish a baseline, programmatic source of truth for all hardware in the system.
Core Data Models
The Device Model
The Device model represents any physical hardware component that is uniquely trackable within the system, covering everything from top-level systems like servers and switches down to individual FRUs such as GPUs, NICs, or DIMMs. The primary qualification for an item to be a Device is that it can be uniquely identified, typically through a combination of its manufacturer, part number, and serial number discovered via an out-of-band management interface like Redfish. The rationale here is that all items that can be individually replaced are tracked as their own distinct entities.
When storing cables, information about connections (such as a node connected to a switch) may be stored in the `properties` field. Best practice is to model each connection as a pair of endpoints, `endpointA` and `endpointB`, each of which lists a port and a deviceID.
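As a rough illustration of the endpoint pairing described above, the sketch below builds one connection entry for a cable's `properties` field. The names `endpointA`, `endpointB`, `deviceID`, and `port` follow the wording of this section; the helper name `make_connection`, the `connections` list layout, and the sample IDs are assumptions made for illustration, not part of the agreed contract.

```python
import json


def make_connection(device_a, port_a, device_b, port_b):
    """Return one connection record with two endpoints (hypothetical helper)."""
    return {
        "endpointA": {"deviceID": device_a, "port": port_a},
        "endpointB": {"deviceID": device_b, "port": port_b},
    }


# Placeholder IDs; real values would be Device UUIDs.
cable_properties = {
    "connections": [
        make_connection("node1-uuid", "eth0", "switch-uuid-tor1", "gbe-1/0/5")
    ]
}
print(json.dumps(cable_properties, indent=2))
```

Keeping both endpoints in a single record means a reader never has to pair up separate entries to reconstruct one cable run.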
Core fields
- `id` (UUID): The permanent, unique identifier for the hardware.
- `deviceType` (Enum): The type of hardware (e.g., "Node", "GPU", "Rack").
- `manufacturer` (String): The manufacturer name.
- `partNumber` (String): The part number.
- `serialNumber` (String): The serial number.
- `parentID` (UUID): The parent device of this device. If null, this is a top-level device (e.g., a rack). DIMMs are children of nodes, and so on.
- `childrenDeviceIds` (Array of UUIDs): A read-only list of devices contained within this one. Calculated on request, not stored (to avoid frequent updates).
Arbitrary key-value store
- `properties` (Map of strings to JSON values): An arbitrary key-value map for storing additional, non-standard attributes.
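To make the field list more concrete, here is a minimal sketch of the Device model as a Python dataclass, together with one way `childrenDeviceIds` could be derived on request from `parentID` links. The class layout, the function name `children_device_ids`, and the sample devices are illustrative assumptions; the RFD does not prescribe an implementation or a storage layer.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Device:
    # Field names mirror the JSON contract, hence camelCase.
    id: str
    deviceType: str
    manufacturer: str
    partNumber: str
    serialNumber: str
    parentID: Optional[str] = None  # None marks a top-level device
    properties: Dict[str, Any] = field(default_factory=dict)


def children_device_ids(devices: List[Device], parent_id: str) -> List[str]:
    """Derive childrenDeviceIds from parentID links; nothing is stored."""
    return [d.id for d in devices if d.parentID == parent_id]


# Placeholder IDs and part numbers, for illustration only.
rack = Device("rack-1", "Rack", "Vendor Inc.", "RK-1", "SN-R1")
node = Device("node-1", "Node", "Vendor Inc.", "VN-123-456", "SN987654321", parentID="rack-1")
dimm = Device("dimm-1", "DIMM", "Vendor Inc.", "DM-1", "SN-D1", parentID="node-1")
inventory = [rack, node, dimm]

print(children_device_ids(inventory, "rack-1"))  # ['node-1']
```

Because only `parentID` is written, moving a device to a new parent is a single-field update and the child list can never drift out of sync.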
Properties information
The properties Field for Custom Attributes
To resolve the open question regarding custom attributes, the Device model includes a `properties` field. This field allows storing arbitrary key-value data that is not covered by the core model fields.
The properties field is a map where keys are strings and values can be any valid JSON type (string, number, boolean, null, array, or object). To ensure consistency and usability, the following constraints and guidelines apply.
Constraints on Keys
- All keys must be in lowercase `snake_case`.
- Keys may only contain lowercase alphanumeric characters (`a-z`, `0-9`), underscores (`_`), and dots (`.`).
- The dot character (`.`) is used exclusively as a namespace separator to group related attributes (e.g., `bios.release_date`).
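The key constraints above could be checked with a single regular expression, as in the sketch below. The function name `is_valid_property_key` is hypothetical, and the pattern makes one assumption that goes beyond the stated rules: each dot-separated segment must be non-empty, so leading, trailing, or doubled dots are rejected.

```python
import re

# One or more segments of [a-z0-9_], separated by single dots.
KEY_PATTERN = re.compile(r"[a-z0-9_]+(?:\.[a-z0-9_]+)*")


def is_valid_property_key(key: str) -> bool:
    """Return True if key satisfies the properties key constraints."""
    return KEY_PATTERN.fullmatch(key) is not None


for candidate in ["bios.release_date", "biosBootMode", "sku number", ".bios"]:
    print(candidate, is_valid_property_key(candidate))
```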
Key Transformation Examples
| HPCM Key | OpenCHAMI Key |
|---|---|
| biosBootMode | bios_boot_mode |
| operationalStatus | operational_status |
| rootFs | root_fs |
| CONSERVER_LOGGING | conserver_logging |
| dns_domain | dns_domain |
| Wake-up Type | wake_up_type |
| SKU Number | sku_number |
| bios.Release Date | bios.release_date |
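One possible way to automate the transformations in the table is sketched below. The function name `to_openchami_key` is hypothetical and the rules are inferred from the examples only: dots are kept as namespace separators, runs of other non-alphanumeric characters become underscores, and camelCase boundaries are split. Keys with embedded acronyms (for example a name like `HTTPServer`) are not handled specially and may need manual review.

```python
import re


def to_openchami_key(source_key: str) -> str:
    """Convert a source key to lowercase snake_case, keeping dot namespaces."""
    segments = []
    for segment in source_key.split("."):
        # Spaces, hyphens, and other punctuation become underscores.
        s = re.sub(r"[^A-Za-z0-9]+", "_", segment)
        # Split camelCase boundaries: "bootMode" -> "boot_Mode".
        s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
        segments.append(s.strip("_").lower())
    return ".".join(segments)


print(to_openchami_key("bios.Release Date"))  # bios.release_date
```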
Other constraints
- All values stored in the `properties` field must be valid JSON.
- Whenever possible, use simple JSON types (String, Number, Boolean).
- Only use JSON Objects or Arrays when data is inherently structured as a group or list.
Example of a JSON Object value:
"aliases": {
"a2000": "ProLiant_XL225n_Gen10_Plus_n1",
"brazos": "brazos1",
"product": "XL225n_Gen10_Plus_n1"
}
Example of a JSON array value:
"protocol": ["Hpe", "NO_DCMI", "NO_DCMI_NM", "None", "ipmi", "redfish"]
Metadata
- `apiVersion` (String): The API group version (e.g., "inventory/v1").
- `kind` (String): The resource type (e.g., "Device").
- `schemaVersion` (String): The version of this resource's schema.
- `createdAt` (Timestamp): Timestamp of when the device was created.
- `updatedAt` (Timestamp): Timestamp of the last update.
- `deletedAt` (Timestamp): Timestamp for soft deletes.
Device JSON
{
"apiVersion": "inventory/v1",
"kind": "Device",
"schemaVersion": "v1",
"id": "d5e6f7a8-b9c0-d1e2-f3g4-h5i6j7k8l9m0",
"deviceType": "Node",
"manufacturer": "Vendor Inc.",
"partNumber": "VN-123-456",
"serialNumber": "SN987654321",
"parentID": "f7g8h9i0-j1k2-l3m4-n5o6-p7q8r9s0t1u2",
"childrenDeviceIds": [
"gpu-uuid-01",
"dimm-uuid-01",
"dimm-uuid-02"
],
"properties": {
"device_type_slug": "ProLiant-BL460c-Gen10",
"protocol": ["Hpe", "NO_DCMI", "NO_DCMI_NM", "None", "ipmi", "redfish"]
},
"createdAt": "2025-09-20T10:00:00Z",
"updatedAt": "2025-09-29T14:30:00Z",
"deletedAt": null
}
{
"apiVersion": "inventory/v1",
"kind": "Device",
"schemaVersion": "v1",
"id": "d5e6f7a8-b9c0-d1e2-f3g4-h5i6j7k8l9m0",
"deviceType": "Node",
"manufacturer": "Vendor Inc.",
"partNumber": "VN-123-456",
"serialNumber": "SN987654321",
"parentID": "chassis-uuid-b01",
"childrenDeviceIds": [
"gpu-uuid-01",
"dimm-uuid-01",
"dimm-uuid-02"
],
"properties": {
"device_type_slug": "ProLiant-BL460c-Gen10",
"protocol": ["Hpe", "NO_DCMI", "NO_DCMI_NM", "None", "ipmi", "redfish"],
"connections": [
{
"endpointA": "switch-uuid-tor1",
"port": "eth0"
},
{
"endpointB": "node1-uuid",
"port": "gbe-1/0/5"
}
]
},
"createdAt": "2025-09-20T10:00:00Z",
"updatedAt": "2025-09-29T14:30:00Z",
"deletedAt": null
}
API Specification
The hardware inventory contract is exposed through three logical API groups. Clients can request a specific schema version of a resource using the Accept header.
Inventory API (/apis/inventory/v1)
Provides a RESTful interface for managing the current state of the inventory via CRUD operations.
- Device Management
  - `GET /devices`: List and filter all devices.
  - `POST /devices`: Create a new device.
  - `GET /devices/{deviceId}`: Get a single device.
  - `PATCH /devices/{deviceId}`: Partially update a device.
  - `DELETE /devices/{deviceId}`: Delete a device.
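To show how a client might address these endpoints, the sketch below builds (but does not send) a `PATCH /devices/{deviceId}` request using only the Python standard library. The host and port in `BASE_URL`, the function name, the sample device ID, and the choice to send the partial update as a plain JSON document are all assumptions. The RFD does not define the patch format or the versioned media type for the `Accept` header, so plain `application/json` is used as a placeholder.

```python
import json
import urllib.request

# Hypothetical host and port; only the /apis/inventory/v1 path is from the RFD.
BASE_URL = "http://localhost:8080/apis/inventory/v1"


def build_patch_device(device_id: str, changes: dict) -> urllib.request.Request:
    """Build, without sending, a PATCH /devices/{deviceId} request."""
    body = json.dumps(changes).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/devices/{device_id}",
        data=body,
        method="PATCH",
        # Placeholder media types; the versioned Accept value is not yet defined.
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


req = build_patch_device(
    "node-uuid-01", {"properties": {"bios.release_date": "2025-01-15"}}
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call against a running service.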
Other APIs
To fully support FRU tracking snapshots and history, there will be two supporting APIs (inventory history and inventory collection), but the contents of those are outside the scope of this current RFD.
The collection API will have a scan operation that gathers inventory and shows what has changed since the last approved system state. The history API will store individual machine states as snapshots to provide a record of machine hardware changes.
Proposed API Groups
The proposed solution is a unified hardware inventory contract exposed via three logical API groups that work together to provide a complete inventory solution.
- Inventory API: The system of record. Its sole responsibility is to provide a RESTful interface for managing the current state of hardware inventory.
- History API: Manages the long-term storage of historical inventory snapshots, provides data retention capabilities, and handles comparison (diff) operations.
- Collection API: Actively discovers hardware information, compares it to the last known state to generate a diff report, and submits approved changes to the Inventory API.
Alternatives Considered
- Extend Magellan: Implement inventory gathering using the existing `magellan` CLI tool for manual management rather than building a new set of APIs. Rejected, as the goal is to provide stable APIs that other services can consume programmatically.
Other Considerations
Ongoing Design Questions
- COMPLETE (see below) Endpoint-to-Location Mapping: What is the most effective and flexible mechanism for administrators to provide the mapping between discoverable hardware endpoints (e.g., BMC IPs) and their physical locations?
- COMPLETE (see below) Custom Attributes: Should the API support storing arbitrary key-value data for devices and locations?
Changes based on discussion
- Moved `deviceTypeSlug` out of top-level design. This is gathered/created fundamentally differently than hardware-specific things, such as manufacturer, part number, etc., so we decided that it should belong in the properties field, if desired.
- Added a `properties` field for storing arbitrary string->JSON values.
- Added a description of what the point is for having locations separate from devices.
- Added a description of what all counts as a device.
- Renamed `componentType` to `deviceType`.
- Added `geolocation` to `location` to represent the actual physical location, which cannot be discovered by out-of-band discovery.
- Workflow updated to reflect new, optional-geolocation model.
- Pulled out less-relevant parts into summary sections
- Broke down fields based on metadata vs core fields to make it more readable
- Restructured document to make more readable
- Removed location and connection resources
- Added parentID to device
- Removed locationID from device
- Added section about modeling connections with cables properties field
Related Docs / PRs
Details
System workflow overview and Events
System Workflow Overview
The new workflow is discovery-first, automatically generating the inventory structure based on what is physically present. Specific geolocation details are treated as optional data that can be added by an administrator after the hardware has been discovered.
- Discovery is Initiated: A scan is collected by the `Collection API` using a list of discoverable hardware management endpoints (e.g., a range of BMC IP addresses).
- Hardware and Hierarchy are Discovered: The collection logic connects to each endpoint and discovers all hardware components and their parent-child relationships (e.g., a server containing GPUs and DIMMs).
- A Diff Report is Generated: The discovered hardware state is compared against the last known state in the `Inventory API`. For new hardware, the system proposes the creation of corresponding `Device` and `Location` resources, automatically building the location hierarchy based on the discovered parent-child relationships.
- Changes are Approved: An administrator reviews the structured diff report and approves the changes through the `Collection API`.
- System State is Updated: Upon approval, the `Inventory API` creates or updates the `Device` resources and their associated `Location` resources. A top-level component's location will have a `null` `parentLocationId`, establishing it as the root. Events are then emitted to trigger the creation of a new historical snapshot.
- Physical Geolocation is Optionally Added: After the hardware exists in the system, an administrator can enrich the data by updating the `Location` resources with specific physical details (e.g., rack, U-position) via the new `geolocation` field. This is a manual enrichment step, not a prerequisite for discovery.
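The comparison step in this workflow can be sketched as a set difference over device identities. The example below assumes devices are matched on the manufacturer, part number, and serial number triple mentioned in the Device model section; the function names and the shape of the returned report are hypothetical, since the diff report format is left to the separate Collection API RFD.

```python
def identity(device: dict) -> tuple:
    """Identity triple used to match a discovered device to a known one."""
    return (device["manufacturer"], device["partNumber"], device["serialNumber"])


def diff_inventory(known: list, discovered: list) -> dict:
    """Compare the last known state with a new scan (illustrative report shape)."""
    known_by_id = {identity(d): d for d in known}
    found_by_id = {identity(d): d for d in discovered}
    return {
        "added": [found_by_id[k] for k in found_by_id if k not in known_by_id],
        "removed": [known_by_id[k] for k in known_by_id if k not in found_by_id],
    }


# Placeholder data: one DIMM was swapped for another since the last scan.
old_dimm = {"manufacturer": "Vendor Inc.", "partNumber": "DM-1", "serialNumber": "SN-D1"}
new_dimm = {"manufacturer": "Vendor Inc.", "partNumber": "DM-1", "serialNumber": "SN-D2"}
node = {"manufacturer": "Vendor Inc.", "partNumber": "VN-123-456", "serialNumber": "SN987654321"}

report = diff_inventory(known=[node, old_dimm], discovered=[node, new_dimm])
print(report)
```

An administrator would review a report like this before any change reaches the Inventory API.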
Event Sourcing with CloudEvents
Inventory operations will generate events. The long-term vision is to use CloudEvents, but the broader decision on event sourcing in OpenCHAMI is not part of this RFD and will be proposed after consensus is reached on the broader question of events in OpenCHAMI.
The following is an initial draft of the other APIs, included to provide context. These are subject to change and will be presented as separate RFDs.
History API (/apis/history/v1)
Provides endpoints for accessing and managing historical inventory snapshots.
- `GET /snapshots`: List all available snapshots.
- `GET /snapshots/{snapshotId}`: Get a specific historical snapshot of the inventory.
- `GET /snapshots/diff`: Compare two snapshots (e.g., `?from={snapshotId1}&to={snapshotId2}`).
- `GET /events?subject={deviceId}`: Get the complete change history of events for one specific device.
- Data Retention
  - `GET /policy`: Get the current data retention policy.
  - `PUT /policy`: Set the data retention policy.
  - `POST /snapshots/{snapshotId}/pin`: Pin a snapshot to prevent automatic deletion.
  - `DELETE /snapshots/{snapshotId}/pin`: Unpin a snapshot.
Collection API (/apis/collection/v1)
Provides endpoints for initiating and managing the asynchronous hardware discovery workflow.
- `POST /scans`: Trigger a new discovery scan. Returns an `Operation` resource.
- `POST /scans/{scanId}/approve`: Approve the changes from a completed scan. Returns an `Operation` resource.
- `GET /scans/{scanId}/diff`: Get the diff report for a completed scan. The `scanId` is found in the result of a successful scan `Operation`.
- `GET /operations/{operationId}`: Get the status of a long-running operation.
The Operation Model
The Operation resource is used to track the status of any long-running asynchronous task, such as a hardware scan or the process of applying approved changes.
- `name` (String): The unique, server-assigned name of the operation, which also serves as its ID (e.g., `operations/scan-a4b1c2d3e4f5`).
- `done` (Boolean): A flag indicating if the operation is complete: `false` while in progress, `true` when finished.
- `metadata` (Object): A flexible object containing progress information specific to the operation.
- `result` (Object): A field that contains the final outcome of the operation once `done` is `true`. It will contain either a `response` or an `error`.
{
"name": "operations/scan-a4b1c2d3e4f5",
"done": false,
"metadata": {
"@type": "type.googleapis.com/openchami.collection.v1.ScanMetadata",
"startTime": "2025-10-01T16:30:00Z",
"progressPercent": 45,
"lastUpdateTime": "2025-10-01T16:35:10Z"
}
}
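A client would typically poll an `Operation` until `done` becomes `true` and then inspect `result`. The sketch below shows that loop with the HTTP call abstracted behind a `fetch` callable, so it can run here against canned responses. The function name `wait_for_operation`, the polling parameters, and the `scanId` key inside `response` are assumptions; only the `name`, `done`, `metadata`, and `result` fields come from the model above.

```python
import time
from typing import Callable


def wait_for_operation(fetch: Callable[[str], dict], name: str,
                       interval_s: float = 1.0, max_polls: int = 60) -> dict:
    """Poll until the operation reports done, then return its response."""
    for _ in range(max_polls):
        op = fetch(name)
        if op.get("done"):
            result = op.get("result", {})
            if "error" in result:
                raise RuntimeError(f"operation {name} failed: {result['error']}")
            return result.get("response", {})
        time.sleep(interval_s)
    raise TimeoutError(f"operation {name} did not finish after {max_polls} polls")


# Canned responses standing in for GET /operations/{operationId}.
_states = iter([
    {"name": "operations/scan-a4b1c2d3e4f5", "done": False},
    {"name": "operations/scan-a4b1c2d3e4f5", "done": True,
     "result": {"response": {"scanId": "scan-a4b1c2d3e4f5"}}},
])

outcome = wait_for_operation(
    lambda _name: next(_states), "operations/scan-a4b1c2d3e4f5", interval_s=0
)
print(outcome)
```

Injecting `fetch` keeps the polling logic independent of any particular HTTP client and makes it straightforward to test.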