[RFD]: Hardware Inventory API and Data Model Proposal #112

### Decision Goal

To agree on the core data models and API endpoints for a new Hardware Inventory contract, and to identify key areas that require further discussion before implementation.

### Category

Architecture

### Stakeholders / Affected Areas

Sys admins, magellan, SMD

### Decision Needed By

_No response_

### Problem Statement

OpenCHAMI currently lacks a means of tracking Field Replaceable Units (FRUs) and other hardware. The existing process focuses on functional data gathering (e.g., the information required for power control) and relies on manual workflows and CLI tools that are not easily auditable or consumable by other services. This proposal addresses that gap by defining a core inventory contract and a corresponding set of APIs that establish a baseline, programmatic source of truth for all hardware in the system.

### Core Data Models

#### The `Device` Model

The `Device` model represents any physical hardware component that is uniquely trackable within the system, covering everything from top-level systems like servers and switches down to individual FRUs such as GPUs, NICs, or DIMMs. The primary qualification for an item to be a `Device` is that it can be uniquely identified, typically through a combination of its manufacturer, part number, and serial number discovered via an out-of-band management interface like Redfish. The rationale here is that all items that can be individually replaced are tracked as their own distinct entities. 

For `cables`, information about connections (for example, a node connected to a switch) may be stored in the `properties` field. The recommended practice is to model each connection as an object with two endpoints, `endpointA` and `endpointB`, each of which lists a `deviceID` and a `port`.
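
As a rough illustration of the connection shape described above, the following Python sketch builds one connection entry for a cable's `properties` field. The `connections` key and the helper names (`Endpoint`, `Connection`, `to_properties`) are assumptions made for this example and are not part of the proposed contract.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Endpoint:
    """One end of a cable: a device and the port used on that device."""
    deviceID: str
    port: str


@dataclass(frozen=True)
class Connection:
    """A link between two endpoints."""
    endpointA: Endpoint
    endpointB: Endpoint


def to_properties(connections):
    """Return a properties fragment holding the given connections.

    The "connections" key is a hypothetical choice for this sketch.
    """
    return {"connections": [asdict(c) for c in connections]}


# Placeholder IDs and ports, in the style of the examples in this document.
link = Connection(
    endpointA=Endpoint(deviceID="switch-uuid-tor1", port="eth0"),
    endpointB=Endpoint(deviceID="node1-uuid", port="gbe-1/0/5"),
)
properties = to_properties([link])
```

Keeping both endpoints in a single object means one list entry describes one physical link, so a consumer does not need to pair up separate entries.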

**Core fields**
* `id` (UUID): The permanent, unique identifier for the hardware.
* `deviceType` (Enum): The type of hardware (e.g., "Node", "GPU", "Rack").
* `manufacturer` (String): The manufacturer name.
* `partNumber` (String): The part number.
* `serialNumber` (String): The serial number.
* `parentID` (UUID): The parent device of this device. If null, this is a top-level device (e.g., a rack). For example, DIMMs are children of nodes.
* `childrenDeviceIds` (Array of UUIDs): A read-only list of devices contained within this one. Calculated on-request, not stored (to avoid frequent updates).
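
One way to read "calculated on-request, not stored" is that `childrenDeviceIds` is derived from the `parentID` values of other devices at query time. The sketch below shows that idea over an in-memory list. The function names and the sample IDs are hypothetical, and a real service would more likely run an indexed query against its datastore.

```python
from collections import defaultdict


def children_index(devices):
    """Map each parent ID to the IDs of its direct children.

    `devices` is an iterable of dicts with "id" and "parentID" keys.
    """
    index = defaultdict(list)
    for device in devices:
        parent = device.get("parentID")
        if parent is not None:
            index[parent].append(device["id"])
    return index


def with_children(device, index):
    """Return a copy of `device` with `childrenDeviceIds` filled in."""
    return {**device, "childrenDeviceIds": list(index.get(device["id"], []))}


# Placeholder IDs for illustration only.
devices = [
    {"id": "rack-1", "parentID": None},
    {"id": "node-1", "parentID": "rack-1"},
    {"id": "dimm-1", "parentID": "node-1"},
    {"id": "dimm-2", "parentID": "node-1"},
]
index = children_index(devices)
node = with_children(devices[1], index)
```

Because only `parentID` is stored, moving a DIMM to another node is a single update on the DIMM and neither parent record has to be rewritten.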

**Arbitrary key-value store**
* `properties` (Map of strings to JSON values): An arbitrary key-value map for storing additional, non-standard attributes.

<details><summary>Properties information</summary>

### The `properties` Field for Custom Attributes

To resolve the open question regarding custom attributes, the `Device` model includes a `properties` field. This field stores arbitrary key-value data that is not covered by the core model fields.

The `properties` field is a map where keys are strings and values can be any valid JSON type (string, number, boolean, null, array, or object). To ensure consistency and usability, the following constraints and guidelines apply.

#### Constraints on Keys

* all keys must be in lowercase `snake_case`.
* keys may only contain lowercase alphanumeric characters (`a-z`, `0-9`), underscores (`_`), and dots (`.`).
* the dot character (`.`) is used exclusively as a namespace separator to group related attributes (e.g., `bios.release_date`).
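
The key constraints above can be checked with a single regular expression. The sketch below is one possible reading of them: it additionally assumes that a key may not start or end with an underscore and that namespace segments may not be empty, which the constraints do not state explicitly. `is_valid_key` is a hypothetical helper name.

```python
import re

# One namespace segment: lowercase alphanumeric words joined by underscores.
_SEGMENT = r"[a-z0-9]+(?:_[a-z0-9]+)*"
# A full key: one or more segments separated by dots.
KEY_PATTERN = re.compile(rf"{_SEGMENT}(?:\.{_SEGMENT})*")


def is_valid_key(key):
    """Return True if `key` satisfies the properties key constraints."""
    return KEY_PATTERN.fullmatch(key) is not None
```

For example, `is_valid_key("bios.release_date")` is true, while `is_valid_key("biosBootMode")` and `is_valid_key("wake-up_type")` are false.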

#### Key Transformation Examples

| HPCM Key            | OpenCHAMI Key       |
| ------------------- | ------------------- |
| `biosBootMode`      | `bios_boot_mode`    |
| `operationalStatus` | `operational_status`|
| `rootFs`            | `root_fs`           |
| `CONSERVER_LOGGING` | `conserver_logging` |
| `dns_domain`        | `dns_domain`        |
| `Wake-up Type`      | `wake_up_type`      |
| `SKU Number`        | `sku_number`        |
| `bios.Release Date` | `bios.release_date` |
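
The transformations in the table can be reproduced with a small normalization function. This is a sketch that happens to cover the rows above, not a defined conversion algorithm. `to_openchami_key` is a hypothetical name, and keys that embed acronyms in camelCase (such as `HTTPServer`) would need extra rules.

```python
import re


def to_openchami_key(key):
    """Convert an HPCM-style key to lowercase snake_case.

    Dots are kept as namespace separators; every other non-alphanumeric
    run becomes a single underscore.
    """
    segments = []
    for segment in key.split("."):
        # Insert an underscore at camelCase boundaries: "rootFs" -> "root_Fs".
        segment = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", segment)
        # Collapse spaces, hyphens, and other separators into underscores.
        segment = re.sub(r"[^A-Za-z0-9]+", "_", segment)
        segments.append(segment.strip("_").lower())
    return ".".join(segments)
```

For example, `to_openchami_key("Wake-up Type")` gives `wake_up_type`.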

#### Other constraints
* all values stored in the `properties` field must be valid JSON
* whenever possible, use simple JSON types (String, Number, Boolean)
* only use JSON Objects or Arrays when data is inherently structured as a group or list

**Example of a JSON Object value:**
```json
"aliases": {
  "a2000": "ProLiant_XL225n_Gen10_Plus_n1",
  "brazos": "brazos1",
  "product": "XL225n_Gen10_Plus_n1"
}
```

**Example of a JSON array value:**
```json
"protocol": ["Hpe", "NO_DCMI", "NO_DCMI_NM", "None", "ipmi", "redfish"]
```

</details>

**Metadata**
* `apiVersion` (String): The API group version (e.g., "inventory/v1").
* `kind` (String): The resource type (e.g., "Device").
* `schemaVersion` (String): The version of this resource's schema.
* `createdAt` (Timestamp): Timestamp of when the device was created.
* `updatedAt` (Timestamp): Timestamp of the last update.
* `deletedAt` (Timestamp): Timestamp for soft deletes.


<details><summary>Device JSON</summary>

```json
{
  "apiVersion": "inventory/v1",
  "kind": "Device",
  "schemaVersion": "v1",
  "id": "d5e6f7a8-b9c0-d1e2-f3g4-h5i6j7k8l9m0",
  "deviceType": "Node",
  "manufacturer": "Vendor Inc.",
  "partNumber": "VN-123-456",
  "serialNumber": "SN987654321",
  "parentID": "f7g8h9i0-j1k2-l3m4-n5o6-p7q8r9s0t1u2",
  "childrenDeviceIds": [
    "gpu-uuid-01",
    "dimm-uuid-01",
    "dimm-uuid-02"
  ],
  "properties": {
    "device_type_slug": "ProLiant-BL460c-Gen10",
    "protocol": ["Hpe", "NO_DCMI", "NO_DCMI_NM", "None", "ipmi", "redfish"]
  },
  "createdAt": "2025-09-20T10:00:00Z",
  "updatedAt": "2025-09-29T14:30:00Z",
  "deletedAt": null
}

{
  "apiVersion": "inventory/v1",
  "kind": "Device",
  "schemaVersion": "v1",
  "id": "d5e6f7a8-b9c0-d1e2-f3g4-h5i6j7k8l9m0",
  "deviceType": "Node",
  "manufacturer": "Vendor Inc.",
  "partNumber": "VN-123-456",
  "serialNumber": "SN987654321",
  "parentID": "chassis-uuid-b01",
  "childrenDeviceIds": [
    "gpu-uuid-01",
    "dimm-uuid-01",
    "dimm-uuid-02"
  ],
  "properties": {
    "device_type_slug": "ProLiant-BL460c-Gen10",
    "protocol": ["Hpe", "NO_DCMI", "NO_DCMI_NM", "None", "ipmi", "redfish"],
    "connections": [
      {
        "endpointA": {
          "deviceID": "switch-uuid-tor1",
          "port": "eth0"
        },
        "endpointB": {
          "deviceID": "node1-uuid",
          "port": "gbe-1/0/5"
        }
      }
    ]
  },
  "createdAt": "2025-09-20T10:00:00Z",
  "updatedAt": "2025-09-29T14:30:00Z",
  "deletedAt": null
}
```

</details>

### API Specification

The hardware inventory contract is exposed through three logical API groups. Clients can request a specific schema version of a resource using the `Accept` header.

#### Inventory API (`/apis/inventory/v1`)

Provides a RESTful interface for managing the current state of the inventory via CRUD operations.

* **Device Management**
    * `GET /devices`: List and filter all devices.
    * `POST /devices`: Create a new device.
    * `GET /devices/{deviceId}`: Get a single device.
    * `PATCH /devices/{deviceId}`: Partially update a device.
    * `DELETE /devices/{deviceId}`: Delete a device.
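
To make the endpoint list more concrete, the sketch below builds (but does not send) requests for several of the operations using Python's standard library. The host name, the `build_request` helper, the device ID, and the request bodies are assumptions for illustration. The RFD does not yet specify the PATCH body format or the media type used for schema versioning in the `Accept` header, so plain `application/json` is used here.

```python
import json
from urllib.request import Request

# Hypothetical host; only the /apis/inventory/v1 prefix comes from this RFD.
BASE_URL = "https://inventory.example.com/apis/inventory/v1"


def build_request(method, path, body=None):
    """Build (but do not send) a request against the Inventory API."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


device_id = "node-uuid-01"  # placeholder, not a real UUID

list_req = build_request("GET", "/devices")
create_req = build_request("POST", "/devices", {
    "deviceType": "Node",
    "manufacturer": "Vendor Inc.",
    "partNumber": "VN-123-456",
    "serialNumber": "SN987654321",
})
# Assumed partial-update body: only the fields being changed.
patch_req = build_request("PATCH", f"/devices/{device_id}",
                          {"properties": {"bios_boot_mode": "uefi"}})
delete_req = build_request("DELETE", f"/devices/{device_id}")
```

Each request could then be sent with `urllib.request.urlopen(...)` against a running service.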

#### Other APIs

To fully support FRU tracking snapshots and history, there will be two supporting APIs (inventory history and inventory collection), but the contents of those are outside the scope of this current RFD.

The collection API will have a scan operation that gathers inventory and shows what has changed since the last approved system state. The history API will store individual machine states as snapshots to provide a record of machine hardware changes.

### Proposed API Groups

The proposed solution is a unified hardware inventory contract exposed via three logical **API groups** that work together to provide a complete inventory solution.

* **Inventory API:** The system of record. Its sole responsibility is to provide a RESTful interface for managing the current state of hardware inventory.
* **History API:** Manages the long-term storage of historical inventory snapshots, provides data retention capabilities, and handles comparison (diff) operations.
* **Collection API:** Actively discovers hardware information, compares it to the last known state to generate a diff report, and submits approved changes to the Inventory API.

### Alternatives Considered

* **Extend Magellan:** Implement inventory gathering using the existing `magellan` CLI tool for manual management rather than building a new set of APIs. Rejected, as the goal is to provide stable APIs that other services can consume programmatically.

### Other Considerations

**Ongoing Design Questions**

* **COMPLETE (see below)** Endpoint-to-Location Mapping: What is the most effective and flexible mechanism for administrators to provide the mapping between discoverable hardware endpoints (e.g., BMC IPs) and their physical locations?
* **COMPLETE (see below)** Custom Attributes: Should the API support storing arbitrary key-value data for devices and locations?

**Changes based on discussion**
* Moved `deviceTypeSlug` out of top-level design. This is gathered/created fundamentally differently than hardware-specific things, such as manufacturer, part number, etc., so we decided that it should belong in the properties field, if desired.
* Added a `properties` field for storing arbitrary string->JSON values.
* Added a description of what the point is for having locations separate from devices.
* Added a description of what all counts as a device.
* Renamed `componentType` to `deviceType`.
* Added `geolocation` to `location` to represent the actual physical location, which cannot be discovered by out-of-band discovery.
* Workflow updated to reflect new, optional-geolocation model.
* Pulled out less-relevant parts into summary sections.
* Split fields into metadata and core fields to make the model more readable.
* Restructured the document to make it more readable.
* Removed the location and connection resources.
* Added `parentID` to device.
* Removed `locationID` from device.
* Added a section about modeling connections with the `properties` field of cables.

### Related Docs / PRs

* https://github.com/Cray-HPE/cani
* https://cloudevents.io/

<details>

<details><summary>System workflow overview and Events</summary>

### System Workflow Overview

The new workflow is discovery-first, automatically generating the inventory structure based on what is physically present. Specific geolocation details are treated as optional data that can be added by an administrator after the hardware has been discovered.

1.  **Discovery is Initiated**: A scan is triggered through the `Collection API` using a list of discoverable hardware management endpoints (e.g., a range of BMC IP addresses).
2.  **Hardware and Hierarchy are Discovered**: The collection logic connects to each endpoint and discovers all hardware components and their parent-child relationships (e.g., a server containing GPUs and DIMMs).
3.  **A Diff Report is Generated**: The discovered hardware state is compared against the last known state in the `Inventory API`. For new hardware, the system proposes the creation of corresponding `Device` and `Location` resources, automatically building the location hierarchy based on the discovered parent-child relationships.
4.  **Changes are Approved**: An administrator reviews the structured diff report and approves the changes through the `Collection API`.
5.  **System State is Updated**: Upon approval, the `Inventory API` creates or updates the `Device` resources and their associated `Location` resources. A top-level component's location will have a `null` `parentLocationId`, establishing it as the root. Events are then emitted to trigger the creation of a new historical snapshot.
6.  **Physical Geolocation is Optionally Added**: After the hardware exists in the system, an administrator can enrich the data by updating the `Location` resources with specific physical details (e.g., rack, U-position) via the new `geolocation` field. This is a manual enrichment step, not a prerequisite for discovery.
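
The diff step in the workflow above can be sketched as a comparison keyed on the manufacturer, part number, and serial number that identify a device. The function names and sample records are hypothetical, and the sketch only reports added and removed hardware. A real diff report would presumably also cover changed fields and hierarchy changes.

```python
def identity(device):
    """Return the tuple that uniquely identifies a piece of hardware."""
    return (device["manufacturer"], device["partNumber"], device["serialNumber"])


def diff_inventory(known, discovered):
    """Compare the last known devices with a freshly discovered list."""
    known_by_key = {identity(d): d for d in known}
    found_by_key = {identity(d): d for d in discovered}
    added = [found_by_key[k]
             for k in sorted(found_by_key.keys() - known_by_key.keys())]
    removed = [known_by_key[k]
               for k in sorted(known_by_key.keys() - found_by_key.keys())]
    return {"added": added, "removed": removed}


def dimm(serial):
    """Build a sample DIMM record (illustrative values only)."""
    return {"deviceType": "DIMM", "manufacturer": "Vendor Inc.",
            "partNumber": "VD-001", "serialNumber": serial}


known = [dimm("DIMM-0001"), dimm("DIMM-0002")]
discovered = [dimm("DIMM-0001"), dimm("DIMM-0003")]  # one DIMM was swapped
report = diff_inventory(known, discovered)
```

Here the report lists `DIMM-0003` as added and `DIMM-0002` as removed, which is the kind of change an administrator would review before approval.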

#### Event Sourcing with CloudEvents

Inventory operations will generate events. The long-term vision is to use CloudEvents, but the decision on event sourcing is not part of this RFD and will be proposed after consensus is reached on the broader question of events in OpenCHAMI.

</details>

The following is an initial draft of the other APIs, included to provide context. These are subject to change and will be presented as separate RFDs.

#### History API (`/apis/history/v1`)

Provides endpoints for accessing and managing historical inventory snapshots.

* `GET /snapshots`: List all available snapshots.
* `GET /snapshots/{snapshotId}`: Get a specific historical snapshot of the inventory.
* `GET /snapshots/diff`: Compare two snapshots (e.g., `?from={snapshotId1}&to={snapshotId2}`).
* `GET /events?subject={deviceId}`: Get the complete change history of events for one specific device.
* **Data Retention**
    * `GET /policy`: Get the current data retention policy.
    * `PUT /policy`: Set the data retention policy.
    * `POST /snapshots/{snapshotId}/pin`: Pin a snapshot to prevent automatic deletion.
    * `DELETE /snapshots/{snapshotId}/pin`: Unpin a snapshot.

#### Collection API (`/apis/collection/v1`)

Provides endpoints for initiating and managing the asynchronous hardware discovery workflow.

* `POST /scans`: Trigger a new discovery scan. Returns an `Operation` resource.
* `POST /scans/{scanId}/approve`: Approve the changes from a completed scan. Returns an `Operation` resource.
* `GET /scans/{scanId}/diff`: Get the diff report for a completed scan. The `scanId` is found in the result of a successful scan `Operation`.
* `GET /operations/{operationId}`: Get the status of a long-running operation.

#### The `Operation` Model

The `Operation` resource is used to track the status of any long-running asynchronous task, such as a hardware scan or the process of applying approved changes.

* `name` (String): The unique, server-assigned name of the operation, which also serves as its ID (e.g., `operations/scan-a4b1c2d3e4f5`).
* `done` (Boolean): A flag indicating if the operation is complete. `false` while in progress, `true` when finished.
* `metadata` (Object): A flexible object containing progress information specific to the operation.
* `result` (Object): A field that contains the final outcome of the operation once `done` is `true`. It will contain either a `response` or an `error`.

```json
{
  "name": "operations/scan-a4b1c2d3e4f5",
  "done": false,
  "metadata": {
    "@type": "type.googleapis.com/openchami.collection.v1.ScanMetadata",
    "startTime": "2025-10-01T16:30:00Z",
    "progressPercent": 45,
    "lastUpdateTime": "2025-10-01T16:35:10Z"
  }
}
```
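
A client would typically poll `GET /operations/{operationId}` until `done` is `true`. The sketch below shows such a loop with the HTTP call abstracted behind a `fetch` callable so that it can run without a server. `wait_for_operation`, its parameters, and the shape of the `response` payload (a `scanId` field) are assumptions for illustration.

```python
import time


def wait_for_operation(fetch, name, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll `fetch(name)` until the returned Operation has done == True.

    Returns the operation's response, or raises if it reports an error
    or does not finish before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        operation = fetch(name)
        if operation.get("done"):
            result = operation.get("result", {})
            if "error" in result:
                raise RuntimeError(f"{name} failed: {result['error']}")
            return result.get("response")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name} did not finish within {timeout} seconds")
        sleep(interval)


# Stand-in for GET /operations/{operationId}; a real client would use HTTP.
_states = iter([
    {"name": "operations/scan-a4b1c2d3e4f5", "done": False},
    {"name": "operations/scan-a4b1c2d3e4f5", "done": True,
     "result": {"response": {"scanId": "scan-a4b1c2d3e4f5"}}},
])
response = wait_for_operation(lambda name: next(_states),
                              "operations/scan-a4b1c2d3e4f5",
                              sleep=lambda seconds: None)
```

The returned `scanId` could then be used with `GET /scans/{scanId}/diff` and `POST /scans/{scanId}/approve`.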


</details>
