Merged
25 commits
40593c3
small deployment status updates
ajcraig Oct 27, 2025
b5a7040
Initial content injection from approved SUP
ajcraig Oct 29, 2025
fde295f
feat: Migrate Desired State SUP into spec
matlec Oct 30, 2025
fed61ef
feat: Add note on digest format vs RFC 9421
matlec Oct 30, 2025
ed1668f
Updated deployment status and device capabilities markdowns.
ajcraig Oct 30, 2025
ac4b975
feat: Update workload deployment concepts
matlec Oct 31, 2025
1dc2daf
Improvements to concepts, certificate and onboarding API.
ajcraig Oct 31, 2025
9e326c6
Small commit to reorder the navigation to match the client/server int…
ajcraig Oct 31, 2025
ed3e38d
Updates to the various markdowns associated with the onboarding, capa…
ajcraig Nov 12, 2025
e99403c
Addressed feedback received via internal review.
ajcraig Nov 24, 2025
88df261
Embedded the management interface swagger definition.
ajcraig Nov 25, 2025
23e83e4
Pinned python 3.12 in the pages job. There was a compatibility issue …
ajcraig Dec 2, 2025
353666b
Update system-design/specification/margo-management-interface/api-req…
ajcraig Dec 3, 2025
5c0928d
Update system-design/specification/margo-management-interface/api-req…
ajcraig Dec 3, 2025
c6b6783
Update system-design/specification/margo-management-interface/device-…
ajcraig Dec 3, 2025
814bc7f
Update system-design/specification/margo-management-interface/device-…
ajcraig Dec 3, 2025
7e1f667
Updates based on recent review
ajcraig Dec 3, 2025
9494b43
Apply suggestions from code review
ajcraig Dec 3, 2025
45c9c51
- Changed names of files based on fb.
ajcraig Dec 3, 2025
e2cbac1
fix: clarify link between Desired State and Deployment Status API
matlec Dec 4, 2025
e2326cb
fix: match style of desired state API docs with rest of the spec
matlec Dec 4, 2025
71f9d2f
fix: clarify logging requirements for security events in manifest upd…
matlec Dec 4, 2025
8257685
fix: clarify replay attack prevention details in signature handling
matlec Dec 5, 2025
85c065b
- Removed note on root CA that was out of place.
ajcraig Dec 9, 2025
dbef163
Updated version of the API Swagger document along with the link.
ajcraig Dec 9, 2025
2 changes: 1 addition & 1 deletion .github/workflows/pages.yml
@@ -114,7 +114,7 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: 3.x
python-version: '3.12'

- name: Install poetry
uses: abatilo/actions-poetry@v2
11 changes: 10 additions & 1 deletion mkdocs.yml
@@ -20,18 +20,20 @@ nav:
- concepts/workloads/application-registry.md
- concepts/workloads/local-registries.md
- Workload Fleet Managers:
- concepts/workload-fleet-managers/device-client-onboarding.md
- concepts/workload-fleet-managers/device-capabilities.md
- concepts/workload-fleet-managers/workload-deployment.md
- Edge Compute Devices:
- concepts/edge-compute-devices/devices.md
- Specification:
- Margo Management Interface:
- specification/margo-management-interface/api-requirements-and-security.md
- specification/margo-management-interface/device-onboarding.md
- specification/margo-management-interface/certificate-api.md
- specification/margo-management-interface/device-client-onboarding.md
- specification/margo-management-interface/device-capabilities.md
- specification/margo-management-interface/desired-state.md
- specification/margo-management-interface/deployment-status.md
- specification/margo-management-interface/management-interface-swagger.md
- Application Package:
- specification/application-package/application-description.md
- Margo Devices:
@@ -79,6 +81,12 @@ markdown_extensions:

extra_css:
- css/margo.css
- assets/swagger-ui.css

extra_javascript:
- assets/swagger-ui-bundle.js
- assets/swagger-ui-standalone-preset.js


extra:
# https://squidfunk.github.io/mkdocs-material/setup/setting-up-the-footer/#generator-notice
@@ -106,3 +114,4 @@ plugins:
variable_start_string: =@=
variable_end_string: =@=
- privacy

@@ -2,9 +2,8 @@
id: http://specification.margo.org/desired_state_schema
name: DesiredState
description: >-
The desired state is expressed as a
[Kubernetes custom resource definition](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
and made available to the device's management client as a YAML document using the OpenGitOps pattern.
Each workload is represented as an `ApplicationDeployment` YAML file that specifies its components, configuration, and parameters.
This resource is delivered via the Desired State API and referenced by `id` in the Deployment Status API.
version: 1.0 #Arne: update later
prefixes:
linkml: https://w3id.org/linkml/

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions system-design/assets/swagger-ui-bundle.js

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions system-design/assets/swagger-ui-standalone-preset.js

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions system-design/assets/swagger-ui.css

Large diffs are not rendered by default.

@@ -1,12 +1,12 @@
# Device Capabilities

The purpose of device capabilities reporting is to ensure the Workload Fleet Management (WFM) solution has the information needed to pair workloads with compatible edge devices. The device's capabilities are reported to the WFM web service using the Margo Management API.
The purpose of device capabilities reporting is to ensure the Workload Fleet Management (WFM) solution has the information needed to pair workloads with compatible edge devices. The device's capabilities are reported to the WFM web service using the [Device Capabilities API](../../specification/margo-management-interface/device-capabilities.md).

### Device Capability Reporting

The device owner MUST report their device's capabilities and characteristics, via the device API, when onboarding the device with the Workload Fleet Management solution. Additionally, during the lifecycle of the edge device, if there is a change that impacts the reported characteristics, the device MUST update the Workload Fleet Manager with the latest information via the management API.
The device owner reports their device's capabilities and characteristics, via the device API, when onboarding the device with the Workload Fleet Management solution. Additionally, during the lifecycle of the edge device, if there is a change that impacts the reported characteristics, the device updates the Workload Fleet Manager with the latest information via the [Device Capabilities API](../../specification/margo-management-interface/device-capabilities.md).

The following information MUST be provided:
The following information is exchanged:

- Device Id
- Device Vendor
@@ -0,0 +1,43 @@
# Device Client Onboarding

To enable workload management, the device's client first establishes trust and completes an onboarding process with the Workload Fleet Manager selected by the end user. This onboarding process enables late binding, a critical Margo non-functional requirement that lets a device bind to any Margo-compatible Workload Fleet Manager.

The onboarding process includes several core functions:

- Establishing trust between the device and the WFM
- Registering the device client and assigning a unique identifier
- Reporting device capabilities to enable workload placement decisions

## Trust Establishment

Initial trust is established between the device's Workload Fleet Management (WFM) Client and the WFM using server-side TLS.
Before the WFM Client can connect securely, it obtains the WFM's root CA certificate. This trust anchor can be:

- downloaded via the Certificate API, provided that an existing trusted channel is available, or
- delivered out-of-band (e.g., preloaded by the device owner or transferred via USB)

Importing the WFM's root CA certificate enables the WFM Client to authenticate the WFM during TLS connections. Mutual TLS (mTLS) is deliberately avoided, as some deployment environments include network components or intermediaries that may not support or forward client-certificate authentication.
Instead, transport security and server authentication are provided by server-side TLS, while client authentication and request integrity are performed at the application layer: the WFM Client uses its own X.509 certificate to create HTTP message signatures for each request. This approach maintains strong, certificate-based authenticity and integrity while accommodating a wide range of network architectures.


## Certificates required

Both the WFM server and the WFM Client use X.509 certificates, but for different purposes. The WFM's certificate authenticates the server during TLS sessions. Each device client possesses a unique X.509 certificate used to sign its HTTP requests, enabling the WFM to verify the origin and integrity of every message. These certificates provide complementary security properties: TLS ensures transport confidentiality and server authenticity, while application-layer signatures provide client authentication and payload integrity. Private keys remain securely stored on the device, and all signing operations occur locally, reducing exposure to key compromise.


## Unique Identifiers

The Workload Fleet Manager assigns a globally unique identifier to the device's management client during the onboarding process. This identifier ensures that each device's interactions with the Workload Fleet Manager can be attributed to exactly one device.

## Device Capability Reporting

After onboarding, the device client reports its capabilities to the WFM server using the device capability reporting API.

## Relevant Links

The following links provide more technical information on the concepts described above:

- [API Security Details](../../specification/margo-management-interface/api-requirements-and-security.md)
- [Certificate API](../../specification/margo-management-interface/certificate-api.md)
- [Device Onboarding API](../../specification/margo-management-interface/device-client-onboarding.md)
- [Device Capabilities](../../specification/margo-management-interface/device-capabilities.md)
@@ -1,45 +1,111 @@
# Workload Deployment

Margo uses an [OpenGitOps](https://opengitops.dev/) approach for managing the edge device's desired state. The workload orchestration solution vendor maintains Git repositories, under their control, to push updates to the desired state for each device being managed. The device's management client is responsible for monitoring the device's assigned Git repository for any changes to the desired state that MUST be applied.
> Action: The use of GitOps patterns for pulling desired state is still being discussed/investigated.
This page describes how Margo manages the deployment and reconciliation of workloads on Edge Compute Devices.

Workload deployment in Margo is based on a declarative Desired State model.
A Workload Fleet Manager (WFM) defines the desired workloads for each Edge Compute Device, including what should run, how each workload should be configured, and the parameters needed for deployment and lifecycle management.
Each device runs a Workload Fleet Management Client (WFM Client) that retrieves and applies this Desired State, while reporting progress and results back to the WFM.
This model provides a consistent and observable way to manage workloads across distributed environments.

### Desired State Requirements:
## How it works

> Note: Need to investigate best way to construct the Git Repository. Folder structure / Multiple applications per Edge Device/Cluster
> Note: this is the recommendation from FluxCD <https://fluxcd.io/flux/guides/repository-structure/>
The Workload Fleet Manager coordinates workloads across Edge Compute Devices.
Operators use the WFM to define workloads, update deployments, and view rollout progress across devices.
The WFM Client continuously reconciles the Desired State provided by the WFM with the workloads actually running on the device.

- The workload orchestration solution MUST store the device's [desired state documents](../../specification/margo-management-interface/desired-state.md) within a Git repository the device's management client can access.
> Note: Git repository storage was selected to ensure secure storage and traceability pertaining to the workload's desired state(s).
- The device's management client MUST monitor the device's Git repository for updates to the desired state using the URL and access token provided by the workload orchestration solution during onboarding.
The WFM and WFM Clients communicate through two key interfaces:

### Workload Management Sequence of Operations
- The [Desired State API](../../specification/margo-management-interface/desired-state.md), which distributes workload definitions to devices
- The [Deployment Status API](../../specification/margo-management-interface/deployment-status.md), which collects deployment updates from devices

#### Desired State lifecycle:
Together, these interfaces establish a feedback loop between the centralized manager and the distributed devices, ensuring workload consistency and visibility at scale.

1. The workload orchestration solution creates the [desired state documents](../../specification/margo-management-interface/desired-state.md) based on the end user's inputs when installing, updating or deleting an application.
2. The workload orchestration solution pushes updates to the device's Git repository reflecting the changes to the desired state.
3. The device's management client monitors its assigned Git repository for changes.
4. When the device's management client notices a difference between the current (running) state and the desired state, it MUST pull down and attempt to apply the new desired state.
## Desired State

#### Applying the Desired State:
The Desired State defines the workloads that should run on each Edge Compute Device and the details of how they are deployed.
It is represented by a [State Manifest](../../specification/margo-management-interface/desired-state.md#endpoints-state-manifest) that lists all workloads assigned to a device.
The WFM exposes this manifest through the Desired State API.

1. The device attempts to apply the desired state to become new current state
2. While the new desired state is being applied, the device's management client MUST report progress on state changes (see the [deployment state](#deployment-status) section below) using the [Device API](../../specification/margo-management-interface/deployment-status.md)
Each workload is defined by an [ApplicationDeployment](../../specification/margo-management-interface/desired-state.md#applicationdeployment-yaml-definition), which describes:

#### Deployment Status
- The Components that make up the workload, such as Helm charts or Compose-based container bundles
- Configuration parameters and deployment profiles that control workload behavior
- Target information identifying which devices or groups of devices the deployment applies to

The deployment status is sent to the workload orchestration web service using the [Device API](../../specification/margo-management-interface/deployment-status.md) when there is a change in the deployment state. This informs the workload orchestration web service of the current state as the new desired state is applied.
The WFM can provide ApplicationDeployments in two formats:

The deployment status uses the following rules:
- Individual YAML files, allowing incremental synchronization
- A bundle archive that contains multiple ApplicationDeployments for bulk distribution

- The state is `Pending` once the device management client has received the updated desired state but has not started applying it. When reporting this state indicate the reason.
- Such as waiting on Policy agent
- Waiting on other applications in the 'Order of operations' to be completed.
- The state is `Installing` once the device management client has started the process of applying the desired state.
- The state is `Failure` if at any point the desired state fails to be applied. When reporting a `Failure` state the error message and error code MUST be reported
- The state is `Success` once the desired state has been applied completely
All files retrieved as part of the Desired State—manifests, ApplicationDeployment YAMLs, and bundle archives—are treated as immutable artifacts.
Each artifact is referenced by a SHA-256 digest. The WFM Client validates these digests before applying updates to ensure authenticity and consistency.
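
The digest check described above can be sketched as follows. This is a minimal illustration, not the specified algorithm: it assumes the manifest carries each artifact's SHA-256 digest as a hex string (the actual encoding is defined by the Desired State API and may differ), and the function name `verify_artifact` and the sample artifact content are hypothetical.

```python
import hashlib
import hmac


def verify_artifact(content: bytes, expected_sha256_hex: str) -> bool:
    """Return True when the SHA-256 digest of `content` matches the expected value.

    Assumption: the expected digest is a hex string; the real manifest
    encoding may differ (for example, base64).
    """
    actual = hashlib.sha256(content).hexdigest()
    # Normalize case and whitespace, then compare in constant time.
    return hmac.compare_digest(actual, expected_sha256_hex.strip().lower())


# Placeholder artifact content, for illustration only.
artifact = b"kind: ApplicationDeployment\n"
# Stands in for the digest that would be read from the State Manifest.
declared = hashlib.sha256(artifact).hexdigest()

matches = verify_artifact(artifact, declared)
tampered = verify_artifact(artifact + b"extra", declared)
print(matches, tampered)
```

A client following this pattern would refuse to apply any artifact for which the check returns `False`, leaving the running workloads unchanged.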

## Reconciliation process

> Note: Drawing to be replaced with mermaid sequence diagram.
![Workload Install Sequence Diagram (svg)](../../figures/workload-install-sequence.drawio.svg)
Each WFM Client maintains the Desired State on its Edge Compute Device by running a continuous reconciliation loop.

1. **Retrieve the manifest:**
The WFM Client periodically checks the WFM for updates to its State Manifest.
When a new manifest version is available, the client initiates synchronization.

2. **Retrieve artifacts:**
The WFM Client downloads the referenced ApplicationDeployment YAMLs or bundle archive.

3. **Verify integrity:**
The WFM Client verifies that each artifact matches the digest declared in the manifest.
If verification fails, the update is halted and the current workloads remain unchanged.

4. **Apply the Desired State:**
The WFM Client compares the current workloads with those defined in the Desired State:

    - Adds or updates workloads that have changed
    - Removes workloads that are no longer listed
    - Keeps workloads that remain valid and current

5. **Report status:**
As the synchronization proceeds, the WFM Client reports its deployment status to the WFM through the Deployment Status API.

This continuous process allows the WFM to maintain awareness of workload rollout progress and ensures devices converge toward the Desired State.
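
The comparison in step 4 can be sketched as a diff between two mappings. This is an illustrative sketch only: it assumes the client tracks each ApplicationDeployment by an identifier mapped to the digest it last applied, and the names `ReconcilePlan` and `plan_reconciliation` are hypothetical rather than part of the specification.

```python
from dataclasses import dataclass, field


@dataclass
class ReconcilePlan:
    """Outcome of comparing the current workloads with the Desired State."""
    to_apply: list[str] = field(default_factory=list)   # new or changed workloads
    to_remove: list[str] = field(default_factory=list)  # no longer listed
    unchanged: list[str] = field(default_factory=list)  # still valid and current


def plan_reconciliation(current: dict[str, str], desired: dict[str, str]) -> ReconcilePlan:
    """Both mappings go from deployment id to artifact digest (an assumption)."""
    plan = ReconcilePlan()
    for deployment_id in sorted(desired):
        if current.get(deployment_id) == desired[deployment_id]:
            plan.unchanged.append(deployment_id)
        else:
            # Either absent from the device or present with a different digest.
            plan.to_apply.append(deployment_id)
    plan.to_remove = sorted(set(current) - set(desired))
    return plan


current = {"app-a": "d1", "app-b": "d2", "app-c": "d3"}
desired = {"app-a": "d1", "app-b": "d2-new", "app-d": "d4"}
plan = plan_reconciliation(current, desired)
print(plan.to_apply, plan.to_remove, plan.unchanged)
```

Because the artifacts are immutable and digest-addressed, comparing digests is enough to decide whether a workload has changed, without inspecting the YAML content itself.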

## Deployment status

The Deployment Status API provides feedback from devices to the Workload Fleet Manager.
The WFM Client reports progress, success, or failure during installation, update, and removal operations.
This feedback allows the WFM to present an aggregated view of deployment health and state across the managed fleet.

A deployment status report includes:

- The identifier of the ApplicationDeployment
- The current deployment state, which may be:

    - Pending - the Desired State has been received but not yet applied
    - Installing - the workload is being deployed
    - Installed - the workload has been successfully applied
    - Removing or Removed - the workload is being or has been uninstalled
    - Failed - an error occurred during deployment

- Optional component-level progress information
- Error codes and messages, when applicable

This information enables real-time monitoring and supports troubleshooting and auditing of workload operations.
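
A status report with the contents listed above might be modelled as shown below. This is a hedged sketch: the state names come from the list above, but the JSON field names (`deploymentId`, `state`, `error`) and the helper `build_status_report` are assumptions for illustration, and the authoritative payload shape is the one defined by the Deployment Status API. The sketch also assumes that a `Failed` report always carries an error code and message.

```python
import json
from enum import Enum
from typing import Optional


class DeploymentState(Enum):
    PENDING = "Pending"
    INSTALLING = "Installing"
    INSTALLED = "Installed"
    REMOVING = "Removing"
    REMOVED = "Removed"
    FAILED = "Failed"


def build_status_report(
    deployment_id: str,
    state: DeploymentState,
    error_code: Optional[str] = None,
    error_message: Optional[str] = None,
) -> dict:
    """Build a report dict; field names here are hypothetical."""
    report = {"deploymentId": deployment_id, "state": state.value}
    if state is DeploymentState.FAILED:
        # Assumption: failures must explain themselves.
        if not error_code or not error_message:
            raise ValueError("a Failed report needs an error code and message")
        report["error"] = {"code": error_code, "message": error_message}
    return report


ok_report = build_status_report("deployment-123", DeploymentState.INSTALLED)
failed_report = build_status_report(
    "deployment-123", DeploymentState.FAILED, "E_PULL", "image pull failed"
)
print(json.dumps(failed_report, indent=2))
```

Using an enumeration keeps the set of reportable states closed, so a typo in a state name fails at construction time rather than at the WFM.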

## Sequence diagram

```mermaid
sequenceDiagram
    participant WFM as Workload Fleet Manager
    participant Client as WFM Client (running on Edge Compute Device)

    loop Periodic synchronization
        Client->>WFM: Retrieve Desired State (Desired State API)
        alt Desired State unchanged
            WFM-->>Client: No update available
        else Desired State updated
            WFM-->>Client: Provide new State Manifest
            Client->>WFM: Retrieve ApplicationDeployments or bundle
            Client->>Client: Apply workloads from Desired State
            Client->>WFM: Report deployment status (Deployment Status API)
        end
    end

```
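
One iteration of the loop in the diagram can be sketched as a function that takes its I/O as callables, so it can be exercised without a real WFM. Everything here is hypothetical: the function `sync_once`, the `version` key in the manifest, and the single fleet-level status string are simplifications, since the specification reports status per ApplicationDeployment.

```python
from typing import Callable, Optional


def sync_once(
    known_version: Optional[str],
    fetch_manifest: Callable[[], dict],
    fetch_artifacts: Callable[[dict], list],
    apply_workloads: Callable[[list], None],
    report_status: Callable[[str], None],
) -> Optional[str]:
    """Run one synchronization pass and return the manifest version now in effect."""
    manifest = fetch_manifest()
    if manifest["version"] == known_version:
        return known_version  # Desired State unchanged, nothing to do
    artifacts = fetch_artifacts(manifest)
    try:
        apply_workloads(artifacts)
    except Exception:
        report_status("Failed")
        return known_version  # keep the old version so the next pass retries
    report_status("Installed")
    return manifest["version"]


# In-memory stand-ins for the WFM and the device runtime.
reports: list[str] = []
manifest = {"version": "v2", "deployments": ["app-a"]}

version = sync_once("v1", lambda: manifest, lambda m: m["deployments"],
                    lambda artifacts: None, reports.append)
version_again = sync_once(version, lambda: manifest, lambda m: m["deployments"],
                          lambda artifacts: None, reports.append)
print(version, version_again, reports)
```

Returning the previous version on failure means the next periodic pass sees the manifest as still pending and tries again, which matches the convergent behaviour the diagram describes.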