PR - Integrate approved API SUPs #124
Merged
25 commits:

- 40593c3 small deployment status updates (ajcraig)
- b5a7040 Initial content injection from approved SUP (ajcraig)
- fde295f feat: Migrate Desired State SUP into spec (matlec)
- fed61ef feat: Add note on digest format vs RFC 9421 (matlec)
- ed1668f Updated deployment status and device capabilities markdowns. (ajcraig)
- ac4b975 feat: Update workload deployment concepts (matlec)
- 1dc2daf Improvements to concepts, certificate and onboarding API. (ajcraig)
- 9e326c6 Small commit to reorder the navigation to match the client/server int… (ajcraig)
- ed3e38d Updates to the various markdowns associated with the onboarding, capa… (ajcraig)
- e99403c Addressed feedback received via internal review. (ajcraig)
- 88df261 Embedded the management interface swagger definition. (ajcraig)
- 23e83e4 Pinned python 3.12 in the pages job. There was a compatibility issue … (ajcraig)
- 353666b Update system-design/specification/margo-management-interface/api-req… (ajcraig)
- 5c0928d Update system-design/specification/margo-management-interface/api-req… (ajcraig)
- c6b6783 Update system-design/specification/margo-management-interface/device-… (ajcraig)
- 814bc7f Update system-design/specification/margo-management-interface/device-… (ajcraig)
- 7e1f667 Updates based on recent review (ajcraig)
- 9494b43 Apply suggestions from code review (ajcraig)
- 45c9c51 - Changed names of files based on fb. (ajcraig)
- e2cbac1 fix: clarify link between Desired State and Deployment Status API (matlec)
- e2326cb fix: match style of desired state API docs with rest of the spec (matlec)
- 71f9d2f fix: clarify logging requirements for security events in manifest upd… (matlec)
- 8257685 fix: clarify replay attack prevention details in signature handling (matlec)
- 85c065b - Removed note on root CA that was out of place. (ajcraig)
- dbef163 Updated version of the API Swagger document along with the link. (ajcraig)

src/specification/margo-management-interface/resources/index.md.jinja2 (272 changes: 266 additions & 6 deletions)

system-design/concepts/workload-fleet-managers/device-capabilities.md (6 changes: 3 additions & 3 deletions)

system-design/concepts/workload-fleet-managers/device-client-onboarding.md (43 changes: 43 additions & 0 deletions)

@@ -0,0 +1,43 @@

# Device Client Onboarding

To enable workload management, the device's client first establishes trust with, and completes an onboarding process with, the Workload Fleet Manager selected by the End User. This onboarding process enables late binding, a critical Margo non-functional requirement that allows a device to bind to any Margo-compatible Workload Fleet Manager.

The onboarding process includes several core functions:

- Establishing trust between the device and the WFM
- Registering the device client and assigning a unique identifier
- Reporting device capabilities to enable workload placement decisions

## Trust Establishment

Initial trust between the device's Workload Fleet Management (WFM) Client and the WFM is established using server-side TLS.
Before the WFM Client can connect securely, it must obtain the WFM's root CA certificate. This trust anchor can be:

- downloaded via the Certificate API, provided that an existing trusted channel is available, or
- delivered out-of-band (e.g., preloaded by the device owner or transferred via USB).

Importing the WFM's root CA certificate enables the WFM Client to authenticate the WFM during TLS connections. Mutual TLS (mTLS) is deliberately avoided because some deployment environments include network components or intermediaries that may not support or forward client-certificate authentication.
Instead, transport security and server authentication are provided by server-side TLS, while client authentication and request integrity are handled at the application layer: the WFM Client uses the private key associated with its own X.509 certificate to create an HTTP message signature for each request. This approach maintains strong, certificate-based authenticity and integrity while accommodating a wide range of network architectures.
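
A minimal sketch of the client side of this arrangement, using Python's standard `ssl` module and assuming the WFM root CA has been saved as a PEM file (the `wfm_root_ca_file` path and the function name are hypothetical). The context verifies the server certificate and hostname against only the imported trust anchor and deliberately loads no client certificate, so the handshake is server-side TLS rather than mTLS.

```python
import ssl
from typing import Optional


def make_wfm_tls_context(wfm_root_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context that trusts only the WFM's root CA.

    wfm_root_ca_file is a hypothetical path to the PEM-encoded root CA obtained
    via the Certificate API or delivered out-of-band (e.g. from USB).
    """
    # PROTOCOL_TLS_CLIENT enables certificate and hostname verification by
    # default and, unlike create_default_context(), loads no system CAs.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if wfm_root_ca_file is not None:
        # Import the WFM root CA as the sole trust anchor.
        ctx.load_verify_locations(cafile=wfm_root_ca_file)
    # No load_cert_chain() call: the client presents no certificate during the
    # handshake (server-side TLS only). Client authentication happens at the
    # application layer through HTTP message signatures instead.
    return ctx
```

Until a root CA is imported, the context trusts nothing, so every connection fails closed. The resulting context can be passed as `context=` to `http.client.HTTPSConnection` or `urllib.request.urlopen`.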

## Certificates Required

Both the WFM server and the WFM Client use X.509 certificates, but for different purposes. The WFM's certificate authenticates the server during TLS sessions. Each device client has a unique X.509 certificate, and it signs its HTTP requests with the corresponding private key, enabling the WFM to verify the origin and integrity of every message. These certificates provide complementary security properties: TLS ensures transport confidentiality and server authenticity, while application-layer signatures provide client authentication and payload integrity. Private keys remain securely stored on the device, and all signing operations occur locally, reducing exposure to key compromise.
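
The PR history mentions RFC 9421 (HTTP Message Signatures). Assuming that scheme, the sketch below shows how a WFM Client might sign a request. The covered components, the `sig1` label, and the `key_id` value are illustrative assumptions rather than the Margo-mandated profile, which is defined in the API requirements and security specification. `sign` is a hypothetical callable standing in for a signing operation performed locally with the device's private key.

```python
import base64
import hashlib
import time
from typing import Callable, Dict, Optional


def content_digest(body: bytes) -> str:
    """Content-Digest header value (RFC 9530) using SHA-256."""
    return "sha-256=:" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii") + ":"


def sign_request(
    method: str,
    target_uri: str,
    body: bytes,
    key_id: str,
    sign: Callable[[bytes], bytes],
    created: Optional[int] = None,
) -> Dict[str, str]:
    """Return headers carrying an RFC 9421-style signature over the request.

    `sign` is hypothetical: it signs bytes with the device's private key
    (e.g. held in a secure element) and returns the raw signature bytes.
    """
    created = int(time.time()) if created is None else created
    digest = content_digest(body)
    # Signature parameters: which components are covered, when, and by which key.
    params = f'("@method" "@target-uri" "content-digest");created={created};keyid="{key_id}"'
    # Signature base: one line per covered component, then @signature-params,
    # joined by newlines with no trailing newline.
    signature_base = "\n".join([
        f'"@method": {method.upper()}',
        f'"@target-uri": {target_uri}',
        f'"content-digest": {digest}',
        f'"@signature-params": {params}',
    ])
    signature = base64.b64encode(sign(signature_base.encode("utf-8"))).decode("ascii")
    return {
        "Content-Digest": digest,
        "Signature-Input": f"sig1={params}",
        "Signature": f"sig1=:{signature}:",
    }
```

Because the body digest is itself a covered component, the signature protects both the request line and the payload; the WFM rebuilds the same signature base and checks it against the public key in the client's certificate.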

## Unique Identifiers

The Workload Fleet Manager assigns a globally unique identifier to the device's management client during the onboarding process. This identifier lets the Fleet Manager distinguish each device's interactions from those of every other device it manages.
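
One way a WFM could assign these identifiers, sketched under assumptions this section does not state: the WFM generates a random version 4 UUID per client and keys its registry on the SHA-256 fingerprint of the client's certificate, so a client that repeats onboarding with the same certificate receives the same identifier. `ClientRegistry` and `onboard` are hypothetical names.

```python
import hashlib
import uuid
from typing import Dict


class ClientRegistry:
    """Hypothetical WFM-side registry mapping client certificates to identifiers."""

    def __init__(self) -> None:
        self._ids_by_fingerprint: Dict[str, str] = {}

    def onboard(self, client_cert_der: bytes) -> str:
        # Fingerprint the DER-encoded certificate presented during onboarding.
        fingerprint = hashlib.sha256(client_cert_der).hexdigest()
        if fingerprint not in self._ids_by_fingerprint:
            # uuid4 carries 122 random bits, so collisions are negligible and
            # the identifier reveals nothing about the device itself.
            self._ids_by_fingerprint[fingerprint] = str(uuid.uuid4())
        return self._ids_by_fingerprint[fingerprint]
```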

## Device Capability Reporting

After onboarding, the device client reports its capabilities to the WFM server using the device capability reporting API.

## Relevant Links

For more technical detail on the concepts described above, see:

- [API Security Details](../../specification/margo-management-interface/api-requirements-and-security.md)
- [Certificate API](../../specification/margo-management-interface/certificate-api.md)
- [Device Onboarding API](../../specification/margo-management-interface/device-client-onboarding.md)
- [Device Capabilities](../../specification/margo-management-interface/device-capabilities.md)

system-design/concepts/workload-fleet-managers/workload-deployment.md (122 changes: 94 additions & 28 deletions)

@@ -1,45 +1,111 @@

# Workload Deployment

Removed:

> Margo uses an [OpenGitOps](https://opengitops.dev/) approach for managing the edge device's desired state. The workload orchestration solution vendor maintains Git repositories, under their control, to push updates to the desired state for each device being managed. The device's management client is responsible for monitoring the device's assigned Git repository for any changes to the desired state that MUST be applied.
> > Action: The use of GitOps patterns for pulling desired state is still being discussed/investigated.

This page describes how Margo manages the deployment and reconciliation of workloads on Edge Compute Devices.

Workload deployment in Margo is based on a declarative Desired State model.
A Workload Fleet Manager (WFM) defines the desired workloads for each Edge Compute Device, including what should run, how each workload should be configured, and the parameters needed for deployment and lifecycle management.
Each device runs a Workload Fleet Management Client (WFM Client) that retrieves and applies this Desired State, while reporting progress and results back to the WFM.
This model provides a consistent and observable way to manage workloads across distributed environments.

Removed:

> ### Desired State Requirements:
>
> > Note: Need to investigate best way to construct the Git Repository. Folder structure / Multiple applications per Edge Device/Cluster
> > Note: this is the recommendation from FluxCD <https://fluxcd.io/flux/guides/repository-structure/>
>
> - The workload orchestration solution MUST store the device's [desired state documents](../../specification/margo-management-interface/desired-state.md) within a Git repository the device's management client can access.
> > Note: Git repository storage was selected to ensure secure storage and traceability pertaining to the workload's desire state(s).
> - The device's management client MUST monitor the device's Git repository for updates to the desired state using the URL and access token provided by the workload orchestration solution during onboarding.
>
> ### Workload Management Sequence of Operations
>
> #### Desired State lifecycle:

## How it works

The Workload Fleet Manager coordinates workloads across Edge Compute Devices.
Operators use the WFM to define workloads, update deployments, and view rollout progress across devices.
The WFM Client continuously reconciles the Desired State provided by the WFM with the workloads actually running on the device.

The WFM and WFM Clients communicate through two key interfaces:

- The [Desired State API](../../specification/margo-management-interface/desired-state.md), which distributes workload definitions to devices
- The [Deployment Status API](../../specification/margo-management-interface/deployment-status.md), which collects deployment updates from devices

Together, these interfaces establish a feedback loop between the centralized manager and the distributed devices, ensuring workload consistency and visibility at scale.

Removed:

> 1. The workload orchestration solution creates the [desired state documents](../../specification/margo-management-interface/desired-state.md) based on the end user's inputs when installing, updating or deleting an application.
> 2. The workload orchestration solution pushes updates to the device's Git repository reflecting the changes to the desired state.
> 3. The device's management client monitors its assigned Git repository for changes.
> 4. When the device's management client notices a difference between the current (running) state and the desired state, it MUST pull down and attempt to apply the new desired state.
>
> #### Applying the Desired State:
>
> 1. The device attempts to apply the desired state to become new current state
> 2. While the new desired state is being applied, the device's management client MUST report progress on state changes (see the [deployment state](#deployment-status) section below) using the [Device API](../../specification/margo-management-interface/deployment-status.md)
>
> #### Deployment Status
>
> The deployment status is sent to the workload orchestration web service using the [Device API](../../specification/margo-management-interface/deployment-status.md) when there is a change in the deployment state. This informs the workload orchestration web service of the current state as the new desired state is applied.
>
> The deployment status uses the following rules:
>
> - The state is `Pending` once the device management client has received the updated desired state but has not started applying it. When reporting this state indicate the reason.
>   - Such as waiting on Policy agent
>   - Waiting on other applications in the 'Order of operations' to be completed.
> - The state is `Installing` once the device management client has started the process of applying the desired state.
> - The state is `Failure` if at any point the desired state fails to be applied. When reporting a `Failure` state the error message and error code MUST be reported
> - The state is `Success` once the desired state has been applied completely

## Desired State

The Desired State defines the workloads that should run on each Edge Compute Device and the details of how they are deployed.
It is represented by a [State Manifest](../../specification/margo-management-interface/desired-state.md#endpoints-state-manifest) that lists all workloads assigned to a device.
The WFM exposes this manifest through the Desired State API.

Each workload is defined by an [ApplicationDeployment](../../specification/margo-management-interface/desired-state.md#applicationdeployment-yaml-definition), which describes:

- The Components that make up the workload, such as Helm charts or Compose-based container bundles
- Configuration parameters and deployment profiles that control workload behavior
- Target information identifying which devices or groups of devices the deployment applies to

The WFM can provide ApplicationDeployments in two formats:

- Individual YAML files, allowing incremental synchronization
- A bundle archive that contains multiple ApplicationDeployments for bulk distribution

All files retrieved as part of the Desired State (manifests, ApplicationDeployment YAMLs, and bundle archives) are treated as immutable artifacts.
Each artifact is referenced by a SHA-256 digest. The WFM Client validates these digests before applying updates to ensure authenticity and consistency.
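
A minimal sketch of the digest check, assuming the manifest records each artifact's digest as a hex-encoded SHA-256 string, optionally prefixed with `sha256:`. That encoding is an assumption; the Desired State API specification defines the actual format. `verify_artifact` is a hypothetical name.

```python
import hashlib
import hmac


def verify_artifact(data: bytes, expected_digest: str) -> bool:
    """Return True if `data` matches the SHA-256 digest declared in the manifest.

    Assumes `expected_digest` is hex-encoded, optionally written as "sha256:<hex>".
    """
    expected = expected_digest.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison is a cheap default for security-relevant checks.
    return hmac.compare_digest(actual, expected)
```

If any artifact fails this check, the client stops the update and leaves the running workloads untouched, which is what step 3 of the reconciliation process requires.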

Removed:

> > Note: Drawing to be replaced with mermaid sequence diagram.

## Reconciliation process

Each WFM Client maintains the Desired State on its Edge Compute Device by running a continuous reconciliation loop.

1. **Retrieve the manifest:**
   The WFM Client periodically checks the WFM for updates to its State Manifest.
   When a new manifest version is available, the client initiates synchronization.

2. **Retrieve artifacts:**
   The WFM Client downloads the referenced ApplicationDeployment YAMLs or bundle archive.

3. **Verify integrity:**
   The WFM Client verifies that each artifact matches the digest declared in the manifest.
   If verification fails, the update is halted and the current workloads remain unchanged.

4. **Apply the Desired State:**
   The WFM Client compares the current workloads with those defined in the Desired State:

   - Adds new workloads and updates workloads that have changed
   - Removes workloads that are no longer listed
   - Keeps workloads that remain valid and current

5. **Report status:**
   As the synchronization proceeds, the WFM Client reports its deployment status to the WFM through the Deployment Status API.

This continuous process allows the WFM to maintain awareness of workload rollout progress and ensures devices converge toward the Desired State.
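
Step 4 amounts to comparing two sets of workloads. A minimal sketch, assuming both the running and the desired workloads are keyed by a (hypothetical) ApplicationDeployment identifier and that a change is detected by comparing artifact digests; both are assumptions about how a client might track state, not requirements of the specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReconcilePlan:
    add: List[str]     # in the Desired State but not running
    update: List[str]  # running, but the desired artifact digest changed
    remove: List[str]  # running, but no longer listed
    keep: List[str]    # running and unchanged


def plan_reconciliation(current: Dict[str, str], desired: Dict[str, str]) -> ReconcilePlan:
    """Compare running workloads with the Desired State.

    Both maps go from a hypothetical ApplicationDeployment ID to the digest
    of the artifact that defines it.
    """
    shared = desired.keys() & current.keys()
    return ReconcilePlan(
        add=sorted(desired.keys() - current.keys()),
        update=sorted(k for k in shared if desired[k] != current[k]),
        remove=sorted(current.keys() - desired.keys()),
        keep=sorted(k for k in shared if desired[k] == current[k]),
    )
```

A converged device produces a plan whose only non-empty list is `keep`, so repeating the loop on every poll is harmless.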

## Deployment status

The Deployment Status API provides feedback from devices to the Workload Fleet Manager.
The WFM Client reports progress, success, or failure during installation, update, and removal operations.
This feedback allows the WFM to present an aggregated view of deployment health and state across the managed fleet.

A deployment status report includes:

- The identifier of the ApplicationDeployment
- The current deployment state, which may be:
  - Pending - the Desired State has been received but not yet applied
  - Installing - the workload is being deployed
  - Installed - the workload has been successfully applied
  - Removing or Removed - the workload is being or has been uninstalled
  - Failed - an error occurred during deployment
- Optional component-level progress information
- Error codes and messages, when applicable

This information enables real-time monitoring and supports troubleshooting and auditing of workload operations.
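
A sketch of a status report as a data structure. The state names come from the list above; the class names, field names, and JSON shape are assumptions, and the Deployment Status API specification defines the actual schema. As a design choice, the constructor rejects a `Failed` report that lacks an error code and message.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class DeploymentState(str, Enum):
    PENDING = "Pending"
    INSTALLING = "Installing"
    INSTALLED = "Installed"
    REMOVING = "Removing"
    REMOVED = "Removed"
    FAILED = "Failed"


@dataclass
class DeploymentStatusReport:
    deployment_id: str  # ApplicationDeployment identifier
    state: DeploymentState
    # Optional per-component progress, keyed by a component name.
    components: Dict[str, DeploymentState] = field(default_factory=dict)
    error_code: Optional[str] = None
    error_message: Optional[str] = None

    def __post_init__(self) -> None:
        if self.state is DeploymentState.FAILED and not (self.error_code and self.error_message):
            raise ValueError("a Failed report must include an error code and message")

    def to_json(self) -> str:
        # Hypothetical wire format; the real field names come from the spec.
        body = {
            "deploymentId": self.deployment_id,
            "state": self.state.value,
            "components": {name: s.value for name, s in self.components.items()},
        }
        if self.error_code is not None:
            body["error"] = {"code": self.error_code, "message": self.error_message}
        return json.dumps(body)
```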

## Sequence diagram

```mermaid
sequenceDiagram
    participant WFM as Workload Fleet Manager
    participant Client as WFM Client (running on Edge Compute Device)

    loop Periodic synchronization
        Client->>WFM: Retrieve Desired State (Desired State API)
        alt Desired State unchanged
            WFM-->>Client: No update available
        else Desired State updated
            WFM-->>Client: Provide new State Manifest
            Client->>WFM: Retrieve ApplicationDeployments or bundle
            Client->>Client: Apply workloads from Desired State
            Client->>WFM: Report deployment status (Deployment Status API)
        end
    end
```