[chassis]: Add chassis provisioning HLD#2252

Open
liamkearney-msft wants to merge 1 commit intosonic-net:masterfrom
liamkearney-msft:liam/chassis-autoprovision


@liamkearney-msft liamkearney-msft commented Mar 4, 2026

Add HLD for automatic module provisioning.
Introduces a new API, new module operational states, and a new pmon daemon to facilitate this within the SONiC layer.

.md link with formatting: https://github.com/liamkearney-msft/SONiC/blob/liam/chassis-autoprovision/doc/chassis/module-provisioning/chassis-linecard-provisioning-hld.md

@mssonicbld
Collaborator

/azp run

@azure-pipelines

No pipelines are associated with this pull request.

Signed-off-by: Liam Kearney <liamkearney@microsoft.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

No pipelines are associated with this pull request.

@Javier-Tan Javier-Tan left a comment

LGTM

## New pmon daemon - sonic-provisiond

Is it necessary to have a separate daemon for this? Can this functionality not be folded into chassisd?

Author

Decided to go this way after a discussion with @arlakshm.
There are a few upsides to keeping it separate. If it's a separate daemon, disabling this feature becomes trivial. It also avoids scope creep in chassisd and lets us keep its role as "sync module state with STATE_DB". Finally, it makes it simpler to keep the state updated while a conversion is potentially running: we can block in provision_module() without blocking the chassisd thread, and without spinning up new threads in chassisd.

So yes, it could be folded in, but separating it out avoids scope creep in chassisd and is more modular. If we are going through STATE_DB anyway, there is no real need to couple it tightly with chassisd.
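
To illustrate the blocking argument, here is a minimal sketch and not the actual daemon. `FakeModule`, `provision_worker` and `run_demo` are hypothetical names, and the worker thread stands in for the separate sonic-provisiond process only so that the example fits in one file. The publisher side just enqueues state changes, while the consumer is free to block inside `provision_module()`.

```python
import queue
import threading
import time

# Status string taken from the snippet in this PR.
MODULE_STATUS_PROVISION_READY = "ProvisionReady"


class FakeModule:
    """Hypothetical stand-in for a platform module object."""

    def __init__(self, name, provision_seconds=0.05):
        self.name = name
        self.provision_seconds = provision_seconds
        self.provisioned = False

    def provision_module(self):
        # Simulate a slow, blocking conversion to SONiC.
        time.sleep(self.provision_seconds)
        self.provisioned = True
        return True


def provision_worker(events, modules, results):
    """Consume (name, oper_status) events and block in provision_module()."""
    while True:
        event = events.get()
        if event is None:  # sentinel: shut down
            return
        name, oper_status = event
        if oper_status == MODULE_STATUS_PROVISION_READY:
            results[name] = modules[name].provision_module()


def run_demo():
    modules = {name: FakeModule(name) for name in ("LINE-CARD0", "LINE-CARD1")}
    events = queue.Queue()
    results = {}
    worker = threading.Thread(
        target=provision_worker, args=(events, modules, results)
    )
    worker.start()
    # The publisher only enqueues state changes; it never waits on provisioning.
    events.put(("LINE-CARD0", MODULE_STATUS_PROVISION_READY))
    events.put(("LINE-CARD1", "Online"))
    events.put(None)
    worker.join()
    return modules, results
```

Calling `run_demo()` provisions only the module that reported `ProvisionReady`; the module reported as `Online` is left untouched.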

@liamkearney-msft
Author

Hi reviewers - the PR for sonic-platform-common with the API stubs and new states can be found here: sonic-net/sonic-platform-common#635
cc @patrickmacarthur @kenneth-arista @arlakshm

# Module state when module is detected, is able to run SONiC, but is not yet running SONiC.
# Modules in this state will be attempted to be converted to SONiC via calls to module.provision_module()
# This state & following "Provision" states should not be used if provision_module() is not implemented.
MODULE_STATUS_PROVISION_READY = "ProvisionReady"
Contributor

How is the vendor supposed to differentiate between MODULE_STATUS_PROVISION_PRESENT and MODULE_STATUS_PROVISION_READY?
You cannot easily differentiate between a linecard that has been powered off and a linecard that was just inserted.
The mechanisms I can think of would probably live better in the common infrastructure than in the platform vendor API.

Author

They are very similar states, but PRESENT indicates that either the platform API does not support module provisioning or the module isn't ready for provisioning. Having a separate state allows the provisioning flow to be opt-in.
Module detection is fundamentally platform specific - what would the common mechanism for detecting this be? All vendors would have to agree if we wanted to move this logic into the SONiC layer. My personal opinion is that SONiC shouldn't mandate the implementation details for this.
I don't see why it would be difficult to differentiate between a linecard which is powered off and one which was just inserted. I would expect the platform to be able to manage and monitor the power states of the modules, and "presence" to be decoupled from power state.
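
As a rough sketch of the opt-in behaviour described above (an illustration under assumptions, not the proposed API): the function name, its boolean inputs, and the `Empty`/`Present`/`Online` strings are hypothetical here, and only `ProvisionReady` comes from the snippet in this PR. How a platform actually derives these inputs is left to the vendor.

```python
# "ProvisionReady" is taken from the snippet in this PR; the other values are
# assumed for illustration only.
MODULE_STATUS_EMPTY = "Empty"
MODULE_STATUS_PRESENT = "Present"
MODULE_STATUS_PROVISION_READY = "ProvisionReady"
MODULE_STATUS_ONLINE = "Online"


def report_oper_status(present, running_sonic, can_run_sonic,
                       supports_provisioning):
    """Hypothetical helper a vendor might call from get_oper_status()."""
    if not present:
        return MODULE_STATUS_EMPTY
    if running_sonic:
        return MODULE_STATUS_ONLINE
    # Opt-in: only platforms that implement provision_module() ever report
    # ProvisionReady, and only for modules that are able to run SONiC.
    if supports_provisioning and can_run_sonic:
        return MODULE_STATUS_PROVISION_READY
    # Platform without provisioning support, or module not ready for it.
    return MODULE_STATUS_PRESENT
```

A platform that never sets `supports_provisioning` keeps reporting `Present`, so the provisioning flow is never triggered for it.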

This document aims to introduce a unified mechanism to achieve this.

## Requirements
- When new linecards are inserted into a chassis, the supervisor card running SONiC is responsible for detecting the presence of these new modules. It is up to the vendors to implement a mechanism to detect this.
Contributor

Sure, vendors can implement a way to detect whether a linecard is inserted, but how would the daemon know about it if there is no common platform API to call?
For other components there is get_change_event.
Without something like that, the new daemon will have to use inefficient polling to see whether something has changed. Having a platform API means that the platform vendor can use either its own polling logic or an event-driven mechanism if they have one.

Author

@liamkearney-msft liamkearney-msft Mar 23, 2026

The daemon knows because the state is reported via get_oper_status() and the state change is pushed to STATE_DB. This is the common API which is already used. The LC is already being polled by chassisd using this API (see https://github.com/sonic-net/sonic-platform-daemons/blob/master/sonic-chassisd/scripts/chassisd#L369).
This new daemon listens for changes to oper_state in STATE_DB, which chassisd publishes. We don't have to poll in the new daemon, as we can just subscribe to the table in Redis. This doesn't introduce any more polling than what is already there.

The debate over having chassisd poll get_oper_status() versus having an event-driven API to update the state reactively is a whole other discussion. This is simply leveraging the existing design, and I think that change is out of scope (as it would require changes from everyone). In the future, if we want to change the way module states are synced with STATE_DB, we can, and it shouldn't affect this new provisioning daemon. They've been decoupled deliberately.
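
A minimal sketch of the subscribe-instead-of-poll idea, assuming an in-memory stand-in for the STATE_DB table rather than a real Redis subscription. `FakeStateTable` and `ProvisionListener` are hypothetical names, and a real daemon would use the Redis/swsscommon subscription mechanisms instead. The listener reacts only to transitions into `ProvisionReady`, so repeated writes of the same status do not trigger provisioning twice.

```python
MODULE_STATUS_PROVISION_READY = "ProvisionReady"


class FakeStateTable:
    """Hypothetical in-memory stand-in for the module table in STATE_DB."""

    def __init__(self):
        self._rows = {}
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set(self, key, oper_status):
        # Every write is pushed to subscribers; nobody has to poll the table.
        self._rows[key] = oper_status
        for callback in self._subscribers:
            callback(key, oper_status)


class ProvisionListener:
    """Collects modules that have just transitioned into ProvisionReady."""

    def __init__(self):
        self._last_status = {}
        self.to_provision = []

    def on_update(self, key, oper_status):
        previous = self._last_status.get(key)
        self._last_status[key] = oper_status
        if (oper_status == MODULE_STATUS_PROVISION_READY
                and previous != MODULE_STATUS_PROVISION_READY):
            self.to_provision.append(key)


def run_demo():
    table = FakeStateTable()
    listener = ProvisionListener()
    table.subscribe(listener.on_update)
    # Writes as a state-sync loop (the chassisd role) might publish them.
    table.set("LINE-CARD0", "Present")
    table.set("LINE-CARD0", MODULE_STATUS_PROVISION_READY)
    table.set("LINE-CARD0", MODULE_STATUS_PROVISION_READY)  # repeated write
    table.set("LINE-CARD0", "Online")
    table.set("LINE-CARD1", MODULE_STATUS_PROVISION_READY)
    return listener.to_provision
```

Because the listener only depends on what is written to the table, the way the publisher learns about state changes (polling or events) can change without touching it.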

@liamkearney-msft
Author

Thanks for your comments @Staphylo. I've added some responses to your questions. Let me know if these address your concerns, and I can update the HLD to make it clearer.
