[chassis]: Add chassis provisioning HLD#2252
liamkearney-msft wants to merge 1 commit into sonic-net:master
/azp run
No pipelines are associated with this pull request.
Signed-off-by: Liam Kearney <liamkearney@microsoft.com>
71e974a to e948e00
## New pmon daemon - sonic-provisiond
Is it necessary to have a separate daemon for this? Can this functionality not be folded into chassisd?
Decided to go this way after a discussion with @arlakshm
There are a few upsides to keeping it separate. As a separate daemon, disabling this feature becomes trivial. It also avoids scope creep in chassisd and lets it keep its role of "sync module state with STATE_DB". It also makes it simpler to keep the state updated while a conversion could be running, since we can block in provision_module() without blocking the chassisd thread (and without spinning up new threads in chassisd).
So, yeah, it could be folded in, but separating it out avoids scope creep in chassisd and is more modular. If we are going through STATE_DB anyway, there is no real need to couple it tightly to chassisd.
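To illustrate the point about blocking, here is a minimal sketch of the split. It is not the proposed implementation: `FakeModule`, the in-memory `state_db` dict, the queue, and the module name `LINE-CARD0` are hypothetical stand-ins, and a thread stands in for what would be a separate daemon process. Only the `"ProvisionReady"` value and the `provision_module()` name come from this PR.

```python
import queue
import threading
import time

MODULE_STATUS_PROVISION_READY = "ProvisionReady"  # value proposed in this PR


class FakeModule:
    """Hypothetical stand-in for a platform module object."""

    def __init__(self, name):
        self.name = name
        self.provisioned = False

    def provision_module(self):
        # A real conversion may block for a long time; simulate a short one.
        time.sleep(0.05)
        self.provisioned = True
        return True


def provision_worker(events, modules, stop):
    """Role of the separate daemon: consume state changes, block per module."""
    while not stop.is_set():
        try:
            name, status = events.get(timeout=0.01)
        except queue.Empty:
            continue
        if status == MODULE_STATUS_PROVISION_READY:
            modules[name].provision_module()  # blocks only this worker
        events.task_done()


def run_demo():
    modules = {"LINE-CARD0": FakeModule("LINE-CARD0")}
    state_db = {}  # stand-in for the module table in STATE_DB
    events = queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(target=provision_worker, args=(events, modules, stop))
    worker.start()

    # Role of chassisd: publish state, then keep syncing without waiting.
    state_db["LINE-CARD0"] = MODULE_STATUS_PROVISION_READY
    events.put(("LINE-CARD0", MODULE_STATUS_PROVISION_READY))
    sync_iterations = 0
    while not modules["LINE-CARD0"].provisioned:
        sync_iterations += 1  # the sync loop keeps running during provisioning
        time.sleep(0.005)

    events.join()
    stop.set()
    worker.join()
    return sync_iterations, modules["LINE-CARD0"].provisioned
```

The sync loop keeps iterating while `provision_module()` is blocked in the worker, which is the property the separate daemon is meant to preserve.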
Hi reviewers - the PR for sonic-platform-common with API stubs / new states can be found here: sonic-net/sonic-platform-common#635
# Module state when module is detected, is able to run SONiC, but is not yet running SONiC.
# Modules in this state will be attempted to be converted to SONiC via calls to module.provision_module()
# This state & following "Provision" states should not be used if provision_module() is not implemented.
MODULE_STATUS_PROVISION_READY = "ProvisionReady"
How is the vendor supposed to differentiate between MODULE_STATUS_PROVISION_PRESENT and MODULE_STATUS_PROVISION_READY?
You cannot easily differentiate between a linecard that has been powered off and a linecard that you just inserted.
The mechanisms I can think of would probably be better placed in the common infrastructure than in the platform vendor API.
They are very similar states, but PRESENT indicates that the platform API does not support module provisioning, or that the module isn't ready for provisioning. Having a separate state allows the provisioning flow to be opt-in.
Module detection is fundamentally platform specific - what would be the common mechanism for detecting this? All vendors would have to agree if we want to move this logic to the SONiC layer - my personal opinion is that SONiC shouldn't mandate the implementation details for this.
I don't see why it would be difficult to differentiate between a linecard which is powered off and one which was just inserted. I would expect the platform to be able to manage/monitor the power states of the modules, and "presence" to be decoupled from power state.
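A rough sketch of how the opt-in could look from the platform side, under stated assumptions: `ModuleBase` here is a simplified stand-in rather than the real sonic-platform-common class, `LegacyModule`, `ProvisionableModule`, `can_boot_sonic`, and `should_provision` are hypothetical names, and `"Present"` is assumed to mirror the existing `MODULE_STATUS_PRESENT` value.

```python
MODULE_STATUS_PRESENT = "Present"  # assumed to mirror the existing constant
MODULE_STATUS_PROVISION_READY = "ProvisionReady"  # value proposed in this PR


class ModuleBase:
    """Simplified stand-in for the platform module base class."""

    def provision_module(self):
        raise NotImplementedError

    def get_oper_status(self):
        raise NotImplementedError


class LegacyModule(ModuleBase):
    """Platform without provisioning support: never reports ProvisionReady."""

    def get_oper_status(self):
        return MODULE_STATUS_PRESENT


class ProvisionableModule(ModuleBase):
    """Platform that opts in by implementing provision_module()."""

    def __init__(self, can_boot_sonic):
        # Hypothetical result of a vendor-specific readiness check.
        self.can_boot_sonic = can_boot_sonic

    def provision_module(self):
        return True

    def get_oper_status(self):
        if self.can_boot_sonic:
            return MODULE_STATUS_PROVISION_READY
        return MODULE_STATUS_PRESENT


def should_provision(module):
    """The provisioning flow acts only on modules reporting ProvisionReady."""
    return module.get_oper_status() == MODULE_STATUS_PROVISION_READY
```

A platform that never returns `ProvisionReady` is simply never touched by the provisioning flow, which is what makes the feature opt-in.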
This document aims to introduce a unified mechanism to achieve this.

## Requirements
- When new linecards are inserted into a chassis, the supervisor card running SONiC is responsible for detecting the presence of these new modules. It is up to the vendors to implement a mechanism to detect this.
Sure, vendors can implement a way to detect that a linecard has been inserted, but how would the daemon know about it if there is no common platform API to call?
For other components there is get_change_event.
Without something similar, the new daemon will have to use inefficient polling to see if something has changed. Having a platform API means that the platform vendor can either use its own polling logic or some event-driven mechanism if they have one.
The daemon knows because the state is reported via get_oper_status() and the state change is pushed to STATE_DB. This is the common API which is already used. The LC is already being polled by chassisd using this API (see https://github.com/sonic-net/sonic-platform-daemons/blob/master/sonic-chassisd/scripts/chassisd#L369).
This new daemon listens to changes in oper_state in STATE_DB, which is published by chassisd. We don't have to poll in the new daemon as we can just subscribe to the table in redis. This doesn't introduce any more polling than what is already there.
The debate over having chassisd poll get_oper_status() vs. having an event-driven API to update the state reactively/dynamically is a whole other discussion, but this simply leverages the existing design, and I think that change is out of scope (as it would require changes from everyone). In the future, if we want to change the way module states are synced with STATE_DB, we can, and it shouldn't affect this new provisioning daemon. They've been decoupled deliberately.
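As a sketch of the subscribe-instead-of-poll idea, the snippet below uses an in-memory table that notifies subscribers on every write. `StateTable`, `ProvisionListener`, and the `LINE-CARD*` keys are hypothetical stand-ins; it does not use the real redis or swsscommon APIs, and the assumption that every write produces a notification is mine.

```python
MODULE_STATUS_PROVISION_READY = "ProvisionReady"  # value proposed in this PR


class StateTable:
    """In-memory stand-in for a STATE_DB table with change notifications."""

    def __init__(self):
        self._rows = {}
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set(self, key, oper_status):
        # Assumption: every write notifies, even if the value is unchanged.
        self._rows[key] = oper_status
        for callback in self._subscribers:
            callback(key, oper_status)


class ProvisionListener:
    """Reacts to oper status changes; does no polling of its own."""

    def __init__(self):
        self._last_status = {}
        self.requests = []  # modules for which provisioning would be started

    def on_change(self, key, oper_status):
        previous = self._last_status.get(key)
        self._last_status[key] = oper_status
        # Act only on a transition into ProvisionReady, not on repeats.
        if oper_status == MODULE_STATUS_PROVISION_READY and previous != oper_status:
            self.requests.append(key)


table = StateTable()
listener = ProvisionListener()
table.subscribe(listener.on_change)

# Writes below play the role of chassisd publishing polled statuses.
table.set("LINE-CARD0", "Present")
table.set("LINE-CARD0", MODULE_STATUS_PROVISION_READY)
table.set("LINE-CARD0", MODULE_STATUS_PROVISION_READY)  # repeated poll result
table.set("LINE-CARD1", "Online")
```

The listener does not care how the table gets updated, so swapping the publisher's polling for an event-driven mechanism later would not change this side.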
Thanks for your comments @Staphylo. I've added some responses to your questions. Let me know if these address your concerns, and I can update the HLD to make it clearer.
Add HLD for automatic module provisioning.
Introduces a new API, new module operational states, and a pmon daemon to facilitate this within the SONiC layer.
.md link with formatting: https://github.com/liamkearney-msft/SONiC/blob/liam/chassis-autoprovision/doc/chassis/module-provisioning/chassis-linecard-provisioning-hld.md