Skip to content

Separate FW version read API for Hisense 2x100G AOC cable + enable image validity check for dual bank modules#631

Open
pnakka28 wants to merge 6 commits intosonic-net:masterfrom
pnakka28:singlebank_fw
Open

Separate FW version read API for Hisense 2x100G AOC cable + enable image validity check for dual bank modules#631
pnakka28 wants to merge 6 commits intosonic-net:masterfrom
pnakka28:singlebank_fw

Conversation

@pnakka28
Copy link

@pnakka28 pnakka28 commented Mar 10, 2026

Description

Revert single-bank firmware handling from the default CMIS get_module_fw_info() and add a vendor-specific override for single-bank modules. This change moves single-bank handling out of the default CMIS API into a vendor-specific override and fixes incorrect inactive firmware version reporting for dual-bank modules during firmware upgrades.

Changes:

  • Add CmisAocSingleBankApi class that overrides get_module_fw_info() to handle single-bank modules
  • Add tests covering all firmware info scenarios and factory routing

Motivation and Context

While the previous implementation worked correctly for single-bank modules, it caused incorrect inactive firmware reporting for dual-bank modules during firmware upgrades. When a firmware download was interrupted or an image on the inactive bank was invalid, the inactive firmware version would still be reported from EEPROM instead of N/A.

To fix this, the single-bank handling code in the default CMIS API has been reverted so that dual-bank modules correctly report N/A for invalid images. A separate vendor-specific override has been added for single-bank modules, where the single-bank firmware handling is actually needed. This approach ensures:

  • Dual-bank modules report firmware info correctly during upgrades
  • Single-bank modules continue to work with inactive firmware read from EEPROM

How Has This Been Tested?

  • Verified sfputil show fwversion on single-bank modules shows inactive firmware read from EEPROM
  • Verified sfputil show fwversion on dual-bank CMIS modules still shows both Image A and Image B correctly
  • Verified dual-bank module with invalid image after interrupted firmware download reports inactive firmware as N/A
  • Verified xcvrd logs show no tracebacks or exceptions
  • Verified the DB is getting populated with the correct firmware versions

Additional Information (Optional)

MSFT ADO - 29333553

…age validity check for dual bank modules

- Add CmisAocSingleBankApi class for single-bank modules
- Override get_module_fw_info() to handle single-bank
- Add vendor routing for Hisense modules
- Add tests for single-bank firmware info and factory routing

Signed-off-by: Pavan Kalyan Nakka <pnakka@microsoft.com>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts CMIS firmware version reporting so dual-bank modules no longer report an “inactive” version when the inactive image is invalid, while preserving the needed single-bank behavior via a Hisense-specific CMIS API override routed through XcvrApiFactory.

Changes:

  • Revert single-bank inactive FW fallback in the default CMIS get_module_fw_info() so invalid images correctly yield N/A on dual-bank modules.
  • Add CmisAocSingleBankApi (Hisense 2x100G AOC) to read inactive FW from EEPROM for single-bank modules.
  • Add factory routing + unit tests intended to cover the new vendor override path.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
sonic_platform_base/sonic_xcvr/api/public/cmis.py Restores dual-bank semantics by always using Image A/B version (which becomes N/A when invalid) for inactive FW.
sonic_platform_base/sonic_xcvr/api/hisense/aoc_2x100g.py Introduces Hisense single-bank override that reads inactive FW from EEPROM.
sonic_platform_base/sonic_xcvr/xcvr_api_factory.py Routes Hisense DEF8504-2x100G part numbers to the single-bank API implementation.
tests/sonic_xcvr/test_xcvr_api_factory.py Adds a Hisense routing test (currently not asserting routing correctness).
tests/sonic_xcvr/test_aoc_2x100g.py Adds unit tests for the new Hisense single-bank get_module_fw_info() override (assertions are currently too weak).
setup.py Ensures the new sonic_platform_base.sonic_xcvr.api.hisense package is included in the distribution.

Signed-off-by: Pavan Kalyan Nakka <pnakka@microsoft.com>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

logger = logging.getLogger(__name__)

class CmisAocSingleBankApi(CmisApi):
def get_module_fw_info(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pnakka28 can we use CdbFwHandler ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor get_module_fw_info() is the method called by sfputil show fwversion and is also called internally by get_transceiver_info_firmware_versions() in cmis.py. If we override get_firmware_info(), all these calls would still use the base get_module_fw_info() (current) which reads Image B from invalid CDB registers on single-bank modules. Overriding get_module_fw_info() ensures correct single-bank behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pnakka28 please fix as discussed offline.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor fixed to use CdbFwHandler

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.


You can also share your feedback on Copilot code review. Take the survey.

@prgeor
Copy link
Collaborator

prgeor commented Mar 16, 2026

@pnakka28 please address copilot review

- Replaced CDB byte parsing with CdbFwHandler.get_firmware_info()
- Updated tests to mock CdbFwHandler instead of raw CDB responses

Signed-off-by: Pavan Kalyan Nakka <pnakka@microsoft.com>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.


You can also share your feedback on Copilot code review. Take the survey.

RunningImage = 'A'
ActiveFirmware = ImageA
# In case of single bank module, inactive firmware version can be read from EEPROM
InactiveFirmware = self.get_module_inactive_firmware() + ".0"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_module_inactive_firmware() reads from EEPROM registers. This code path only executes when ImageA is running, so the module is running with CDB support not a flat memory. EEPROM registers will always return valid values here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pnakka28 "EEPROM registers will always return valid values here." -- is this correct even if there are i2c read errors?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor i2c read errors could cause get_module_inactive_firmware() to return N/A. will add guard to append for only valid string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor added guard for read errors

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.


You can also share your feedback on Copilot code review. Take the survey.

Signed-off-by: Pavan Kalyan Nakka <pnakka@microsoft.com>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

mihirpat1
mihirpat1 previously approved these changes Mar 19, 2026
if fw_info is False or fw_info is None:
return {'status': False, 'info': "Failed to get firmware info", 'result': None}

fw_status = fw_info.get(cdb_consts.CDB1_FIRMWARE_STATUS, {})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pnakka28 this decoding logic is common even for non-Hisense module. Can you fix the common

if status == 1 and self.cdb.cdb_chkcode(rpl) == rpl_chkcode:
and override what is needed for single bank?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor fixed the decoding logic for common

- Refactor to use CdbFwHandler for parsing firmware info instead of decoding raw CDB reply bytes
- Added subclass single-bank firmware modules that read inactive firmware from EEPROM

Signed-off-by: Pavan Kalyan Nakka <pnakka@microsoft.com>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@prgeor
Copy link
Collaborator

prgeor commented Mar 24, 2026

@copilot review the code

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

Comment on lines +1912 to +1916
self._cdb_fw_hdlr = self._create_cdb_fw_handler()
if self._cdb_fw_hdlr is None:
return {'status': False, 'info': "CDB FW handler init failed", 'result': None}

fw_info = self._cdb_fw_hdlr.get_firmware_info()
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_module_fw_info() now instantiates CdbFwHandler and calls get_firmware_info(), but _create_cdb_fw_handler() constructs CdbFwHandler which asserts inside __init__ when initFwHandler() fails (e.g., module busy / CDB failure). Since this method doesn’t catch exceptions, callers can get an unhandled AssertionError instead of a clean {'status': False, ...} response. Wrap handler creation / firmware-info reads in exception handling (and avoid relying on assert for runtime control flow) so get_module_fw_info() fails gracefully.

Suggested change
self._cdb_fw_hdlr = self._create_cdb_fw_handler()
if self._cdb_fw_hdlr is None:
return {'status': False, 'info': "CDB FW handler init failed", 'result': None}
fw_info = self._cdb_fw_hdlr.get_firmware_info()
try:
self._cdb_fw_hdlr = self._create_cdb_fw_handler()
except AssertionError as e:
logger.error("CDB FW handler init assertion failed: %s", e)
return {'status': False, 'info': "CDB FW handler init failed", 'result': None}
except Exception as e:
logger.exception("Unexpected error during CDB FW handler init: %s", e)
return {'status': False, 'info': "CDB FW handler init failed", 'result': None}
if self._cdb_fw_hdlr is None:
return {'status': False, 'info': "CDB FW handler init failed", 'result': None}
try:
fw_info = self._cdb_fw_hdlr.get_firmware_info()
except Exception as e:
logger.exception("Exception while getting firmware info: %s", e)
return {'status': False, 'info': "Failed to get firmware info", 'result': None}

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pnakka28 Can you please look into this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mihirpat1 addressed this comment

return {'status': False, 'info': "CDB FW handler init failed", 'result': None}

fw_info = self._cdb_fw_hdlr.get_firmware_info()
if fw_info is False or fw_info is None:
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous get_module_fw_info() logic retried on the CDB “need to enter password” status (0x46) by calling module_enter_password() and re-issuing the command. With the new _cdb_fw_hdlr path, password-required cases will just return “Failed to get firmware info” (and may trip the handler init assert). Consider preserving the password-handling behavior either in CdbFwHandler or in get_module_fw_info() so modules requiring password don’t regress.

Suggested change
if fw_info is False or fw_info is None:
if fw_info is False or fw_info is None:
# Attempt to handle password-required cases by entering the module password
enter_password = getattr(self, 'module_enter_password', None)
if callable(enter_password):
# If password entry fails explicitly, propagate failure
if enter_password() is False:
return {'status': False, 'info': "Failed to enter module password", 'result': None}
# Retry fetching firmware info after entering password
fw_info = self._cdb_fw_hdlr.get_firmware_info()
if fw_info is False or fw_info is None:

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor Do we still need to preserve the password entering logic or okay to remove it?

Signed-off-by: Pavan Kalyan Nakka <pnakka@microsoft.com>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants