Xcvrd Refactor 8/13: Refactor CMIS_STATE_AP_CONF logic into handle_cmis_ap_conf_state#749
Xcvrd Refactor 8/13: Refactor CMIS_STATE_AP_CONF logic into handle_cmis_ap_conf_state#749bobby-nexthop wants to merge 5 commits intosonic-net:masterfrom
Conversation
…function Signed-off-by: Bobby McGonigle <bobby@nexthop.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Bobby McGonigle <bobby@nexthop.ai>
Signed-off-by: Bobby McGonigle <bobby@nexthop.ai>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@bobby-nexthop there are some conflict. also its in draft. |
|
@bobby-nexthop can you check?
|
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
a40545f to
5efceac
Compare
|
/azp run |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Bobby McGonigle <bobby@nexthop.ai>
23f9414 to
fa59593
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
This PR continues the ongoing xcvrd refactor by extracting the CMIS CMIS_STATE_AP_CONF handling logic into a dedicated helper, keeping the CMIS state machine easier to read and maintain while preserving behavior.
Changes:
- Added
handle_cmis_ap_conf_state()to encapsulateCMIS_STATE_AP_CONFprocessing. - Replaced the inlined
CMIS_STATE_AP_CONFblock inprocess_cmis_state_machine()with a call to the new handler. - Added unit tests covering key
CMIS_STATE_AP_CONFbranches (module-ready timeout, datapath-deactivated timeout, laser frequency config, SI staging, DP init staging failure).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
sonic-xcvrd/xcvrd/cmis/cmis_manager_task.py |
Extracts CMIS_STATE_AP_CONF logic into handle_cmis_ap_conf_state() and wires it into the CMIS state machine. |
sonic-xcvrd/tests/test_xcvrd.py |
Adds new unit tests targeting handle_cmis_ap_conf_state() behaviors and error paths. |
|
|
||
| task.port_dict['Ethernet0']['cmis_expired'] = time.time() + 100 | ||
| task.port_dict['Ethernet0']['cmis_retries'] = 0 | ||
| task.port_dict['Ethernet0']['laser_freq'] = 0 |
There was a problem hiding this comment.
This test sets cmis_expired to time.time() + 100 (float), but cmis_expired is a datetime throughout CmisManagerTask (set via update_cmis_state_expiration_time() and checked via is_timer_expired()). Use a datetime value here to keep the test aligned with the code and prevent latent type errors.
| self.log_notice("{}: Apply Optics SI found for Vendor: {} PN: {} lane speed: {}G". | ||
| format(lport, api.get_manufacturer(), api.get_model(), lane_speed)) | ||
| if not api.stage_custom_si_settings(host_lanes_mask, optics_si_dict): | ||
| self.log_notice("{}: unable to stage custom SI settings ".format(lport)) |
There was a problem hiding this comment.
The log message includes a trailing space at the end of "unable to stage custom SI settings ", which can make log parsing/alerting brittle and causes inconsistent messages. Consider removing the trailing whitespace so the message is stable.
| self.log_notice("{}: unable to stage custom SI settings ".format(lport)) | |
| self.log_notice("{}: unable to stage custom SI settings".format(lport)) |
| task.port_dict['Ethernet0']['laser_freq'] = 193100 | ||
| task.port_dict['Ethernet0']['cmis_expired'] = time.time() + 100 | ||
| task.port_dict['Ethernet0']['cmis_retries'] = 0 |
There was a problem hiding this comment.
This test sets cmis_expired using time.time() + 100 (float), but CmisManagerTask.update_cmis_state_expiration_time()/is_timer_expired() use datetime objects for cmis_expired. To avoid type-mismatch surprises and keep the test consistent with production behavior, set cmis_expired to datetime.datetime.now() + datetime.timedelta(...) instead.
|
|
||
| task.port_dict['Ethernet0']['cmis_expired'] = time.time() + 100 | ||
| task.port_dict['Ethernet0']['cmis_retries'] = 0 |
There was a problem hiding this comment.
This test sets cmis_expired to time.time() + 100 (float), but the CMIS state machine stores/compares cmis_expired as a datetime (see update_cmis_state_expiration_time() / is_timer_expired()). Please use a datetime value here to keep the test aligned with the implementation and avoid latent type errors if a timeout path is exercised.
| def test_CmisManagerTask_handle_ap_conf_laser_frequency_failure(self, mock_helper_logger, mock_chassis, mock_get_status_sw_tbl): | ||
| """Test failed laser frequency configuration in handle_cmis_ap_conf_state | ||
| Verifies that error is logged when set_laser_freq fails""" | ||
| mock_xcvr_api = MagicMock() | ||
| mock_xcvr_api.get_module_state = MagicMock(return_value='ModuleReady') | ||
| mock_xcvr_api.get_datapath_state = MagicMock(return_value={ | ||
| 'DP1State': 'DataPathDeactivated', | ||
| 'DP2State': 'DataPathDeactivated', | ||
| 'DP3State': 'DataPathDeactivated', | ||
| 'DP4State': 'DataPathDeactivated', | ||
| 'DP5State': 'DataPathDeactivated', | ||
| 'DP6State': 'DataPathDeactivated', | ||
| 'DP7State': 'DataPathDeactivated', | ||
| 'DP8State': 'DataPathDeactivated' | ||
| }) | ||
| mock_xcvr_api.get_application_advertisement = MagicMock(return_value={1: {'host_lane_count': 8}}) | ||
| mock_xcvr_api.get_host_lane_assignment_option = MagicMock(return_value=1) | ||
| mock_xcvr_api.is_coherent_module = MagicMock(return_value=True) | ||
| mock_xcvr_api.get_tuning_in_progress = MagicMock(return_value=False) | ||
| mock_xcvr_api.set_laser_freq = MagicMock(return_value=-1) # Simulate failure | ||
| mock_xcvr_api.set_application = MagicMock() | ||
|
|
There was a problem hiding this comment.
optics_si_parser.optics_si_present() is not patched in this test. If another test has populated optics_si_parser.g_optics_si_dict (e.g., by calling load_optics_si_settings()), this test may unexpectedly enter the SI-staging path and become order-dependent/flaky. Patch optics_si_present to return False (or clear the global dict in setup) so this test stays isolated to the laser-frequency behavior it intends to cover.
| mock_xcvr_api = MagicMock() | ||
| mock_xcvr_api.get_module_state = MagicMock(return_value='ModuleReady') | ||
| mock_xcvr_api.get_datapath_state = MagicMock(return_value={ | ||
| 'DP1State': 'DataPathDeactivated', | ||
| 'DP2State': 'DataPathDeactivated', | ||
| 'DP3State': 'DataPathDeactivated', | ||
| 'DP4State': 'DataPathDeactivated', | ||
| 'DP5State': 'DataPathDeactivated', | ||
| 'DP6State': 'DataPathDeactivated', | ||
| 'DP7State': 'DataPathDeactivated', | ||
| 'DP8State': 'DataPathDeactivated' | ||
| }) | ||
| mock_xcvr_api.get_application_advertisement = MagicMock(return_value={1: {'host_lane_count': 8}}) | ||
| mock_xcvr_api.get_host_lane_assignment_option = MagicMock(return_value=1) | ||
| mock_xcvr_api.is_coherent_module = MagicMock(return_value=False) | ||
| mock_xcvr_api.set_application = MagicMock() | ||
| mock_xcvr_api.scs_apply_datapath_init = MagicMock(return_value=False) # Simulate failure | ||
|
|
||
| mock_sfp = MagicMock() | ||
| mock_sfp.get_presence = MagicMock(return_value=True) | ||
| mock_sfp.get_xcvr_api = MagicMock(return_value=mock_xcvr_api) | ||
|
|
||
| mock_chassis.get_all_sfps = MagicMock(return_value=[mock_sfp]) | ||
| mock_chassis.get_sfp = MagicMock(return_value=mock_sfp) | ||
|
|
||
| port_mapping = PortMapping() | ||
| port_mapping.handle_port_change_event(PortChangeEvent('Ethernet0', 1, 0, PortChangeEvent.PORT_ADD)) | ||
| stop_event = threading.Event() | ||
| task = CmisManagerTask(DEFAULT_NAMESPACE, port_mapping, stop_event, platform_chassis=mock_chassis) | ||
| task.xcvr_table_helper = XcvrTableHelper(DEFAULT_NAMESPACE) | ||
| task.xcvr_table_helper.get_status_sw_tbl.return_value = mock_get_status_sw_tbl | ||
| task.xcvr_table_helper.get_gearbox_line_lanes_dict = MagicMock(return_value={}) |
There was a problem hiding this comment.
optics_si_parser.optics_si_present() isn't patched in this test. Because optics_si_parser uses a module-level global dict, earlier tests that load optics SI settings can cause this test to take a different branch, making it order-dependent. Patch optics_si_present to return False (or reset optics_si_parser.g_optics_si_dict) to keep the test deterministic and focused on the scs_apply_datapath_init failure path.
| mock_xcvr_api = MagicMock() | |
| mock_xcvr_api.get_module_state = MagicMock(return_value='ModuleReady') | |
| mock_xcvr_api.get_datapath_state = MagicMock(return_value={ | |
| 'DP1State': 'DataPathDeactivated', | |
| 'DP2State': 'DataPathDeactivated', | |
| 'DP3State': 'DataPathDeactivated', | |
| 'DP4State': 'DataPathDeactivated', | |
| 'DP5State': 'DataPathDeactivated', | |
| 'DP6State': 'DataPathDeactivated', | |
| 'DP7State': 'DataPathDeactivated', | |
| 'DP8State': 'DataPathDeactivated' | |
| }) | |
| mock_xcvr_api.get_application_advertisement = MagicMock(return_value={1: {'host_lane_count': 8}}) | |
| mock_xcvr_api.get_host_lane_assignment_option = MagicMock(return_value=1) | |
| mock_xcvr_api.is_coherent_module = MagicMock(return_value=False) | |
| mock_xcvr_api.set_application = MagicMock() | |
| mock_xcvr_api.scs_apply_datapath_init = MagicMock(return_value=False) # Simulate failure | |
| mock_sfp = MagicMock() | |
| mock_sfp.get_presence = MagicMock(return_value=True) | |
| mock_sfp.get_xcvr_api = MagicMock(return_value=mock_xcvr_api) | |
| mock_chassis.get_all_sfps = MagicMock(return_value=[mock_sfp]) | |
| mock_chassis.get_sfp = MagicMock(return_value=mock_sfp) | |
| port_mapping = PortMapping() | |
| port_mapping.handle_port_change_event(PortChangeEvent('Ethernet0', 1, 0, PortChangeEvent.PORT_ADD)) | |
| stop_event = threading.Event() | |
| task = CmisManagerTask(DEFAULT_NAMESPACE, port_mapping, stop_event, platform_chassis=mock_chassis) | |
| task.xcvr_table_helper = XcvrTableHelper(DEFAULT_NAMESPACE) | |
| task.xcvr_table_helper.get_status_sw_tbl.return_value = mock_get_status_sw_tbl | |
| task.xcvr_table_helper.get_gearbox_line_lanes_dict = MagicMock(return_value={}) | |
| # Ensure optics_si_present() does not depend on global optics SI state | |
| # so this test remains deterministic and focused on the datapath init failure path. | |
| with patch('xcvrd.xcvrd_utilities.optics_si_parser.optics_si_present', return_value=False): | |
| mock_xcvr_api = MagicMock() | |
| mock_xcvr_api.get_module_state = MagicMock(return_value='ModuleReady') | |
| mock_xcvr_api.get_datapath_state = MagicMock(return_value={ | |
| 'DP1State': 'DataPathDeactivated', | |
| 'DP2State': 'DataPathDeactivated', | |
| 'DP3State': 'DataPathDeactivated', | |
| 'DP4State': 'DataPathDeactivated', | |
| 'DP5State': 'DataPathDeactivated', | |
| 'DP6State': 'DataPathDeactivated', | |
| 'DP7State': 'DataPathDeactivated', | |
| 'DP8State': 'DataPathDeactivated' | |
| }) | |
| mock_xcvr_api.get_application_advertisement = MagicMock(return_value={1: {'host_lane_count': 8}}) | |
| mock_xcvr_api.get_host_lane_assignment_option = MagicMock(return_value=1) | |
| mock_xcvr_api.is_coherent_module = MagicMock(return_value=False) | |
| mock_xcvr_api.set_application = MagicMock() | |
| mock_xcvr_api.scs_apply_datapath_init = MagicMock(return_value=False) # Simulate failure | |
| mock_sfp = MagicMock() | |
| mock_sfp.get_presence = MagicMock(return_value=True) | |
| mock_sfp.get_xcvr_api = MagicMock(return_value=mock_xcvr_api) | |
| mock_chassis.get_all_sfps = MagicMock(return_value=[mock_sfp]) | |
| mock_chassis.get_sfp = MagicMock(return_value=mock_sfp) | |
| port_mapping = PortMapping() | |
| port_mapping.handle_port_change_event(PortChangeEvent('Ethernet0', 1, 0, PortChangeEvent.PORT_ADD)) | |
| stop_event = threading.Event() | |
| task = CmisManagerTask(DEFAULT_NAMESPACE, port_mapping, stop_event, platform_chassis=mock_chassis) | |
| task.xcvr_table_helper = XcvrTableHelper(DEFAULT_NAMESPACE) | |
| task.xcvr_table_helper.get_status_sw_tbl.return_value = mock_get_status_sw_tbl | |
| task.xcvr_table_helper.get_gearbox_line_lanes_dict = MagicMock(return_value={}) |
|
|
||
| task.port_dict['Ethernet0']['cmis_expired'] = time.time() + 100 | ||
| task.port_dict['Ethernet0']['cmis_retries'] = 0 | ||
| task.port_dict['Ethernet0']['laser_freq'] = 0 |
There was a problem hiding this comment.
This test also sets cmis_expired using time.time() + 100 (float) while the implementation expects a datetime. Please switch to a datetime value to keep the test consistent and avoid future failures if a timeout branch is hit.
Description
This change moves CMIS processing for CMIS_STATE_AP_CONF state into its own function. It does not add any logic changes.
Motivation and Context
xcvrd has gotten to 4000 lines long. To make things easier, we'd like to refactor it. This is the second PR in a series that aims to do the following:
How Has This Been Tested?
Transceivers continue to get programmed correctly with links up, unit tests pass
Additional Information (Optional)