Xcvrd Refactor 8/13: Refactor CMIS_STATE_AP_CONF logic into handle_cmis_ap_conf_state #749

Open
bobby-nexthop wants to merge 5 commits into sonic-net:master from bobby-nexthop:xcvrd-refactor-9

@bobby-nexthop
Contributor

@bobby-nexthop bobby-nexthop commented Feb 12, 2026

Description

This change moves the processing of the CMIS_STATE_AP_CONF state into its own function. It makes no logic changes.

  • Introduced handle_cmis_ap_conf_state() to encapsulate CMIS_STATE_AP_CONF processing.
  • Replaced the inlined CMIS_STATE_AP_CONF block in process_cmis_state_machine() with a call to the new handler.
  • Added new test cases to reach the required coverage. Coverage was previously 66%, with lines 960-963, 966-969, 978, 991-993, 999-1001, and 1009-1011 missing.
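
The shape of this refactor can be sketched in miniature. The class, state names, and `module_ready` parameter below are hypothetical and only illustrate the pattern of moving one state's branch into a dedicated handler; they are not the xcvrd implementation.

```python
class TinyStateMachine:
    """Hypothetical miniature of a per-port state machine (not xcvrd code)."""

    STATE_AP_CONF = 'AP_CONF'
    STATE_DP_INIT = 'DP_INIT'
    STATE_FAILED = 'FAILED'

    def __init__(self):
        self.port_dict = {}

    def handle_ap_conf_state(self, lport, module_ready):
        """All AP_CONF logic lives here, so a test can call it directly."""
        if not module_ready:
            self.port_dict[lport]['state'] = self.STATE_FAILED
            return
        self.port_dict[lport]['state'] = self.STATE_DP_INIT

    def process_state_machine(self, lport, module_ready):
        """The dispatcher only routes to handlers; it holds no state logic."""
        state = self.port_dict[lport]['state']
        if state == self.STATE_AP_CONF:
            self.handle_ap_conf_state(lport, module_ready)


sm = TinyStateMachine()
sm.port_dict['Ethernet0'] = {'state': TinyStateMachine.STATE_AP_CONF}
sm.process_state_machine('Ethernet0', module_ready=True)
next_state = sm.port_dict['Ethernet0']['state']
```

Because the handler is a plain method, each branch (here, the not-ready path and the success path) can be exercised without driving the whole dispatcher loop.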

Motivation and Context

xcvrd has grown to 4000 lines. To make it easier to work with, we'd like to refactor it. This is the eighth PR in a series that aims to do the following:

| Task | PR |
| --- | --- |
| 1) Move functions used across multiple files in xcvrd to utils/common.py | #654 |
| 2) Move CmisManagerTask into cmis/cmis_manager_task.py | #691 |
| 3) Split task_worker into process_single_lport | #701 |
| 4) Move cmis logic out of process_single_lport | #716 |
| 5) Add handle_cmis_inserted_state function | #738 |
| 6) Add handle_cmis_dp_pre_init_check_state function | #741 |
| 7) Add handle_cmis_dpdeinit_state function | #748 |
| 8) Add handle_cmis_ap_conf_state function | #749 |

How Has This Been Tested?

Transceivers continue to be programmed correctly and links come up. Unit tests pass.

Additional Information (Optional)

bobby-nexthop and others added 3 commits February 11, 2026 20:05
…function

Signed-off-by: Bobby McGonigle <bobby@nexthop.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Bobby McGonigle <bobby@nexthop.ai>
Signed-off-by: Bobby McGonigle <bobby@nexthop.ai>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@prgeor
Collaborator

prgeor commented Feb 24, 2026

@bobby-nexthop there are some conflicts. Also, it's still in draft.

@prgeor
Collaborator

prgeor commented Mar 11, 2026

@bobby-nexthop can you check?

  1. The PR is still in draft
  2. Rebase and fix the conflicts

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Bobby McGonigle <bobby@nexthop.ai>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bobby-nexthop bobby-nexthop marked this pull request as ready for review March 16, 2026 19:56
@mihirpat1 mihirpat1 requested a review from Copilot March 24, 2026 16:47
Contributor

Copilot AI left a comment

Pull request overview

This PR continues the ongoing xcvrd refactor by extracting the CMIS_STATE_AP_CONF handling logic into a dedicated helper, keeping the CMIS state machine easier to read and maintain while preserving behavior.

Changes:

  • Added handle_cmis_ap_conf_state() to encapsulate CMIS_STATE_AP_CONF processing.
  • Replaced the inlined CMIS_STATE_AP_CONF block in process_cmis_state_machine() with a call to the new handler.
  • Added unit tests covering key CMIS_STATE_AP_CONF branches (module-ready timeout, datapath-deactivated timeout, laser frequency config, SI staging, DP init staging failure).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| sonic-xcvrd/xcvrd/cmis/cmis_manager_task.py | Extracts CMIS_STATE_AP_CONF logic into handle_cmis_ap_conf_state() and wires it into the CMIS state machine. |
| sonic-xcvrd/tests/test_xcvrd.py | Adds new unit tests targeting handle_cmis_ap_conf_state() behaviors and error paths. |

Comment on lines +6333 to +6336

task.port_dict['Ethernet0']['cmis_expired'] = time.time() + 100
task.port_dict['Ethernet0']['cmis_retries'] = 0
task.port_dict['Ethernet0']['laser_freq'] = 0

Copilot AI Mar 24, 2026

This test sets cmis_expired to time.time() + 100 (float), but cmis_expired is a datetime throughout CmisManagerTask (set via update_cmis_state_expiration_time() and checked via is_timer_expired()). Use a datetime value here to keep the test aligned with the code and prevent latent type errors.

self.log_notice("{}: Apply Optics SI found for Vendor: {} PN: {} lane speed: {}G".
                format(lport, api.get_manufacturer(), api.get_model(), lane_speed))
if not api.stage_custom_si_settings(host_lanes_mask, optics_si_dict):
    self.log_notice("{}: unable to stage custom SI settings ".format(lport))

Copilot AI Mar 24, 2026

The log message includes a trailing space at the end of "unable to stage custom SI settings ", which can make log parsing/alerting brittle and causes inconsistent messages. Consider removing the trailing whitespace so the message is stable.

Suggested change
- self.log_notice("{}: unable to stage custom SI settings ".format(lport))
+ self.log_notice("{}: unable to stage custom SI settings".format(lport))

Comment on lines +6065 to +6067
task.port_dict['Ethernet0']['laser_freq'] = 193100
task.port_dict['Ethernet0']['cmis_expired'] = time.time() + 100
task.port_dict['Ethernet0']['cmis_retries'] = 0

Copilot AI Mar 24, 2026

This test sets cmis_expired using time.time() + 100 (float), but CmisManagerTask.update_cmis_state_expiration_time()/is_timer_expired() use datetime objects for cmis_expired. To avoid type-mismatch surprises and keep the test consistent with production behavior, set cmis_expired to datetime.datetime.now() + datetime.timedelta(...) instead.
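
The type mismatch can be reproduced in isolation. The sketch below assumes an expiry check that compares the current `datetime` against the stored value, as the comment describes for is_timer_expired(); the helper name `timer_expired` is hypothetical and is not the xcvrd code.

```python
import datetime
import time


def timer_expired(expired_time, now=None):
    """Hypothetical stand-in for a datetime-based expiry check."""
    if expired_time is None:
        return False
    now = now or datetime.datetime.now()
    return now >= expired_time


# A datetime expiry, as recommended for the test fixture.
good_expiry = datetime.datetime.now() + datetime.timedelta(seconds=100)
still_running = timer_expired(good_expiry)

# A float expiry (time.time() + 100) only fails once the comparison runs,
# which is why the problem stays latent until a timeout path is exercised.
bad_expiry = time.time() + 100
try:
    timer_expired(bad_expiry)
    float_raises = False
except TypeError:
    float_raises = True
```

Under this assumption, the float fixture is harmless only as long as no code path compares it against a `datetime`.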

Comment on lines +6160 to +6162

task.port_dict['Ethernet0']['cmis_expired'] = time.time() + 100
task.port_dict['Ethernet0']['cmis_retries'] = 0

Copilot AI Mar 24, 2026

This test sets cmis_expired to time.time() + 100 (float), but the CMIS state machine stores/compares cmis_expired as a datetime (see update_cmis_state_expiration_time() / is_timer_expired()). Please use a datetime value here to keep the test aligned with the implementation and avoid latent type errors if a timeout path is exercised.

Comment on lines +6007 to +6028
def test_CmisManagerTask_handle_ap_conf_laser_frequency_failure(self, mock_helper_logger, mock_chassis, mock_get_status_sw_tbl):
    """Test failed laser frequency configuration in handle_cmis_ap_conf_state
    Verifies that error is logged when set_laser_freq fails"""
    mock_xcvr_api = MagicMock()
    mock_xcvr_api.get_module_state = MagicMock(return_value='ModuleReady')
    mock_xcvr_api.get_datapath_state = MagicMock(return_value={
        'DP1State': 'DataPathDeactivated',
        'DP2State': 'DataPathDeactivated',
        'DP3State': 'DataPathDeactivated',
        'DP4State': 'DataPathDeactivated',
        'DP5State': 'DataPathDeactivated',
        'DP6State': 'DataPathDeactivated',
        'DP7State': 'DataPathDeactivated',
        'DP8State': 'DataPathDeactivated'
    })
    mock_xcvr_api.get_application_advertisement = MagicMock(return_value={1: {'host_lane_count': 8}})
    mock_xcvr_api.get_host_lane_assignment_option = MagicMock(return_value=1)
    mock_xcvr_api.is_coherent_module = MagicMock(return_value=True)
    mock_xcvr_api.get_tuning_in_progress = MagicMock(return_value=False)
    mock_xcvr_api.set_laser_freq = MagicMock(return_value=-1)  # Simulate failure
    mock_xcvr_api.set_application = MagicMock()


Copilot AI Mar 24, 2026

optics_si_parser.optics_si_present() is not patched in this test. If another test has populated optics_si_parser.g_optics_si_dict (e.g., by calling load_optics_si_settings()), this test may unexpectedly enter the SI-staging path and become order-dependent/flaky. Patch optics_si_present to return False (or clear the global dict in setup) so this test stays isolated to the laser-frequency behavior it intends to cover.
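
The order dependence and the effect of patching can be shown with a self-contained sketch. Everything here (`fake_parser`, `_g_si_dict`, `takes_si_path`) is a hypothetical stand-in for a parser module with module-level state. It does not import or reproduce optics_si_parser.

```python
from types import SimpleNamespace
from unittest.mock import patch

# Hypothetical module-level state, standing in for a global settings dict.
_g_si_dict = {}


def _si_present():
    return bool(_g_si_dict)


# Hypothetical stand-in for the parser module.
fake_parser = SimpleNamespace(optics_si_present=_si_present)


def takes_si_path():
    """Hypothetical code under test: branches on the global SI state."""
    return fake_parser.optics_si_present()


# An earlier test leaks state into the global dict.
_g_si_dict['GLOBAL_MEDIA_SETTINGS'] = {'1': {}}
leaked = takes_si_path()

# Patching pins the branch regardless of what earlier tests left behind.
with patch.object(fake_parser, 'optics_si_present', return_value=False):
    isolated = takes_si_path()

# The original function is restored when the context manager exits.
restored = takes_si_path()
```

Clearing the global dict in test setup would address the same leak; patching has the advantage of stating the intended branch at the point of use.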

Comment on lines +6280 to +6311
mock_xcvr_api = MagicMock()
mock_xcvr_api.get_module_state = MagicMock(return_value='ModuleReady')
mock_xcvr_api.get_datapath_state = MagicMock(return_value={
    'DP1State': 'DataPathDeactivated',
    'DP2State': 'DataPathDeactivated',
    'DP3State': 'DataPathDeactivated',
    'DP4State': 'DataPathDeactivated',
    'DP5State': 'DataPathDeactivated',
    'DP6State': 'DataPathDeactivated',
    'DP7State': 'DataPathDeactivated',
    'DP8State': 'DataPathDeactivated'
})
mock_xcvr_api.get_application_advertisement = MagicMock(return_value={1: {'host_lane_count': 8}})
mock_xcvr_api.get_host_lane_assignment_option = MagicMock(return_value=1)
mock_xcvr_api.is_coherent_module = MagicMock(return_value=False)
mock_xcvr_api.set_application = MagicMock()
mock_xcvr_api.scs_apply_datapath_init = MagicMock(return_value=False)  # Simulate failure

mock_sfp = MagicMock()
mock_sfp.get_presence = MagicMock(return_value=True)
mock_sfp.get_xcvr_api = MagicMock(return_value=mock_xcvr_api)

mock_chassis.get_all_sfps = MagicMock(return_value=[mock_sfp])
mock_chassis.get_sfp = MagicMock(return_value=mock_sfp)

port_mapping = PortMapping()
port_mapping.handle_port_change_event(PortChangeEvent('Ethernet0', 1, 0, PortChangeEvent.PORT_ADD))
stop_event = threading.Event()
task = CmisManagerTask(DEFAULT_NAMESPACE, port_mapping, stop_event, platform_chassis=mock_chassis)
task.xcvr_table_helper = XcvrTableHelper(DEFAULT_NAMESPACE)
task.xcvr_table_helper.get_status_sw_tbl.return_value = mock_get_status_sw_tbl
task.xcvr_table_helper.get_gearbox_line_lanes_dict = MagicMock(return_value={})

Copilot AI Mar 24, 2026

optics_si_parser.optics_si_present() isn't patched in this test. Because optics_si_parser uses a module-level global dict, earlier tests that load optics SI settings can cause this test to take a different branch, making it order-dependent. Patch optics_si_present to return False (or reset optics_si_parser.g_optics_si_dict) to keep the test deterministic and focused on the scs_apply_datapath_init failure path.

Suggested change
- mock_xcvr_api = MagicMock()
- mock_xcvr_api.get_module_state = MagicMock(return_value='ModuleReady')
- mock_xcvr_api.get_datapath_state = MagicMock(return_value={
-     'DP1State': 'DataPathDeactivated',
-     'DP2State': 'DataPathDeactivated',
-     'DP3State': 'DataPathDeactivated',
-     'DP4State': 'DataPathDeactivated',
-     'DP5State': 'DataPathDeactivated',
-     'DP6State': 'DataPathDeactivated',
-     'DP7State': 'DataPathDeactivated',
-     'DP8State': 'DataPathDeactivated'
- })
- mock_xcvr_api.get_application_advertisement = MagicMock(return_value={1: {'host_lane_count': 8}})
- mock_xcvr_api.get_host_lane_assignment_option = MagicMock(return_value=1)
- mock_xcvr_api.is_coherent_module = MagicMock(return_value=False)
- mock_xcvr_api.set_application = MagicMock()
- mock_xcvr_api.scs_apply_datapath_init = MagicMock(return_value=False)  # Simulate failure
- mock_sfp = MagicMock()
- mock_sfp.get_presence = MagicMock(return_value=True)
- mock_sfp.get_xcvr_api = MagicMock(return_value=mock_xcvr_api)
- mock_chassis.get_all_sfps = MagicMock(return_value=[mock_sfp])
- mock_chassis.get_sfp = MagicMock(return_value=mock_sfp)
- port_mapping = PortMapping()
- port_mapping.handle_port_change_event(PortChangeEvent('Ethernet0', 1, 0, PortChangeEvent.PORT_ADD))
- stop_event = threading.Event()
- task = CmisManagerTask(DEFAULT_NAMESPACE, port_mapping, stop_event, platform_chassis=mock_chassis)
- task.xcvr_table_helper = XcvrTableHelper(DEFAULT_NAMESPACE)
- task.xcvr_table_helper.get_status_sw_tbl.return_value = mock_get_status_sw_tbl
- task.xcvr_table_helper.get_gearbox_line_lanes_dict = MagicMock(return_value={})
+ # Ensure optics_si_present() does not depend on global optics SI state
+ # so this test remains deterministic and focused on the datapath init failure path.
+ with patch('xcvrd.xcvrd_utilities.optics_si_parser.optics_si_present', return_value=False):
+     mock_xcvr_api = MagicMock()
+     mock_xcvr_api.get_module_state = MagicMock(return_value='ModuleReady')
+     mock_xcvr_api.get_datapath_state = MagicMock(return_value={
+         'DP1State': 'DataPathDeactivated',
+         'DP2State': 'DataPathDeactivated',
+         'DP3State': 'DataPathDeactivated',
+         'DP4State': 'DataPathDeactivated',
+         'DP5State': 'DataPathDeactivated',
+         'DP6State': 'DataPathDeactivated',
+         'DP7State': 'DataPathDeactivated',
+         'DP8State': 'DataPathDeactivated'
+     })
+     mock_xcvr_api.get_application_advertisement = MagicMock(return_value={1: {'host_lane_count': 8}})
+     mock_xcvr_api.get_host_lane_assignment_option = MagicMock(return_value=1)
+     mock_xcvr_api.is_coherent_module = MagicMock(return_value=False)
+     mock_xcvr_api.set_application = MagicMock()
+     mock_xcvr_api.scs_apply_datapath_init = MagicMock(return_value=False)  # Simulate failure
+     mock_sfp = MagicMock()
+     mock_sfp.get_presence = MagicMock(return_value=True)
+     mock_sfp.get_xcvr_api = MagicMock(return_value=mock_xcvr_api)
+     mock_chassis.get_all_sfps = MagicMock(return_value=[mock_sfp])
+     mock_chassis.get_sfp = MagicMock(return_value=mock_sfp)
+     port_mapping = PortMapping()
+     port_mapping.handle_port_change_event(PortChangeEvent('Ethernet0', 1, 0, PortChangeEvent.PORT_ADD))
+     stop_event = threading.Event()
+     task = CmisManagerTask(DEFAULT_NAMESPACE, port_mapping, stop_event, platform_chassis=mock_chassis)
+     task.xcvr_table_helper = XcvrTableHelper(DEFAULT_NAMESPACE)
+     task.xcvr_table_helper.get_status_sw_tbl.return_value = mock_get_status_sw_tbl
+     task.xcvr_table_helper.get_gearbox_line_lanes_dict = MagicMock(return_value={})

Comment on lines +6255 to +6258

task.port_dict['Ethernet0']['cmis_expired'] = time.time() + 100
task.port_dict['Ethernet0']['cmis_retries'] = 0
task.port_dict['Ethernet0']['laser_freq'] = 0

Copilot AI Mar 24, 2026

This test also sets cmis_expired using time.time() + 100 (float) while the implementation expects a datetime. Please switch to a datetime value to keep the test consistent and avoid future failures if a timeout branch is hit.

4 participants