Skip to content

[thermalctld] Use platform API for PSU fan name#767

Open
marvellparthiv wants to merge 1 commit intosonic-net:masterfrom
marvellparthiv:b_psu_fan_name_update
Open

[thermalctld] Use platform API for PSU fan name#767
marvellparthiv wants to merge 1 commit intosonic-net:masterfrom
marvellparthiv:b_psu_fan_name_update

Conversation

@marvellparthiv
Copy link

@marvellparthiv marvellparthiv commented Mar 4, 2026

Fix PSU fan parent name mismatch between psud and thermalctld.

Description

psud was updated to use get_name() API for PSU keys in STATE_DB (#102), but thermalctld still used index-based 'PSU {index}' format. This inconsistency caused the SNMP subagent to fail when resolving the parent of PSU fans (e.g., PSU2_FAN1), as the parent name in FAN_INFO did not match any PSU key in PSU_INFO.

Add get_psu_key() method to FanUpdater that retrieves the PSU name via chassis.get_psu().get_name(), aligning with psud's behavior. Falls back to 'PSU {index+1}' on NotImplementedError or AttributeError (e.g., when get_psu() returns None for an out-of-range index).

Add unit tests for get_psu_key() covering:

  • Successful PSU name retrieval from platform API

  • AttributeError fallback when get_psu() returns None

  • NotImplementedError fallback when get_name() is not implemented

  • Fallback when chassis is None

Motivation and Context

The SNMP subagent resolves fan-to-parent relationships by looking up the parent name from FAN_INFO in PSU_INFO. After psud switched to using the platform API's get_name() for PSU keys, thermalctld's index-based naming ('PSU 1', 'PSU 2') no longer matched, breaking SNMP fan parent resolution for PSU fans. This change aligns thermalctld with psud so both daemons use consistent PSU naming in STATE_DB.

How Has This Been Tested?

  • Confirmed SNMP subagent can correctly resolve the parent of PSU fans (e.g., PSU2_FAN1) after the fix.

  • Added unit tests validating all get_psu_key() code paths including AttributeError, NotImplementedError, and None chassis fallback.

Additional Information (Optional)

  • This is a minimal change scoped to the FanUpdater class in thermalctld. No new dependencies are introduced. Platforms that do not implement get_name() on PSU objects are unaffected due to the fallback behavior.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@marvellparthiv marvellparthiv changed the title Use platform API for PSU fan name [thermalctld] Use platform API for PSU fan name Mar 4, 2026
Fix PSU fan parent name mismatch between psud and thermalctld.

psud was updated to use get_name() API for PSU keys in STATE_DB,
but thermalctld still used index-based 'PSU {index}' format. This
inconsistency caused the SNMP subagent to fail when resolving the
parent of PSU fans (e.g., PSU2_FAN1), as the parent name in
FAN_INFO did not match any PSU key in PSU_INFO.

Add get_psu_key() method to FanUpdater that retrieves the PSU name
via chassis.get_psu().get_name(), aligning with psud's behavior.
Falls back to 'PSU {index+1}' on NotImplementedError or
AttributeError (e.g., when get_psu() returns None for an
out-of-range index).

Add unit tests for get_psu_key() covering:
- Successful PSU name retrieval from platform API
- AttributeError fallback when get_psu() returns None
- NotImplementedError fallback when get_name() is not implemented
- Fallback when chassis is None

Signed-off-by: Parthiv Shah <parthiv@marvell.com>
@marvellparthiv marvellparthiv force-pushed the b_psu_fan_name_update branch from 7ff8128 to 64993ae Compare March 4, 2026 12:14
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aligns thermalctld’s PSU fan parent naming with psud by using the platform PSU get_name() API when generating the PSU “parent_name” key, preventing FAN_INFO → PSU_INFO parent resolution mismatches (notably impacting SNMP entity relationships).

Changes:

  • Add FanUpdater.get_psu_key() to derive PSU keys via chassis.get_psu(...).get_name() with fallback to legacy PSU {index+1}.
  • Update PSU fan parent naming in _refresh_fan_status() to use get_psu_key().
  • Add unit tests covering several get_psu_key() fallback paths.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
sonic-thermalctld/scripts/thermalctld Adds get_psu_key() and switches PSU fan parent_name to platform-derived PSU naming.
sonic-thermalctld/tests/test_thermalctld.py Adds unit tests for get_psu_key() behavior and fallbacks.
Comments suppressed due to low confidence (1)

sonic-thermalctld/scripts/thermalctld:325

  • _refresh_fan_status() now uses get_psu_key() for PSU parent_name. If get_psu_key() returns None or raises, this can produce invalid fan_name defaults (e.g., 'None fan 1') and/or prevent publishing PSU fan entries. Ensuring get_psu_key() always returns a valid string fallback will avoid corrupt/unstable keys in FAN_INFO/PHYSICAL_ENTITY_INFO.
        drawer_name = NOT_AVAILABLE if fan_type != FanType.DRAWER else str(try_get(parent.get_name))
        if fan_type == FanType.PSU:
            parent_name = self.get_psu_key(parent_index)
        elif fan_type == FanType.MODULE:
            parent_name = try_get(parent.get_name, default='Module {}'.format(parent_index + 1))
        else:
            parent_name = drawer_name if drawer_name != NOT_AVAILABLE else CHASSIS_INFO_KEY
        fan_name = try_get(fan.get_name, '{} fan {}'.format(parent_name, fan_index + 1))
        update_entity_info(self.phy_entity_table, parent_name, fan_name, fan, fan_index + 1)

Comment on lines +206 to +210
try:
return self.chassis.get_psu(psu_index).get_name()
except (NotImplementedError, AttributeError):
pass
return 'PSU {}'.format(psu_index + 1)
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_psu_key() may raise IndexError/other exceptions from chassis.get_psu() or psu.get_name(), which will cause PSU fan updates to be skipped (caught by update() and logged) instead of cleanly falling back to the legacy key. Consider catching broader exceptions (e.g., IndexError/Exception) and also falling back when get_name() returns None/empty, so this helper always returns a usable string key.

Copilot uses AI. Check for mistakes.
Comment on lines +204 to +205
def get_psu_key(self, psu_index):
if self.chassis is not None:
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FanUpdater.get_psu_key() is a new public helper but it currently has no docstring, while the surrounding public methods in this file are documented. Please add a short docstring describing expected indexing (0-based vs 1-based) and fallback behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +292 to +322
def test_get_psu_key_returns_psu_name(self):
"""Test get_psu_key returns PSU name from platform API"""
chassis = MockChassis()
psu = MockPsu()
psu._name = 'PSU0'
chassis._psu_list.append(psu)
fan_updater = thermalctld.FanUpdater(chassis, threading.Event())
assert fan_updater.get_psu_key(0) == 'PSU0'

def test_get_psu_key_attribute_error(self):
"""Test get_psu_key falls back when get_psu returns None (AttributeError on .get_name())"""
chassis = MockChassis()
# No PSUs added, so get_psu(index) returns None and .get_name() raises AttributeError
fan_updater = thermalctld.FanUpdater(chassis, threading.Event())
assert fan_updater.get_psu_key(0) == 'PSU 1'
assert fan_updater.get_psu_key(1) == 'PSU 2'

def test_get_psu_key_not_implemented_error(self):
"""Test get_psu_key falls back to 1-based 'PSU <index+1>' when get_name raises NotImplementedError"""
chassis = MockChassis()
psu = MockPsu()
psu.get_name = mock.MagicMock(side_effect=NotImplementedError)
chassis._psu_list.append(psu)
fan_updater = thermalctld.FanUpdater(chassis, threading.Event())
assert fan_updater.get_psu_key(0) == 'PSU 1'

def test_get_psu_key_chassis_none(self):
"""Test get_psu_key falls back to 1-based 'PSU <index+1>' when chassis is None"""
fan_updater = thermalctld.FanUpdater(None, threading.Event())
assert fan_updater.get_psu_key(0) == 'PSU 1'

Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new get_psu_key() tests cover AttributeError/NotImplementedError and chassis=None, but they don't cover cases where chassis.get_psu() (or get_name()) raises IndexError/Exception or where get_name() returns None. Adding tests for these paths will help ensure thermalctld remains resilient across platform implementations (similar to psud's get_psu_key coverage).

Copilot generated this review using guidance from repository custom instructions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants