Commit a40545f

Merge origin/master into xcvrd-refactor-9
- Integrated fast-reboot caching feature from master
- Updated is_fast_reboot_enabled() calls to use new cached method
- Preserved refactored CMIS state handler functions
2 parents 5efceac + 7d8007d commit a40545f

22 files changed

Lines changed: 2321 additions & 359 deletions

.github/copilot-instructions.md

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
+# Copilot Instructions for sonic-platform-daemons
+
+## Project Overview
+
+sonic-platform-daemons contains the platform monitoring daemons for SONiC. These daemons run as services on the switch, continuously monitoring hardware components (fans, PSUs, thermals, transceivers, LEDs, PCIe, storage) and publishing their state to the SONiC Redis databases. They consume the platform APIs defined in sonic-platform-common.
+
+## Architecture
+
+```
+sonic-platform-daemons/
+├── sonic-xcvrd/              # Transceiver daemon (SFP/QSFP monitoring)
+│   ├── xcvrd/
+│   │   ├── xcvrd.py          # Main transceiver monitoring daemon
+│   │   └── ...
+│   ├── tests/                # xcvrd tests
+│   └── setup.py
+├── sonic-psud/               # PSU daemon (power supply monitoring)
+│   ├── scripts/psud
+│   ├── tests/
+│   └── setup.py
+├── sonic-thermalctld/        # Thermal control daemon
+│   ├── scripts/thermalctld
+│   ├── tests/
+│   └── setup.py
+├── sonic-ledd/               # LED daemon
+│   ├── scripts/ledd
+│   ├── tests/
+│   └── setup.py
+├── sonic-pcied/              # PCIe monitoring daemon
+│   ├── scripts/pcied
+│   ├── tests/
+│   └── setup.py
+├── sonic-syseepromd/         # System EEPROM daemon
+│   ├── scripts/syseepromd
+│   ├── tests/
+│   └── setup.py
+├── sonic-chassisd/           # Chassis daemon (modular chassis)
+│   ├── scripts/chassisd
+│   ├── tests/
+│   └── setup.py
+├── sonic-ycabled/            # Y-cable daemon (dual-ToR)
+│   ├── ycable/
+│   ├── tests/
+│   └── setup.py
+├── sonic-sensormond/         # Sensor monitoring daemon
+├── sonic-stormond/           # Storage monitoring daemon
+└── .github/                  # GitHub configuration
+```
+
+### Key Concepts
+- **Each daemon is a standalone Python package** with its own `setup.py` and tests
+- **Platform API consumer**: Daemons call `sonic-platform-common` base class methods
+- **DB publishers**: Daemons write hardware state to STATE_DB, update COUNTERS_DB
+- **Event-driven monitoring**: Daemons poll hardware at intervals, detect state changes
+
+## Language & Style
+
+- **Primary language**: Python 3
+- **Indentation**: 4 spaces
+- **Naming conventions**:
+  - Daemon scripts: lowercase (e.g., `psud`, `thermalctld`, `xcvrd`)
+  - Functions: `snake_case`
+  - Classes: `PascalCase`
+  - Constants: `UPPER_CASE`
+- **Logging**: Use `sonic_py_common.daemon_base.Logger`
+- **Docstrings**: Required for public methods
+
+## Build Instructions
+
+```bash
+# Each daemon is a separate Python package
+cd sonic-xcvrd
+python3 setup.py bdist_wheel
+
+cd sonic-psud
+python3 setup.py bdist_wheel
+# etc.
+```
+
+## Testing
+
+```bash
+# Run tests for a specific daemon
+cd sonic-xcvrd
+pytest tests/ -v
+
+cd sonic-psud
+pytest tests/ -v
+
+# Run with coverage
+pytest tests/ --cov --cov-report=term-missing
+```
+
+- Each daemon has its own `tests/` directory
+- Tests use **pytest** with mock objects
+- Platform APIs are mocked (no real hardware needed)
+- Tests verify state machine logic, DB updates, and error handling
+
+## PR Guidelines
+
+- **Commit format**: `[daemon]: Description` (e.g., `[xcvrd]: Add DOM monitoring support`)
+- **Signed-off-by**: REQUIRED (`git commit -s`)
+- **CLA**: Sign Linux Foundation EasyCLA
+- **Testing**: All changes must include or update unit tests
+- **Platform compatibility**: Changes must not break any vendor platform
+- **DB schema**: Document any STATE_DB table changes
+
+## Common Patterns
+
+### Daemon Structure
+```python
+from sonic_py_common.daemon_base import DaemonBase
+
+class MyDaemon(DaemonBase):
+    def __init__(self):
+        super().__init__('mydaemon')
+        self.platform_chassis = load_platform_chassis()
+
+    def run(self):
+        while True:
+            # Monitor hardware
+            status = self.platform_chassis.get_fan(0).get_status()
+            # Update DB
+            self.state_db.set('FAN_INFO|FAN0', {'status': str(status)})
+            time.sleep(self.polling_interval)
+```
+
+### DB Update Pattern
+```python
+# Daemons typically write to STATE_DB
+# Table format: TABLE_NAME|KEY
+# Fields: key-value pairs representing hardware state
+fvs = swsscommon.FieldValuePairs([
+    ('status', 'OK'),
+    ('speed', '12000'),
+    ('temperature', '35.0')
+])
+tbl.set(key, fvs)
+```
+
+## Dependencies
+
+- **sonic-platform-common**: Platform base classes (sonic_platform_base)
+- **sonic-py-common**: Common Python utilities, DaemonBase
+- **python-swsscommon**: Redis database bindings
+- **sonic-buildimage**: Packages are built within the buildimage system
+
+## Gotchas
+
+- **Platform plugin loading**: Daemons dynamically load vendor platform plugins — handle `ImportError` gracefully
+- **Polling intervals**: Too-frequent polling wastes CPU; too-infrequent misses events
+- **Error resilience**: Daemons must not crash on platform API errors — catch, log, continue
+- **Signal handling**: Daemons must handle SIGTERM/SIGINT for clean shutdown
+- **Multi-ASIC**: xcvrd and other daemons must be namespace-aware for multi-ASIC platforms
+- **DB consistency**: Always update all related fields atomically
+- **Warm restart**: Consider state preservation during warm restart scenarios
+- **Resource leaks**: Ensure file handles and DB connections are properly closed
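
The "Daemon Structure" snippet in the file above relies on names it does not define (`time`, `load_platform_chassis`, `self.state_db`) and loops with `while True`, so it cannot run as shown and has no shutdown path. The following is a minimal, self-contained sketch of the same polling pattern combined with the "Error resilience" and "Signal handling" gotchas. `FakeFan`, `FakeStateDb`, `PollingDaemon`, `install_signal_handlers`, and the `max_polls` parameter are hypothetical stand-ins invented for illustration; they are not part of sonic-py-common or sonic-platform-common. The stop-event loop mirrors the `while not self.stop.wait(self.loop_interval)` form that chassisd uses later in this commit.

```python
import logging
import signal
import threading

logger = logging.getLogger("mydaemon")


class FakeFan:
    """Hypothetical stand-in for a platform fan; replays scripted results."""

    def __init__(self, results):
        self._results = iter(results)

    def get_status(self):
        result = next(self._results)
        if isinstance(result, Exception):
            raise result  # simulate a failing platform API call
        return result


class FakeStateDb:
    """Hypothetical stand-in for a STATE_DB handle; keeps the last write per key."""

    def __init__(self):
        self.rows = {}

    def set(self, key, fields):
        self.rows[key] = dict(fields)


class PollingDaemon:
    def __init__(self, fan, state_db, polling_interval=1.0):
        self.fan = fan
        self.state_db = state_db
        self.polling_interval = polling_interval
        self.stop = threading.Event()  # set by a signal handler to end run()
        self.errors = 0

    def poll_once(self):
        """Read the fan once; on a platform error, log it and skip the DB write."""
        try:
            status = self.fan.get_status()
        except Exception as exc:
            self.errors += 1
            logger.warning("fan status read failed: %r", exc)
            return
        self.state_db.set('FAN_INFO|FAN0', {'status': str(status)})

    def run(self, max_polls=None):
        """Poll until stop is set, or until max_polls polls (for demos and tests)."""
        polls = 0
        while not self.stop.wait(self.polling_interval):
            self.poll_once()
            polls += 1
            if max_polls is not None and polls >= max_polls:
                break


def install_signal_handlers(daemon):
    """Request a clean shutdown on SIGTERM/SIGINT (call from the main thread)."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, lambda signum, frame: daemon.stop.set())
```

Waiting on the event instead of calling `time.sleep` lets a signal handler end the loop promptly, because `Event.wait` returns as soon as the flag is set. With `FakeFan([True, TimeoutError("bus busy"), False])` and `run(max_polls=3)`, the second poll is logged and skipped while the first and third still reach the fake table.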

azure-pipelines.yml

Lines changed: 18 additions & 2 deletions
@@ -87,6 +87,22 @@ jobs:
         artifact: 'sonic-buildimage.vs'
         runVersion: 'latestFromBranch'
         runBranch: "$(sourceBranch)"
+        path: $(Build.ArtifactStagingDirectory)/download
+        patterns: |
+          target/debs/bookworm/libnl-3-200_*.deb
+          target/debs/bookworm/libnl-genl-3-200_*.deb
+          target/debs/bookworm/libnl-route-3-200_*.deb
+          target/debs/bookworm/libnl-nf-3-200_*.deb
+          target/debs/bookworm/libyang_1.0.73_amd64.deb
+          target/debs/bookworm/libswsscommon_1.0.0_amd64.deb
+          target/debs/bookworm/python3-swsscommon_1.0.0_amd64.deb
+
+          target/python-wheels/bookworm/swsssdk-2.0.1-py3-none-any.whl
+          target/python-wheels/bookworm/sonic_py_common-1.0-py3-none-any.whl
+          target/python-wheels/bookworm/sonic_yang_mgmt-1.0-py3-none-any.whl
+          target/python-wheels/bookworm/sonic_yang_models-1.0-py3-none-any.whl
+          target/python-wheels/bookworm/sonic_config_engine-1.0-py3-none-any.whl
+          target/python-wheels/bookworm/sonic_platform_common-1.0-py3-none-any.whl
       displayName: "Download artifacts from latest sonic-buildimage build"
 
     - script: |
@@ -99,7 +115,7 @@ jobs:
         sudo dpkg -i libyang_1.0.73_amd64.deb
         sudo dpkg -i libswsscommon_1.0.0_amd64.deb
         sudo dpkg -i python3-swsscommon_1.0.0_amd64.deb
-      workingDirectory: $(Pipeline.Workspace)/target/debs/bookworm/
+      workingDirectory: $(Build.ArtifactStagingDirectory)/download/target/debs/bookworm/
       displayName: 'Install Debian dependencies'
 
     - script: |
@@ -110,7 +126,7 @@ jobs:
         sudo pip3 install sonic_yang_models-1.0-py3-none-any.whl
         sudo pip3 install sonic_config_engine-1.0-py3-none-any.whl
         sudo pip3 install sonic_platform_common-1.0-py3-none-any.whl
-      workingDirectory: $(Pipeline.Workspace)/target/python-wheels/bookworm/
+      workingDirectory: $(Build.ArtifactStagingDirectory)/download/target/python-wheels/bookworm/
       displayName: 'Install Python dependencies'
 
     - script: |
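
The two `workingDirectory` changes follow from the new `path` and `patterns` inputs. As a rough sketch, assuming the download task recreates each matched file's relative path underneath `path` (an assumption about the task's behaviour, not something the diff states), the install directories can be derived as below. `DOWNLOAD_PATH` and `install_dir` are hypothetical names used only for this illustration.

```python
from pathlib import PurePosixPath

# Value of the new `path` input, kept as an unexpanded pipeline variable.
DOWNLOAD_PATH = PurePosixPath("$(Build.ArtifactStagingDirectory)/download")


def install_dir(pattern):
    """Directory in which a file matched by `pattern` would land after download."""
    return DOWNLOAD_PATH / PurePosixPath(pattern).parent


deb_dir = install_dir("target/debs/bookworm/libyang_1.0.73_amd64.deb")
wheel_dir = install_dir("target/python-wheels/bookworm/swsssdk-2.0.1-py3-none-any.whl")
print(deb_dir)
print(wheel_dir)
```

Apart from the trailing slash, the two printed directories match the new `workingDirectory` values of the 'Install Debian dependencies' and 'Install Python dependencies' steps.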

sonic-chassisd/scripts/chassis_db_init

Lines changed: 3 additions & 3 deletions
@@ -41,12 +41,12 @@ NOT_AVAILABLE = 'N/A'
 # Helper functions =============================================================
 #
 
-# try get information from platform API and return a default value if caught NotImplementedError
+# try get information from platform API and return a default value if caught NotImplementedError or TimeoutError
 
 
 def try_get(callback, *args, **kwargs):
     """
-    Handy function to invoke the callback and catch NotImplementedError
+    Handy function to invoke the callback and catch NotImplementedError or TimeoutError
     :param callback: Callback to be invoked
     :param args: Arguments to be passed to callback
     :param kwargs: Default return value if exception occur
@@ -57,7 +57,7 @@ def try_get(callback, *args, **kwargs):
         ret = callback(*args)
         if ret is None:
             ret = default
-    except NotImplementedError:
+    except (NotImplementedError, TimeoutError):
         ret = default
 
     return ret
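
The hunks above show only part of `try_get`. A self-contained sketch of the helper after this change might look as follows. The line that reads `default` from `kwargs` falls outside the diff context, so it is an assumption here, chosen to be consistent with the new tests, which expect `NOT_AVAILABLE` when no default is passed.

```python
NOT_AVAILABLE = 'N/A'


def try_get(callback, *args, **kwargs):
    """Invoke callback(*args); fall back to a default on None or a known error."""
    # Assumption: the lines hidden by the diff read the default like this.
    default = kwargs.get('default', NOT_AVAILABLE)
    try:
        ret = callback(*args)
        if ret is None:
            ret = default
    except (NotImplementedError, TimeoutError):
        # Unimplemented or timed-out platform calls degrade to the default.
        ret = default

    return ret


def read_serial_slowly():
    """Hypothetical platform call that times out."""
    raise TimeoutError("EEPROM read timed out")


print(try_get(read_serial_slowly))                 # falls back to NOT_AVAILABLE
print(try_get(read_serial_slowly, default=False))  # caller-supplied default
```

Only the two listed exception types are swallowed. Any other exception raised by the callback still propagates to the caller, so a genuine bug in a platform plugin is not silently turned into `'N/A'`.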

sonic-chassisd/scripts/chassisd

Lines changed: 2 additions & 15 deletions
@@ -102,7 +102,6 @@ INVALID_IP = '0.0.0.0'
 CHASSIS_MODULE_ADMIN_STATUS = 'admin_status'
 MODULE_ADMIN_DOWN = 0
 MODULE_ADMIN_UP = 1
-MODULE_PRE_SHUTDOWN = 2
 MODULE_REBOOT_CAUSE_DIR = "/host/reboot-cause/module/"
 MAX_HISTORY_FILES = 10
 
@@ -1359,12 +1358,6 @@ class ChassisdDaemon(daemon_base.DaemonBase):
 
     def submit_dpu_callback(self, module_index, admin_state):
         module = self.module_updater.chassis.get_module(module_index)
-
-        # This is only valid on platforms which have pci_detach and sensord changes required. If it is not implemented,
-        # there are no actions taken during this function execution.
-        if admin_state == MODULE_PRE_SHUTDOWN:
-            try_get(module.module_pre_shutdown, default=False)
-        # Set admin_state change in progress using the centralized method
         if admin_state == MODULE_ADMIN_DOWN:
             try_get(module.set_admin_state_gracefully, admin_state, default=False)
 
@@ -1387,10 +1380,7 @@ class ChassisdDaemon(daemon_base.DaemonBase):
                 # Get admin state of DPU
                 admin_state = self.module_updater.get_module_admin_status(module_name)
                 if admin_state == ModuleBase.MODULE_STATUS_EMPTY:
-                    op = MODULE_PRE_SHUTDOWN
-                    if operational_state != ModuleBase.MODULE_STATUS_OFFLINE:
-                        # shutdown DPU
-                        op = MODULE_ADMIN_DOWN
+                    op = MODULE_ADMIN_DOWN
 
                 # Initialize DPU_STATE DB table on bootup
                 dpu_state_key = "DPU_STATE|" + module_name
@@ -1439,6 +1429,7 @@ class ChassisdDaemon(daemon_base.DaemonBase):
         try:
             # Start configuration manager task
             if self.smartswitch:
+                self.set_initial_dpu_admin_state()
                 self.config_manager = SmartSwitchConfigManagerTask()
                 self.config_manager.task_run()
             elif self.module_updater.supervisor_slot == self.module_updater.my_slot:
@@ -1450,10 +1441,6 @@ class ChassisdDaemon(daemon_base.DaemonBase):
         # Start main loop
         self.log_info("Start daemon main loop")
 
-        # Set the initial DPU admin state for SmartSwitch
-        if self.smartswitch:
-            self.set_initial_dpu_admin_state()
-
         while not self.stop.wait(self.loop_interval):
             self.module_updater.module_db_update()
             self.module_updater.check_midplane_reachability()
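
The behavioural change in `set_initial_dpu_admin_state` is easier to see as a before/after comparison of the branch shown in the third hunk. The sketch below isolates only that branch. The function names are hypothetical, the string values standing in for `ModuleBase.MODULE_STATUS_*` are assumptions, and any handling of non-empty admin states is outside the diff and therefore represented as `None`.

```python
MODULE_ADMIN_DOWN = 0
MODULE_ADMIN_UP = 1
MODULE_PRE_SHUTDOWN = 2  # removed by this commit; kept here only for comparison

# Assumed stand-ins for the ModuleBase.MODULE_STATUS_* constants.
MODULE_STATUS_EMPTY = 'Empty'
MODULE_STATUS_OFFLINE = 'Offline'
MODULE_STATUS_ONLINE = 'Online'


def initial_op_before(admin_state, operational_state):
    """Old logic: an EMPTY admin state on an OFFLINE DPU triggered pre-shutdown."""
    if admin_state != MODULE_STATUS_EMPTY:
        return None  # other branches are not shown in the diff
    op = MODULE_PRE_SHUTDOWN
    if operational_state != MODULE_STATUS_OFFLINE:
        op = MODULE_ADMIN_DOWN
    return op


def initial_op_after(admin_state, operational_state):
    """New logic: an EMPTY admin state always maps to MODULE_ADMIN_DOWN."""
    if admin_state != MODULE_STATUS_EMPTY:
        return None  # other branches are not shown in the diff
    return MODULE_ADMIN_DOWN


for oper in (MODULE_STATUS_OFFLINE, MODULE_STATUS_ONLINE):
    before = initial_op_before(MODULE_STATUS_EMPTY, oper)
    after = initial_op_after(MODULE_STATUS_EMPTY, oper)
    print(f"oper={oper}: before={before} after={after}")
```

The two versions differ only for an offline DPU with an empty admin state, which previously went through `module_pre_shutdown` and now goes through `set_admin_state_gracefully` like every other empty-state DPU. The last two hunks additionally move the call to `set_initial_dpu_admin_state()` so that it runs before `SmartSwitchConfigManagerTask` is started.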

sonic-chassisd/tests/test_chassis_db_init.py

Lines changed: 16 additions & 0 deletions
@@ -38,3 +38,19 @@ def test_provision_db():
     assert serial == fvs[CHASSIS_INFO_SERIAL_FIELD]
     assert model == fvs[CHASSIS_INFO_MODEL_FIELD]
     assert revision == fvs[CHASSIS_INFO_REV_FIELD]
+
+def test_try_get_timeout_error():
+    def raise_timeout():
+        raise TimeoutError("timeout")
+
+    result = try_get(raise_timeout)
+
+    assert result == NOT_AVAILABLE
+
+def test_try_get_not_implemented_error():
+    def raise_not_implemented():
+        raise NotImplementedError("not implemented")
+
+    result = try_get(raise_not_implemented)
+
+    assert result == NOT_AVAILABLE

sonic-chassisd/tests/test_chassisd.py

Lines changed: 5 additions & 20 deletions
@@ -1549,8 +1549,8 @@ def test_set_initial_dpu_admin_state_empty_offline():
         # Verify DPU state was updated with 'down' since operational state is OFFLINE
         mock_update_dpu_state.assert_called_once_with("DPU_STATE|DPU0", 'down')
 
-        # Verify callback was submitted with MODULE_PRE_SHUTDOWN since admin state is EMPTY and oper state is OFFLINE
-        mock_submit_callback.assert_called_once_with(0, MODULE_PRE_SHUTDOWN)
+        # Verify callback was submitted with MODULE_ADMIN_DOWN when admin state is EMPTY
+        mock_submit_callback.assert_called_once_with(0, MODULE_ADMIN_DOWN)
 
 
 def test_set_initial_dpu_admin_state_empty_not_offline():
@@ -1600,7 +1600,7 @@ def test_set_initial_dpu_admin_state_empty_not_offline():
         # Verify DPU state was updated with 'down' since operational state is not ONLINE
         mock_update_dpu_state.assert_called_once_with("DPU_STATE|DPU0", 'down')
 
-        # Verify callback was submitted with MODULE_ADMIN_DOWN since admin state is EMPTY and oper state is not OFFLINE
+        # Verify callback was submitted with MODULE_ADMIN_DOWN when admin state is EMPTY
         mock_submit_callback.assert_called_once_with(0, MODULE_ADMIN_DOWN)
 
 
@@ -2041,26 +2041,11 @@ def test_submit_dpu_callback():
     daemon_chassisd.module_updater = module_updater
     module_updater.module_table.get = MagicMock(return_value=(True, []))
 
-    # Test MODULE_ADMIN_DOWN scenario
-    with patch.object(module, 'module_pre_shutdown') as mock_pre_shutdown, \
-         patch.object(module, 'set_admin_state_gracefully') as mock_set_admin_state_gracefully:
+    # Test MODULE_ADMIN_DOWN scenario - set_admin_state_gracefully is called
+    with patch.object(module, 'set_admin_state_gracefully') as mock_set_admin_state_gracefully:
         daemon_chassisd.submit_dpu_callback(index, MODULE_ADMIN_DOWN)
-        # Verify pre_shutdown is not called for admin down
-        mock_pre_shutdown.assert_not_called()
-        # Verify set_admin_state_gracefully is called with MODULE_ADMIN_DOWN
         mock_set_admin_state_gracefully.assert_called_once_with(MODULE_ADMIN_DOWN)
 
-    # Test MODULE_PRE_SHUTDOWN scenario
-    with patch.object(module, 'module_pre_shutdown') as mock_pre_shutdown, \
-         patch.object(module, 'set_admin_state_gracefully') as mock_set_admin_state_gracefully:
-
-        module_updater.module_table.get = MagicMock(return_value=(True, []))
-        daemon_chassisd.submit_dpu_callback(index, MODULE_PRE_SHUTDOWN)
-
-        # Verify only pre_shutdown is called for pre-shutdown state
-        mock_pre_shutdown.assert_called_once()
-        mock_set_admin_state_gracefully.assert_not_called()
-
 def test_chassis_daemon_assertion():
     chassis = MockChassis()
 
