
Smart Counter Poll to allow counters to work properly on Broadcom platforms. #1755

Open
justin-wong-ce wants to merge 11 commits into sonic-net:202511 from justin-wong-ce:202511_counter_enhancement

@justin-wong-ce

Summary

This change addresses #1753 and the 202511 implementation of the HLD added in sonic-net/SONiC#2190.

The existing 202511 code assumes all ports support the same counter capabilities. This causes issues on most Broadcom platform switches, where a single switch has different types of ports that do not support the same set of counters.

The fix dynamically discovers what each interface is capable of during initialization of syncd.

For more details, please refer to the above issue and HLD.
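As a rough illustration of the idea (not the code in this PR), per-interface discovery can be sketched as probing each candidate counter on each port once at initialization, then grouping ports by the exact set of counters they support. `PortId`, `CounterId`, `groupPortsByCapability` and the `isSupported` callback are hypothetical names for this sketch; `isSupported` stands in for whatever capability query or trial read the real code performs.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins; these are not the real SAI or syncd types.
using PortId = std::uint64_t;
using CounterId = int;

// Probe every candidate counter on every port once, then group ports by the
// exact set of counters they support. Ports with identical capabilities end
// up in the same group and can be polled with the same counter list.
std::map<std::set<CounterId>, std::vector<PortId>> groupPortsByCapability(
    const std::vector<PortId>& ports,
    const std::vector<CounterId>& candidates,
    const std::function<bool(PortId, CounterId)>& isSupported)
{
    std::map<std::set<CounterId>, std::vector<PortId>> groups;
    for (PortId port : ports)
    {
        std::set<CounterId> supported;
        for (CounterId counter : candidates)
        {
            if (isSupported(port, counter))
            {
                supported.insert(counter);
            }
        }
        groups[supported].push_back(port);
    }
    return groups;
}
```

Because the probing happens only once per port, a sketch like this adds work at startup but leaves the periodic polling path untouched.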

Testing

Testing was done on both an Arista-7060X6-16PE-384C-B-O128S2 and an Arista-7260CX3-D108C8 running 202511, with the following tests:

sonic-mgmt/tests/dhcp_relay/test_dhcp_counter_stress.py
sonic-mgmt/tests/drop_packets/test_drop_counters.py
sonic-mgmt/tests/drop_packets/test_configurable_drop_counters.py
sonic-mgmt/tests/gnmi/test_gnmi_countersdb.py
sonic-mgmt/tests/snmp/test_snmp_queue_counters.py

Due to ongoing warm reboot test issues in 202511 (sonic-net/sonic-swss#4108), the warm reboot tests (sonic-mgmt/tests/platform_tests/test_advanced_reboot.py) were instead run on 202505 on an Arista-7260CX3-D108C8 with this change backported to it.

A full sonic-mgmt test suite run has also been completed. There is no notable fallout compared to sonic-mgmt runs without this change.

Performance Impact:

All logic changes are confined to the counter initialization stage of FlexCounters.cpp; the polling logic is unchanged.
Therefore, any performance impact is limited to new executions of syncd, i.e. reboot / config reload / systemctl restart.

Tested on several topologies:
[image]

This impact seems reasonable for what this offers.

The fastest operation that would cause a syncd restart is systemctl restart swss, which is itself an operation that spans minutes. The worst real-life scenario, a HwSKU with an older CPU and many interfaces, takes an additional ~17 seconds. New, high interface-count HwSKUs take only an extra ~2 seconds.

As for memory usage, the impact is on the order of kilobytes, which is negligible given that the system has gigabytes of RAM.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

justin-wong-ce and others added 3 commits February 17, 2026 18:22
Signed-off-by: Justin Wong <jvwong@arista.com>
Signed-off-by: Justin Wong <jvwong@arista.com>
Builds are failing due to tests failing a check where every function needs a `SWSS_LOG_ENTER();`. Adding it to the functions missing it.

Signed-off-by: Justin Wong <jvwong@arista.com>
@justin-wong-ce force-pushed the 202511_counter_enhancement branch from 6d9b8d0 to e7fdfd5 February 17, 2026 18:22

Fixed code logic that does not account for a change in supported
counters on an already initialized FlexCounter object. On an actual
device, counter support does not change on an initialized FlexCounter
object as it is hardware / SAI bound.

Fixed mock function not returning a SAI_STATUS_FAILURE when a counter
poll is supposed to fail.

Signed-off-by: Justin Wong <jvwong@arista.com>

@justin-wong-ce marked this pull request as draft February 24, 2026 21:52
@justin-wong-ce
Author

WIP - adding unit test

Signed-off-by: Justin Wong <jvwong@arista.com>

- reverted the previous change to the getStats test mock function, as its
  original behavior was intended

- changed the order of some test value checks, as the ordering of counters
  has changed

- added logic to handle removal of counters from counter groups

- added logic to cleanup stale counter groups

- added logic to account for behavior when different global flags are
  set, i.e. (use_sai_stats_capa_query, dont_clear_support_counter,
  always_check_supported_counters)

- moved large duplicated code blocks into helper functions

Signed-off-by: Justin Wong <jvwong@arista.com>
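For readers following along, the counter removal and stale group cleanup described in this commit can be pictured with a minimal sketch like the one below. It is an illustration under assumptions, not the code from FlexCounters.cpp: `CounterGroups`, `ObjectId`, `CounterId` and all method names are hypothetical. Each object belongs to at most one group, keyed by its counter set, and a group is erased as soon as its last member leaves.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical stand-ins; these are not the real SAI or syncd types.
using ObjectId = std::uint64_t;
using CounterId = int;
using CounterSet = std::set<CounterId>;

class CounterGroups
{
public:
    // Place an object in the group for exactly this counter set.
    void add(ObjectId object, const CounterSet& counters)
    {
        remove(object);  // an object belongs to at most one group
        if (counters.empty())
        {
            return;  // nothing left to poll for this object
        }
        m_groupOf[object] = counters;
        m_members[counters].insert(object);
    }

    // Drop an object entirely; erase its group if it becomes empty.
    void remove(ObjectId object)
    {
        auto it = m_groupOf.find(object);
        if (it == m_groupOf.end())
        {
            return;
        }
        auto group = m_members.find(it->second);
        group->second.erase(object);
        if (group->second.empty())
        {
            m_members.erase(group);  // stale group cleanup
        }
        m_groupOf.erase(it);
    }

    // Remove one counter from an object by moving it to the group that
    // matches its remaining counters.
    void removeCounter(ObjectId object, CounterId counter)
    {
        auto it = m_groupOf.find(object);
        if (it == m_groupOf.end())
        {
            return;
        }
        CounterSet remaining = it->second;  // copy: add() invalidates `it`
        remaining.erase(counter);
        add(object, remaining);
    }

    std::size_t groupCount() const { return m_members.size(); }

private:
    std::map<ObjectId, CounterSet> m_groupOf;
    std::map<CounterSet, std::set<ObjectId>> m_members;
};
```

Keeping both maps in sync inside `remove` means no separate garbage collection pass is needed in this sketch; whether the real code cleans up eagerly or lazily is not stated in the commit message.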

Signed-off-by: Justin Wong <jvwong@arista.com>

Signed-off-by: Justin Wong <jvwong@arista.com>

Signed-off-by: Justin Wong <jvwong@arista.com>

Change makeCounterGroupRef args to use size_t instead of uint64_t

Signed-off-by: Justin Wong <jvwong@arista.com>

@justin-wong-ce marked this pull request as ready for review February 26, 2026 23:09

@vmittal-msft
Contributor

@justin-wong-ce can you please share the master PR for the same?


3 participants