Skip to content

[EVPN-MH] Add CLI commands for EVPN VXLAN Multihoming configuration#4247

Open
bdfriedman wants to merge 1 commit into
sonic-net:masterfrom
bdfriedman:evpn_mh
Open

[EVPN-MH] Add CLI commands for EVPN VXLAN Multihoming configuration#4247
bdfriedman wants to merge 1 commit into
sonic-net:masterfrom
bdfriedman:evpn_mh

Conversation

@bdfriedman
Copy link
Copy Markdown

@bdfriedman bdfriedman commented Feb 4, 2026

Why I did it

This PR adds CLI commands to configure and monitor EVPN VXLAN Multihoming (EVPN-MH) feature in SONiC. These commands enable network operators to:

  • Configure global EVPN-MH parameters (startup delay, MAC/neighbor hold times)
  • Configure EVPN Ethernet Segments on LAG/PortChannel interfaces
  • Set Designated Forwarder (DF) preference for load distribution
  • Display EVPN ES, ES-EVI, and L2 nexthop information

This complements the EVPN-MH infrastructure added in sonic-swss and sonic-swss-common.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Global EVPN-MH Configuration (evpn_mh.py):

  1. config evpn-mh startup-delay <0-3600> - Configure EVPN-MH startup delay (default: 300s)

    • Determines how long to wait before declaring local Ethernet Segments as up after system boot
    • Prevents premature traffic forwarding during initialization
    • Valid range: 0-3600 seconds
  2. config evpn-mh mac-holdtime <0-86400> - Configure MAC hold time (default: 1080s)

    • Duration to retain remote MAC entries after EVPN route withdrawal
    • Allows for graceful handling of temporary connectivity issues
    • Valid range: 0-86400 seconds
  3. config evpn-mh neigh-holdtime <0-86400> - Configure neighbor hold time (default: 1080s)

    • Duration to retain neighbor entries after EVPN route withdrawal
    • Valid range: 0-86400 seconds

Interface-level EVPN-ES Configuration (main.py):
4. config interface evpn-esi add <interface> <esi> - Add EVPN Ethernet Segment Identifier

  • Supports two ESI types:
    • Type 0 (Manual): 00:XX:XX:XX:XX:XX:XX:XX:XX:XX - Operator-configured 10-byte ESI
      • First byte must be 0x00 (Type 0)
      • Validates against reserved ESIs (all zeros, all FFs)
      • Prevents duplicate manual ESI across interfaces
    • Type 3 (MAC-based): auto-system-mac - Auto-generated from system MAC
      • Uses port ID derived from interface name
      • Leverages system_mac from PORTCHANNEL table if configured
  • Integrates with FRR via vtysh commands
  • Stores configuration in CONFIG_DB EVPN_ETHERNET_SEGMENT table
  1. config interface evpn-esi del <interface> - Remove EVPN Ethernet Segment configuration

    • Removes from CONFIG_DB
    • Cleans up FRR configuration (es-id, es-sys-mac, es-df-pref)
  2. config interface evpn-df-pref <interface> <1-65535> - Set DF preference

    • Controls Designated Forwarder election priority (higher value = higher preference)
    • Default: 32767
    • Valid range: 1-65535
    • Updates both CONFIG_DB and FRR configuration

Show Commands (evpn.py):
7. show evpn - Display general EVPN information
8. show evpn es [<esi>] - Display Ethernet Segment information

  • Optional ESI parameter to show specific ES details
  1. show evpn es-evi [<vni>] - Display Ethernet Segment per EVI information
    • Optional VNI parameter to filter by specific VLAN/VNI
  2. show evpn es-evi detail - Show detailed ES-EVI information
  3. show evpn l2-nh - Display L2 nexthop groups for EVPN all-active ES

All show commands leverage FRR's show evpn commands via bgp_util.run_bgp_show_command().

Testing (210+ test cases):

  • config_evpn_mh_test.py - Global EVPN-MH configuration tests

    • Startup delay validation (boundary values, valid/invalid inputs)
    • MAC holdtime validation
    • Neighbor holdtime validation
  • config_int_evpn_test.py - Interface EVPN-ES configuration tests

    • Manual ESI configuration (Type 0)
    • MAC-based ESI configuration (Type 3)
    • ESI validation (reserved ESI rejection, format validation, duplicate detection)
    • DF preference configuration
    • Configuration add/delete operations
    • Error handling and edge cases

Helper Functions:

  • is_reserved_esi() - Validates against reserved ESI values
  • parse_esi_input() - Parses and validates ESI input (Type 0 vs Type 3)
  • port_id_from_if_name() - Extracts numeric port ID from interface name
  • check_if_same_manual_esi_exists() - Prevents duplicate manual ESI configuration
  • run_vtysh_command() - Wraps vtysh execution for FRR integration

Files Changed:

  • evpn_mh.py - New file (109 lines)
  • main.py - Added 199 lines for interface EVPN commands
  • evpn.py - New file (67 lines)
  • main.py - Added 2 lines to register EVPN command group
  • config_evpn_mh_test.py - New file (104 lines)
  • config_int_evpn_test.py - New file (230 lines)

Total: +711 lines across 6 files

How to verify it

Global EVPN-MH Configuration:

# Configure global EVPN-MH parameters
config evpn-mh startup-delay 600
config evpn-mh mac-holdtime 2000
config evpn-mh neigh-holdtime 2000

# Verify in CONFIG_DB
redis-cli -n 4 HGETALL "EVPN_MH_GLOBAL|default"

Interface EVPN-ES Configuration:

# Configure Type 0 (manual) ESI on PortChannel
config interface evpn-esi add PortChannel1 00:01:02:03:04:05:06:07:08:09

# Configure Type 3 (MAC-based) ESI
config interface evpn-esi add PortChannel2 auto-system-mac

# Set DF preference
config interface evpn-df-pref PortChannel1 50000

# Verify in CONFIG_DB
redis-cli -n 4 HGETALL "EVPN_ETHERNET_SEGMENT|PortChannel1"

# Verify in FRR
vtysh -c "show running-config" | grep -A 5 "interface PortChannel1"

# Delete ESI
config interface evpn-esi del PortChannel1

Show Commands:

# Display EVPN information
show evpn
show evpn es
show evpn es 00:01:02:03:04:05:06:07:08:09
show evpn es-evi
show evpn es-evi 1000
show evpn es-evi detail
show evpn l2-nh

Run Unit Tests:

pytest tests/config_evpn_mh_test.py -v
pytest tests/config_int_evpn_test.py -v

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

Description for the changelog

Add CLI commands for EVPN VXLAN Multihoming: global EVPN-MH configuration, interface Ethernet Segment (ESI) configuration, DF preference settings, and EVPN show commands

Link to config_db schema for YANG model changes

Updates required for:

  • New table: EVPN_MH_GLOBAL - Global EVPN-MH configuration

    • startup_delay: 0-3600 (default: 300)
    • mac_holdtime: 0-86400 (default: 1080)
    • neigh_holdtime: 0-86400 (default: 1080)
  • New table: EVPN_ETHERNET_SEGMENT - Per-interface ES configuration

    • esi: 10-byte ESI string (e.g., "00:01:02:03:04:05:06:07:08:09") or "AUTO"
    • type: "TYPE_0_OPERATOR_CONFIGURED" or "TYPE_3_MAC_BASED"
    • df_pref: 1-65535 (default: 32767)

YANG model updates should be submitted to sonic-yang-models repository.

Depends on

  • sonic-swss-common PR: [Add L2 nexthop group table and raw netlink message handling support]
  • sonic-swss PR: [EVPN-MH infrastructure and orchestration agents]
  • FRR: EVPN MH support (available in FRR 8.x+)

Key Features:
✅ Global EVPN-MH parameter configuration
✅ Type 0 (manual) and Type 3 (MAC-based) ESI support
✅ DF preference configuration for load balancing
✅ FRR integration via vtysh
✅ CONFIG_DB persistence
✅ Comprehensive validation and error handling
✅ 210+ unit test cases
✅ Show commands for monitoring EVPN state

@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented Feb 4, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread show/evpn.py Outdated
cmd += " {}".format(es_evi)

output = bgp_util.run_bgp_show_command(cmd)
print(output)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it consistent to use click.echo instead of print.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: now uses click.echo instead of print throughout show/evpn.py.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: all print() calls replaced with click.echo() in show/evpn.py. Fix looks good.

Comment thread show/evpn.py Outdated
"""Show evpn Ethernet Segment per EVI information"""
cmd = "show evpn es-evi"

if es_evi is not None:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be just

    if es_evi:  # This checks if es_evi is not None AND not an empty string
        cmd += " {}".format(es_evi)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: now uses truthy check if vni: and if es: instead of is not None.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: truthy checks (if es:, if vni:) used instead of is not None. Looks good.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@bdfriedman bdfriedman changed the title Add EVPN command support and enhance VXLAN remote VTEP handling [EVPN-MH] Add CLI commands for EVPN VXLAN Multihoming configuration Feb 25, 2026
@lguohan lguohan requested a review from Copilot March 11, 2026 17:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds SONiC CLI support for configuring and monitoring EVPN VXLAN Multihoming (EVPN-MH), including new config evpn-mh global settings, config interface EVPN-ES/DF preference commands, and a new show evpn command group that proxies FRR output.

Changes:

  • Add config evpn-mh commands to configure global EVPN-MH timers in CONFIG_DB.
  • Add config interface evpn-esi ... and config interface evpn-df-pref ... to configure Ethernet Segment parameters and DF preference, including FRR vtysh integration.
  • Add show evpn ... commands that run FRR show evpn variants and display the output.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 17 comments.

Show a summary per file
File Description
config/main.py Adds EVPN ES interface configuration commands, parsing/validation helpers, and FRR vtysh integration.
config/evpn_mh.py Adds new config evpn-mh command group for EVPN-MH global timer configuration.
show/main.py Registers the new show evpn command group.
show/evpn.py Implements show evpn subcommands by proxying FRR show evpn ... output.
tests/config_evpn_mh_test.py Adds unit tests for EVPN-MH global configuration commands.
tests/config_int_evpn_test.py Adds unit tests for EVPN ES interface configuration and DF preference behavior.

Comment thread config/evpn_mh.py Outdated
Comment on lines +39 to +42
def set_startup_delay(ctx, startup_delay=EVPN_MH_STARTUP_DELAY_DEFAULT):
"""Add EVPN MH"""
config_db = ValidatedConfigDBConnector(ctx.obj['config_db'])
if not is_valid_startup_delay(startup_delay):
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These handlers access ctx.obj['config_db'], but when invoked from the top-level config group ctx.obj is a utilities_common.db.Db instance (see config/main.py), not a dict. This will raise TypeError: 'Db' object is not subscriptable at runtime (even though unit tests pass a dict). Update the commands to use @clicommon.pass_db and db.cfgdb (or access ctx.obj.cfgdb) consistently across all evpn-mh subcommands.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: commands now use @clicommon.pass_db decorator and access db.cfgdb instead of ctx.obj['config_db'].

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: config/evpn_mh.py commands now use @clicommon.pass_db decorator and access db.cfgdb. Tests updated to pass Db() instance. Fix looks correct.

Comment thread config/main.py Outdated
Comment on lines +5621 to +5623
def run_vtysh_command(cmd):
return clicommon.run_command(cmd)

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_vtysh_command() wraps clicommon.run_command(), which returns None on success and calls sys.exit() on failures. As a result, the subsequent if op: checks will never catch errors, and failures can abruptly exit without a Click-friendly error message. Consider using return_cmd=True (and ignore_error=True) to capture output/rc and then ctx.fail() on non-zero rc.

Suggested change
def run_vtysh_command(cmd):
return clicommon.run_command(cmd)
def run_vtysh_command(cmd, ctx=None):
"""
Run a vtysh command.
If a Click context is provided, capture the return code/output and
surface failures via ctx.fail() instead of exiting abruptly.
When no context is provided, preserve the original behavior of
clicommon.run_command(), which may call sys.exit() on error.
"""
if ctx is None:
# Preserve existing behavior for callers that don't use Click ctx
return clicommon.run_command(cmd)
rc, output = clicommon.run_command(cmd, return_cmd=True, ignore_error=True)
if rc != 0:
message = output if output else f"Command '{cmd}' failed with return code {rc}"
ctx.fail(message)
return output

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: run_vtysh_command() now accepts an optional ctx parameter, captures rc/output with return_cmd=True, and calls ctx.fail() on non-zero rc.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: run_vtysh_command() now accepts optional ctx, uses return_cmd=True and ignore_error=True, and calls ctx.fail() on non-zero rc. All callers updated.

Comment thread show/evpn.py Outdated
Comment on lines +45 to +46
"""Show Ethernet Segment per EVI information"""
""""show evpn es-evi <vni>"""
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a stray standalone string literal (""""show evpn es-evi <vni>""") under es_evi(). It isn’t a docstring/comment (because the function already has a docstring) and is easy to misread; remove it or merge it into the actual docstring.

Suggested change
"""Show Ethernet Segment per EVI information"""
""""show evpn es-evi <vni>"""
"""Show Ethernet Segment per EVI information (show evpn es-evi <vni>)"""

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: stray standalone string literal removed and merged into the actual docstring.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: stray string literal removed, merged into the docstring. Looks good.

Comment thread config/main.py
Comment on lines +5624 to +5628
def check_if_same_manual_esi_exists(ctx, esi_args, es_data):
if esi_args['type'] == 'TYPE_0_OPERATOR_CONFIGURED':
for es_intf_name, es_intf_data in es_data.items():
if esi_args['esi'] == es_intf_data['esi']:
ctx.fail(f"The ESI '{esi_args['esi']}' is already in use by '{es_intf_name}'")
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_if_same_manual_esi_exists() compares ESI strings with exact case. Since manual ESIs are hex, the same ESI can be configured twice using different casing (e.g., ...:ff vs ...:FF), bypassing the duplicate check. Normalize ESIs (e.g., lower-case) before storing and when comparing to enforce true uniqueness.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: ESI strings are now normalized to lowercase before storing (esi_args['esi'] = esi_input_strs[0].lower()) and compared case-insensitively.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: ESI strings normalized to lowercase before storing and compared case-insensitively. Fix looks correct.

Comment thread config/main.py Outdated
Comment on lines +5612 to +5619
port_id_re = re.compile(r'[a-zA-Z]+(?P<port_id>[0-9_]+)')
port_id = port_id_re.search(if_name)
port_id_group = port_id.group('port_id')

if port_id_group:
port_id = port_id_group.replace('_', '')

return port_id
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

port_id_from_if_name() assumes the regex always matches and immediately calls .group('port_id'). If a user provides an interface name without digits (or an unexpected format), this will raise an AttributeError and crash the CLI. Handle a non-match explicitly (e.g., return None/raise a ValueError) and have the caller ctx.fail() with a clear message.

Suggested change
port_id_re = re.compile(r'[a-zA-Z]+(?P<port_id>[0-9_]+)')
port_id = port_id_re.search(if_name)
port_id_group = port_id.group('port_id')
if port_id_group:
port_id = port_id_group.replace('_', '')
return port_id
"""Extract the numeric port identifier from an interface name.
Returns a string of digits (with any underscores removed) if the
interface name matches the expected pattern, otherwise returns None.
"""
port_id_re = re.compile(r'[a-zA-Z]+(?P<port_id>[0-9_]+)')
match = port_id_re.search(if_name)
if not match:
return None
port_id_group = match.group('port_id')
if port_id_group:
return port_id_group.replace('_', '')
return None

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: port_id_from_if_name() now returns None on non-match, and callers check for None with ctx.fail().

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: port_id_from_if_name() now returns None on non-match, and callers check for None with ctx.fail(). Looks good.

Comment thread config/main.py Outdated
Comment on lines +5742 to +5743
ctx.fail(f"EVPN Ethernet Segment {interface_name} - DF Preference {df_pref} is not valid. " \
"Valid values are {EVPN_ES_DF_PREF_MIN}-{EVPN_ES_DF_PREF_MAX}.")
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation error message for an invalid DF preference is missing the f prefix on the second line, so {EVPN_ES_DF_PREF_MIN} / {EVPN_ES_DF_PREF_MAX} will be printed literally. Make the entire message an f-string (or build it in one f-string) so the numeric range is displayed correctly.

Suggested change
ctx.fail(f"EVPN Ethernet Segment {interface_name} - DF Preference {df_pref} is not valid. " \
"Valid values are {EVPN_ES_DF_PREF_MIN}-{EVPN_ES_DF_PREF_MAX}.")
ctx.fail(
f"EVPN Ethernet Segment {interface_name} - DF Preference {df_pref} is not valid. "
f"Valid values are {EVPN_ES_DF_PREF_MIN}-{EVPN_ES_DF_PREF_MAX}."
)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: the second line now has the f prefix so {EVPN_ES_DF_PREF_MIN}/{EVPN_ES_DF_PREF_MAX} are interpolated correctly.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: f-string prefix added to the second line so {EVPN_ES_DF_PREF_MIN}/{EVPN_ES_DF_PREF_MAX} are interpolated correctly. Fix looks good.

Comment thread config/main.py
Comment on lines +5705 to +5719
try:
config_db.set_entry(EVPN_ES_TABLE, interface_name, None)

# Update frr
cmd = ['sudo', 'vtysh', '-c', 'configure terminal', '-c', 'interface {}'.format(interface_name)]
cmd.append('-c')
cmd.append('no evpn mh es-sys-mac')
cmd.append('-c')
cmd.append('no evpn mh es-df-pref')
cmd.append('-c')
cmd.append('no evpn mh es-id')
op = run_vtysh_command(cmd)
if op:
ctx.fail("VTYSh config failed. Error: {}".format(op))
except JsonPatchConflict as e:
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EVPN ES entry is deleted from CONFIG_DB before the vtysh cleanup is executed. If vtysh fails, FRR may keep stale configuration while CONFIG_DB indicates it was removed. Consider doing FRR cleanup first or re-adding the CONFIG_DB entry on failure so the system remains consistent.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: FRR cleanup via vtysh is now performed before deleting from CONFIG_DB.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: FRR cleanup via vtysh now performed before CONFIG_DB delete in del_evpn_es. Fix looks correct.

Comment thread config/main.py Outdated
Comment on lines +5745 to +5754
try:
config_db.mod_entry(EVPN_ES_TABLE, interface_name, {'df_pref': int(df_pref)})

# Update frr
cmd = ['sudo', 'vtysh', '-c', 'configure terminal', '-c', 'interface {}'.format(interface_name)]
cmd.append('-c')
cmd.append('evpn mh es-df-pref {}'.format(int(df_pref)))
op = run_vtysh_command(cmd)
if op:
ctx.fail("VTYSh config failed. Error: {}".format(op))
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mod_entry() is applied before the vtysh update. If the FRR update fails, CONFIG_DB will reflect the new DF preference while FRR does not. Consider applying vtysh first or rolling back the CONFIG_DB update when the vtysh call fails.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: vtysh update is now applied before mod_entry, and df_pref is stored as str(df_pref) instead of int(df_pref).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: vtysh update applied before mod_entry, and df_pref stored as str(). Fix looks correct.

Comment thread show/evpn.py Outdated
Comment on lines +12 to +14
from natsort import natsorted
from swsscommon.swsscommon import SonicV2Connector, ConfigDBConnector
from tabulate import tabulate
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show/evpn.py imports several modules that are never used in this file (natsorted, SonicV2Connector, ConfigDBConnector, tabulate). Remove unused imports to avoid confusion and keep dependencies minimal.

Suggested change
from natsort import natsorted
from swsscommon.swsscommon import SonicV2Connector, ConfigDBConnector
from tabulate import tabulate

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: unused imports (natsorted, SonicV2Connector, ConfigDBConnector, tabulate) removed. re import added for validation.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: unused imports removed (natsorted, SonicV2Connector, ConfigDBConnector, tabulate), re import added. Looks good.

Comment thread tests/config_int_evpn_test.py Outdated
evpn_es_table = db['config_db'].get_table(EVPN_ES_TABLE)
assert interface_name in evpn_es_table, f"'{interface_name}' not found in {EVPN_ES_TABLE}"
assert evpn_es_table[interface_name]['esi'] == 'AUTO', f"Got ESI {evpn_es_table[interface_name]['esi']}, expected 'AUTO'"
assert evpn_es_table[interface_name]['type'] == 'TYPE_3_MAC_BASED', f"Got ESI type {evpn_es_table[interface_name]['esi']}, expected 'TYPE_3_MAC_BASED'"
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In configure_mac_esi(), the assertion message says “Got ESI type …” but formats evpn_es_table[interface_name]['esi'] instead of the 'type' field. This makes failures misleading; use the 'type' value in the error message.

Suggested change
assert evpn_es_table[interface_name]['type'] == 'TYPE_3_MAC_BASED', f"Got ESI type {evpn_es_table[interface_name]['esi']}, expected 'TYPE_3_MAC_BASED'"
assert evpn_es_table[interface_name]['type'] == 'TYPE_3_MAC_BASED', f"Got ESI type {evpn_es_table[interface_name]['type']}, expected 'TYPE_3_MAC_BASED'"

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: assertion message now correctly references evpn_es_table[interface_name]['type'] instead of ['esi'].

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: assertion message now references ["type"] instead of ["esi"]. Fix looks correct.

Comment thread config/evpn_mh.py Outdated
@click.argument('startup_delay', metavar='<startup_delay>', required=True)
@click.pass_context
def set_startup_delay(ctx, startup_delay=EVPN_MH_STARTUP_DELAY_DEFAULT):
"""Add EVPN MH"""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps using "Set EVPN MH startup delay time in seconds" would be better.
set_startup_delay、set_mac_holdtime and set_neigh_holdtime are all "Add EVPN MH".

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: docstrings updated to be more descriptive - 'Set EVPN MH startup delay time in seconds', 'Set EVPN MH MAC holdtime in seconds', 'Set EVPN MH neighbor holdtime in seconds'.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: docstrings updated to descriptive text. Looks good.

Comment thread config/evpn_mh.py Outdated
f"Valid values are {EVPN_MH_STARTUP_DELAY_MIN}-{EVPN_MH_STARTUP_DELAY_MAX}.")

try:
config_db.set_entry(EVPN_MH_TABLE, 'default', {'startup_delay': startup_delay})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "set_entry" will overwrite all existing fields, so other fields under 'default' will be cleared. e.g. mac_holdtime
Recommanded: get old entry, update only the needed ones, and then set the entry to avoid lossing existing data.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: now does get_entry first to preserve existing fields, then updates only the needed field, then set_entry.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: set_startup_delay now does get_entry first to preserve existing fields, then updates only the needed field. Fix looks correct.

Comment thread config/evpn_mh.py Outdated
f"Valid values are {EVPN_MH_MAC_HOLDTIME_MIN}-{EVPN_MH_MAC_HOLDTIME_MAX}.")

try:
config_db.set_entry(EVPN_MH_TABLE, 'default', {'mac_holdtime': mac_holdtime})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "set_entry" will overwrite all existing fields, so other fields under 'default' will be cleared.
Recommanded: get old entry, update only the needed ones, and then set the entry to avoid lossing existing data.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: same pattern applied - get existing entry, update field, set entry.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: same get-then-update pattern applied to set_mac_holdtime. Fix looks correct.

Comment thread config/evpn_mh.py Outdated
f"Valid values are {EVPN_MH_NEIGH_HOLDTIME_MIN}-{EVPN_MH_NEIGH_HOLDTIME_MAX}.")

try:
config_db.set_entry(EVPN_MH_TABLE, 'default', {'neigh_holdtime': neigh_holdtime})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "set_entry" will overwrite all existing fields, so other fields under 'default' will be cleared. e.g. mac_holdtime
Recommanded: get old entry, update only the needed ones, and then set the entry to avoid lossing existing data.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: same pattern applied - get existing entry, update field, set entry.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified: same get-then-update pattern applied to set_neigh_holdtime. Fix looks correct.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • All prior review concerns addressed in new commits — threads resolved
  • FRR-before-ConfigDB ordering, input validation, ESI case normalization, dead code removal all look good
  • es_evi detail subcommand remains unreachable due to positional vni arg conflicting with subcommand routing (existing thread left open)
  • New VNI/ESI input validation is a solid improvement over passing raw strings to vtysh
  • No new issues found in the delta diff

@banidoru
Copy link
Copy Markdown

The concern about the unreachable 'detail' subcommand is still valid. The 'es_evi' group's use of 'invoke_without_command=True' with a positional 'vni' argument prevents the 'detail' subcommand from being called.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Most previous comments have been addressed.
  • The subcommand remains unreachable.
  • Unable to resolve threads due to permissions.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review of commit 68af379. All 30+ prior review threads addressed. Key improvements:

  • FRR-before-CONFIG_DB ordering consistently applied across all commands
  • run_vtysh_command properly refactored with ctx-based error handling
  • Input validation added (ESI format, VNI range, df_pref type safety)
  • Test fixtures updated for new @clicommon.pass_db pattern
  • Bug fixes: del_mac NameError, bare excepts, shadowed names, typos

One minor nit below. Overall looking solid.

Comment thread config/main.py
if evpn_es_tbl and 'type' in evpn_es_tbl and evpn_es_tbl['type'] == 'TYPE_3_MAC_BASED':
cmd.append('-c')
cmd.append('no evpn mh es-sys-mac')
run_vtysh_command(cmd, ctx)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: run_vtysh_command(cmd, ctx) is now called unconditionally, but the only meaningful vtysh sub-commands (no evpn mh es-sys-mac) are only appended when the ES entry is TYPE_3_MAC_BASED. When it's not, this makes a no-op sudo vtysh -c 'configure terminal' -c 'interface ...' call. In the previous code, vtysh was only invoked inside the if evpn_es_tbl... block for the delete path. Consider guarding the vtysh call similarly to avoid unnecessary sudo invocations:

if evpn_es_tbl and 'type' in evpn_es_tbl and evpn_es_tbl['type'] == 'TYPE_3_MAC_BASED':
    cmd.append('-c')
    cmd.append('no evpn mh es-sys-mac')
    run_vtysh_command(cmd, ctx)

Not a functional issue, just unnecessary overhead.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still open — not addressed in the latest commit. Low priority nit, can be deferred.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Good fix: corrected run_command() return value destructuring from rc, output to output, rc — this was a real bug that would have swapped error code and output string
  • All 30 prior review threads have been addressed with appropriate fixes
  • Still open (minor): no-op vtysh call in del_evpn_es for non-MAC-based ES types — unnecessary sudo invocation (nit, not a blocker)
  • Still open (design): show evpn es-evi detail subcommand remains unreachable due to positional vni arg consuming "detail" — tracked separately
  • No new issues found in this diff. LGTM.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • New commit fixes critical bug: run_vtysh_command had swapped return values (rc, outputoutput, rc), matching clicommon.run_command's actual signature. Good catch.
  • All 30 prior review threads addressed — fixes verified in previous iteration. Unable to resolve threads due to permissions.
  • One open nit remains: unnecessary vtysh invocation in del_evpn_es / del_pc_sys_id_mac when ES type is not TYPE_3_MAC_BASED (low priority).
  • No new issues found in this single-line diff.

Comment thread config/main.py
if ctx is None:
# Preserve existing behavior for callers that don't use Click ctx
return clicommon.run_command(cmd)
output, rc = clicommon.run_command(cmd, return_cmd=True, ignore_error=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix — clicommon.run_command() returns (output, returncode), not (returncode, output). The previous code would have treated the output string as rc (always truthy → always failing) and the integer return code as the error message. This was a runtime-breaking bug for any vtysh failure path.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All reviewers approved. LGTM.

@tahmed-dev
Copy link
Copy Markdown
Contributor

The FRR-first-then-ConfigDB pattern is good for atomicity, but if the process crashes between the vtysh call and config_db.mod_entry(), FRR has the config but ConfigDB doesn't. On next boot, ConfigDB is authoritative and FRR won't have the sys-mac. Is there a reconciliation mechanism, or should ConfigDB be the source of truth with teammgr/frrcfgd pushing to FRR?

@tahmed-dev
Copy link
Copy Markdown
Contributor

port_id_from_if_name extracts digits from the interface name (e.g., PortChannel0"0"). This becomes the es-id in FRR. If two interfaces have overlapping numeric suffixes across different naming conventions (e.g., PortChannel10 and Ethernet10), could this collide? Is the intent that ES is only supported on PortChannels?

@tahmed-dev
Copy link
Copy Markdown
Contributor

The L2 NHG resolution in fdbshow does db.keys() + loop over all NEXT_HOP_GROUP_MEMBER entries to find members of a specific group. On a switch with many NHGs (L3 ECMP + L2 MH), this could be slow. Would it help to filter by the group OID first, or is this acceptable for a show command?

@tahmed-dev
Copy link
Copy Markdown
Contributor

int(startup_delay) in range(0, 3601) creates a range object on every call. Minor, but for consistency with other SONiC CLI validators, could use 0 <= int(startup_delay) <= 3600 which is more readable and doesn't allocate.

@radha-danda
Copy link
Copy Markdown

@bdfriedman, Please address comments, CI checks

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

tahmed-dev added a commit to bdfriedman/sonic-utilities that referenced this pull request May 26, 2026
Rebased on top of sonic-net/sonic-utilities@f66ad703 to

resolve merge conflict in doc/Command-Reference.md (show vlan brief).

Squashes the original PR sonic-net#4247 commits into a single linear commit.

Preserves both the new Static Anycast Gateway field description from the PR

and the multi-ASIC namespace note added on master.

Also fixes PR-introduced trailing whitespace and the 'show static-anycast_gateway'

typo in doc/Command-Reference.md.

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Rebased on top of sonic-net/sonic-utilities@f66ad703 to

resolve merge conflict in doc/Command-Reference.md (show vlan brief).

Squashes the original PR sonic-net#4247 commits into a single linear commit.

Preserves both the new Static Anycast Gateway field description from the PR

and the multi-ASIC namespace note added on master.

Also fixes PR-introduced trailing whitespace and the 'show static-anycast_gateway'

typo in doc/Command-Reference.md.

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread config/evpn_mh.py
entry = config_db.get_entry(EVPN_MH_TABLE, 'default') or {}
entry['startup_delay'] = startup_delay
config_db.set_entry(EVPN_MH_TABLE, 'default', entry)
except ValueError as e:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception handling is incomplete.
This handler catches only ValueError, but when ValidatedConfigDBConnector runs with YANG validation enabled, set_entry() can also raise patch-related exceptions such as JsonPatchConflict. In that case, the CLI may expose an unhandled internal error instead of returning a clean ctx.fail(...) message.

Comment thread config/evpn_mh.py
entry = config_db.get_entry(EVPN_MH_TABLE, 'default') or {}
entry['mac_holdtime'] = mac_holdtime
config_db.set_entry(EVPN_MH_TABLE, 'default', entry)
except ValueError as e:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as set_startup_delay, exception handling is incomplete.

Comment thread config/evpn_mh.py
entry = config_db.get_entry(EVPN_MH_TABLE, 'default') or {}
entry['neigh_holdtime'] = neigh_holdtime
config_db.set_entry(EVPN_MH_TABLE, 'default', entry)
except ValueError as e:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as set_startup_delay, exception handling is incomplete.

Comment thread config/evpn_mh.py
f"Valid values are {EVPN_MH_STARTUP_DELAY_MIN}-{EVPN_MH_STARTUP_DELAY_MAX}.")

try:
# Get existing entry to preserve other fields
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A regression test is missing for field preservation.
the current tests only cover the empty-table case and do not verify that the other existing fields are preserved after the update.

Comment thread tests/interfaces_test.py
assert "Error: Invalid interface name Ethernet3" in result.output

@pytest.mark.skip(reason="Test is not working, needs investigation")
def test_add_remove_sys_mac_portchannel(self):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test coverage is incomplete. The key failure paths are not covered, especially FRR command failure, CONFIG_DB write failure, and the TYPE_3_MAC_BASED branch.

Comment thread config/main.py
@click.pass_context
@click.argument('interface_name', metavar='<interface_name>', required=True)
@click.argument('esi_type', metavar='<esi_type>', required=True,
type=str, nargs=-1)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accepts extra arguments because nargs=-1 is used, but only the first token is actually parsed.
However, the del command instead has a test for "error due to extra parameters". --- test_del_extra_arg(self, cli_db_connection):

Comment thread config/main.py
if sag_entry:
mac_address = sag_entry.get('gateway_mac', 'unknown')
log.log_info(f"'static-anycast-gateway mac_address del {mac_address}' executing...")
db.cfgdb.mod_entry('SAG', 'GLOBAL', None)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removes the whole SAG|GLOBAL entry, not just the gateway_mac field, which is risky if more SAG global fields are added later.

Comment thread config/vlan.py
if vlan_entry and vlan_entry.get('static_anycast_gateway') != 'true':
ctx.fail("static-anycast-gateway is already disabled")

db.cfgdb.mod_entry('VLAN_INTERFACE', vlan, {"static_anycast_gateway": "false"})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The disable logic is not fully consistent: an unset SAG state is treated differently from an explicitly disabled state.
A better fix is to define a single clear rule: only 'true' means enabled, and everything else means disabled.

vlan_entry = db.cfgdb.get_entry('VLAN_INTERFACE', vlan) or {}
if vlan_entry.get('static_anycast_gateway') != 'true':
    ctx.fail("static-anycast-gateway is already disabled")

db.cfgdb.mod_entry('VLAN_INTERFACE', vlan, {"static_anycast_gateway": "false"})

Comment thread show/evpn.py
# Validate ESI format (XX:XX:XX:XX:XX:XX:XX:XX:XX:XX - 10 hex bytes separated by colons)
esi_pattern = r'^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){9}$'
if not re.match(esi_pattern, es):
click.echo(f"Error: Invalid ESI format '{es}'. Expected format: XX:XX:XX:XX:XX:XX:XX:XX:XX:XX")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid ESI/VNI input currently only prints an error and returns, so the command still exits with code 0.
It would be better to use ctx.fail(...) to make "illegal input" truly result in failure.

Comment thread show/main.py
body = []

sag_entry = db.cfgdb.get_entry('SAG', 'GLOBAL')
if sag_entry:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inconsistent configuration is hidden: if VLAN interfaces have static_anycast_gateway=true but SAG|GLOBAL is missing, the command just prints an empty table instead of showing a clear problem.

The better fix is to make the command state-aware instead of silently printing an empty table.

enabled_vlans = [...]
sag_mac = sag_entry.get('gateway_mac') if sag_entry else None

if not sag_mac and not enabled_vlans:
    # not configured
elif not sag_mac and enabled_vlans:
    # inconsistent
else:
    # normal display

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants