feat(config): typed delete-with-cascade for aliased entries (providers, agents, channels) #7175

@singlerider

Background

V3 config is built on typed aliased entries that reference each other by
<type>.<alias> refs (define_provider_ref!-generated newtypes):

  • ModelProviderRef → providers.models.<family>.<alias>
  • TtsProviderRef → providers.tts.*
  • TranscriptionProviderRef → providers.transcription.*
  • ChannelRef → channels.<type>.<alias>
  • AgentAlias → agents.<alias>
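To make the ref shape concrete, here is a minimal sketch of a typed <type>.<alias> newtype. It is NOT the output of define_provider_ref! (that macro's expansion is not shown in this issue); the struct layout, field names, and error type are all assumptions for illustration.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for a generated ref newtype such as ChannelRef.
/// The real type's fields and error type are not shown in this issue.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct ChannelRef {
    pub channel_type: String,
    pub alias: String,
}

impl FromStr for ChannelRef {
    type Err = String;

    /// Parses "<type>.<alias>", splitting at the first dot.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.split_once('.') {
            Some((t, a)) if !t.is_empty() && !a.is_empty() => Ok(ChannelRef {
                channel_type: t.to_string(),
                alias: a.to_string(),
            }),
            _ => Err(format!("expected <type>.<alias>, got {s:?}")),
        }
    }
}
```

Because the ref is a distinct type rather than a bare string, a reference walker can enumerate referrers by type instead of by string matching.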

Config::validate() already fails loud on dangling refs at load, and (as of the
provider-fallback work) Config::collect_warnings() surfaces dangling/cyclic
providers.models.*.fallback refs as non-fatal warnings. What does NOT exist is
a delete path that removes an aliased entry AND scrubs every reference to it
in one operation. ModelProviders::remove_alias exists but (a) has no production
caller and (b) does not cascade. Deleting an alias today therefore leaves
dangling refs that only surface later as validation errors/warnings — the
operator has to hunt down and hand-edit every referrer.

What needs to happen

A canonical, surface-agnostic "delete this aliased entry and fix up all
referrers" operation, callable identically from TUI, web dashboard, and RPC
(no per-surface reimplementation — canonical-registry rule).

Provider aliases (simpler)

Deleting providers.models.<family>.<alias> must:

  1. Remove the entry.
  2. Scrub the alias from every other provider entry's fallback vector.
  3. For any AGENT that references the deleted alias via model_provider or
    classifier_provider: this is a HARD reference (an agent's model_provider
    is mandatory). Deletion here must be refused, or require the operator to
    first repoint/clear the referring agent. Decide policy (see Open Questions).

TTS / transcription provider deletes follow the same shape against
tts_provider / transcription_provider (those are opt-out/empty-allowed, so
scrubbing to empty is safe).
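The provider steps above could be sketched as follows, assuming the "refuse" policy for hard agent refs (still an open question). The types are simplified, hypothetical stand-ins: the real config keys providers by family and alias, and whether classifier_provider is optional is an assumption here.

```rust
use std::collections::BTreeMap;

/// Hypothetical, flattened stand-ins for the real config types.
#[derive(Debug, Default)]
pub struct ProviderEntry {
    /// Aliases to fall back to.
    pub fallback: Vec<String>,
}

#[derive(Debug)]
pub struct AgentEntry {
    /// Mandatory, so a reference here is a hard ref.
    pub model_provider: String,
    /// Modeled as optional here (assumption).
    pub classifier_provider: Option<String>,
}

#[derive(Debug, Default)]
pub struct Config {
    pub providers: BTreeMap<String, ProviderEntry>,
    pub agents: BTreeMap<String, AgentEntry>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DeleteError {
    NotFound,
    /// Agents that still hold a hard reference to the alias.
    BlockedByAgents(Vec<String>),
}

/// Removes `alias` and scrubs it from every other fallback list.
/// Returns the number of fallback entries scrubbed.
pub fn delete_provider(cfg: &mut Config, alias: &str) -> Result<usize, DeleteError> {
    if !cfg.providers.contains_key(alias) {
        return Err(DeleteError::NotFound);
    }
    // Check hard refs first so a refused delete mutates nothing.
    let blockers: Vec<String> = cfg
        .agents
        .iter()
        .filter(|(_, a)| {
            a.model_provider == alias || a.classifier_provider.as_deref() == Some(alias)
        })
        .map(|(name, _)| name.clone())
        .collect();
    if !blockers.is_empty() {
        return Err(DeleteError::BlockedByAgents(blockers));
    }
    cfg.providers.remove(alias);
    let mut scrubbed = 0;
    for entry in cfg.providers.values_mut() {
        let before = entry.fallback.len();
        entry.fallback.retain(|f| f != alias);
        scrubbed += before - entry.fallback.len();
    }
    Ok(scrubbed)
}
```

Checking blockers before any mutation keeps the operation all-or-nothing, which is what lets a surface show the blocking agents and leave the config untouched.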

Agent aliases (more complex)

An agent is referenced from many places; deleting agents.<alias> must account
for ALL of them (inventory cross-checked against source 2026-06-04):

  • peer_groups.<group>.agents[] — remove the member (and prune the group if it
    drops below the 2-member minimum?).
  • agents.<other>.workspace.access (BTreeMap keyed by AgentAlias) — remove key.
  • agents.<other>.workspace.read_memory_from[] — remove entry.
  • heartbeat.agent — clear / refuse if it names the deleted agent.
  • Channel ownership: any channel routed to the deleted agent
    (agent_for_channel) — repoint or refuse.
  • Delegate targets / aliased-agent routes that name the deleted agent.

Because some of these are HARD refs (heartbeat target, channel owner) and some
are soft (peer-group membership, read_memory_from allowlist), agent deletion
needs a clear refuse-vs-scrub policy per referrer class, not a blanket scrub.
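One way to express a per-referrer-class policy is a single match over the referrer classes in the inventory above. This is a sketch only: the enum and function names are hypothetical, the hard/soft split for heartbeat, channel owner, peer groups, and read_memory_from follows the text, and the treatment of workspace access (scrub) and delegate targets (refuse) is an assumption, since the issue does not classify them.

```rust
/// Hypothetical referrer classes, one per inventory bullet.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentReferrer {
    PeerGroupMember,
    WorkspaceAccess,
    ReadMemoryFrom,
    HeartbeatTarget,
    ChannelOwner,
    DelegateTarget,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefAction {
    /// Soft ref: remove the reference and continue.
    Scrub,
    /// Hard ref: block the delete until the operator repoints it.
    Refuse,
}

pub fn action_for(referrer: AgentReferrer) -> RefAction {
    use AgentReferrer::*;
    match referrer {
        PeerGroupMember | WorkspaceAccess | ReadMemoryFrom => RefAction::Scrub,
        HeartbeatTarget | ChannelOwner => RefAction::Refuse,
        // Not classified in the issue; refusing is the conservative guess.
        DelegateTarget => RefAction::Refuse,
    }
}

/// Splits the referrers found for an agent into (to_scrub, blockers).
pub fn plan(found: &[AgentReferrer]) -> (Vec<AgentReferrer>, Vec<AgentReferrer>) {
    found
        .iter()
        .copied()
        .partition(|r| action_for(*r) == RefAction::Scrub)
}
```

An exhaustive match means that adding a new referrer class fails to compile until someone decides its policy.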

Why one shared implementation

The cascade logic must live in zeroclaw-config (the canonical registry), with
a single entry point the surfaces call. The proposed shape is a
find_all_references(alias) -> Vec<RefSite> pass plus a
delete_with_cascade(kind, alias, policy) that the TUI/web/RPC delete actions
all route through. Surfaces render the impact
(which referrers will be scrubbed / which block the delete) and confirm; they
do not each re-walk the schema.
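A rough sketch of that split, narrowed to agent deletes with a refuse-on-hard-ref policy. Only the names find_all_references and RefSite come from this issue; the RefSite fields, the Config stand-in, DeleteOutcome, and delete_agent_with_cascade (a simplification of the proposed delete_with_cascade(kind, alias, policy)) are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, heavily reduced config: two referrer classes only.
#[derive(Debug, Default)]
pub struct Config {
    pub agents: BTreeSet<String>,
    pub peer_groups: BTreeMap<String, Vec<String>>,
    pub heartbeat_agent: Option<String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RefSite {
    /// Dotted config path of the referrer, for display in the impact report.
    pub path: String,
    /// Hard refs block the delete; soft refs are scrubbed.
    pub hard: bool,
}

/// Read-only walk: doubles as the dry-run impact report.
pub fn find_all_references(cfg: &Config, alias: &str) -> Vec<RefSite> {
    let mut sites = Vec::new();
    for (group, members) in &cfg.peer_groups {
        if members.iter().any(|m| m == alias) {
            sites.push(RefSite { path: format!("peer_groups.{group}.agents"), hard: false });
        }
    }
    if cfg.heartbeat_agent.as_deref() == Some(alias) {
        sites.push(RefSite { path: "heartbeat.agent".to_string(), hard: true });
    }
    sites
}

#[derive(Debug, PartialEq, Eq)]
pub enum DeleteOutcome {
    Deleted { scrubbed: Vec<RefSite> },
    Refused { blockers: Vec<RefSite> },
}

pub fn delete_agent_with_cascade(cfg: &mut Config, alias: &str) -> DeleteOutcome {
    let sites = find_all_references(cfg, alias);
    let (blockers, scrubbed): (Vec<_>, Vec<_>) = sites.into_iter().partition(|s| s.hard);
    if !blockers.is_empty() {
        // Nothing has been mutated yet, so the config is unchanged.
        return DeleteOutcome::Refused { blockers };
    }
    cfg.agents.remove(alias);
    for members in cfg.peer_groups.values_mut() {
        members.retain(|m| m != alias);
    }
    DeleteOutcome::Deleted { scrubbed }
}
```

In this shape a surface calls find_all_references to render the impact, then calls the delete entry point on confirm; neither call requires the surface to know the schema.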

Open questions

  1. Hard-ref policy: refuse-the-delete vs require-explicit-repoint vs
    delete-and-leave-dangling-with-warning. (Leaning: refuse, with the impact
    report telling the operator exactly what to fix first.)
  2. Peer-group minimum: when a delete drops a group below 2 members, prune the
    group, refuse, or leave it (validation already flags under-min groups?).
  3. Should the impact report be a dry-run API the surfaces call before
    confirming the destructive action?
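To make question 2 concrete, the three candidate behaviours can be modeled as a policy value passed to the scrub step. This is a sketch to compare the options, not a decision: UnderMinPolicy, remove_member, and the map-of-vectors group representation are all hypothetical; only the 2-member minimum comes from this issue.

```rust
use std::collections::BTreeMap;

/// Peer-group minimum stated in this issue.
pub const MIN_GROUP_MEMBERS: usize = 2;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnderMinPolicy {
    /// Drop any group that falls below the minimum.
    Prune,
    /// Refuse the delete if any group would fall below the minimum.
    Refuse,
    /// Remove the member and leave the group for validation to flag.
    Leave,
}

/// Removes `alias` from every group.
/// Ok carries the names of pruned groups; Err carries the groups that blocked.
pub fn remove_member(
    groups: &mut BTreeMap<String, Vec<String>>,
    alias: &str,
    policy: UnderMinPolicy,
) -> Result<Vec<String>, Vec<String>> {
    // Groups that contain the alias and would end up under the minimum.
    let under_min: Vec<String> = groups
        .iter()
        .filter(|(_, m)| {
            m.iter().any(|x| x == alias)
                && m.iter().filter(|x| *x != alias).count() < MIN_GROUP_MEMBERS
        })
        .map(|(g, _)| g.clone())
        .collect();
    if policy == UnderMinPolicy::Refuse && !under_min.is_empty() {
        return Err(under_min);
    }
    for members in groups.values_mut() {
        members.retain(|x| x != alias);
    }
    if policy == UnderMinPolicy::Prune {
        for g in &under_min {
            groups.remove(g);
        }
        return Ok(under_min);
    }
    Ok(Vec::new())
}
```

Computing under_min before mutating also gives the dry-run report from question 3 for free: the same list can be shown to the operator before they confirm.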

Validation

  • Deleting a provider alias scrubs it from all fallback lists; config still
    validates clean afterward.
  • Deleting an agent removes it from peer groups, workspace access, and
    read_memory_from; hard refs (heartbeat/channel owner) are handled per policy.
  • A delete that would leave a hard dangling ref is reported (and refused per
    policy) rather than silently producing an invalid config.
  • All three surfaces (TUI/web/RPC) exercise the SAME shared entry point.
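The first criterion can be phrased as an invariant check: after a cascading delete, no fallback list names a missing provider. The sketch below uses a hypothetical flat alias-to-fallback map and helper names (dangling_fallbacks, delete_and_scrub); the real check would go through Config::validate() and Config::collect_warnings().

```rust
use std::collections::BTreeMap;

/// Hypothetical flat view: provider alias -> fallback aliases.
pub type Providers = BTreeMap<String, Vec<String>>;

/// Returns (referrer, missing_alias) for every fallback entry
/// that names a provider which does not exist.
pub fn dangling_fallbacks(providers: &Providers) -> Vec<(String, String)> {
    let mut out = Vec::new();
    for (name, fallback) in providers {
        for f in fallback {
            if !providers.contains_key(f) {
                out.push((name.clone(), f.clone()));
            }
        }
    }
    out
}

/// Removes `alias` and scrubs it from all fallback lists.
/// Returns whether the alias existed.
pub fn delete_and_scrub(providers: &mut Providers, alias: &str) -> bool {
    let existed = providers.remove(alias).is_some();
    for fallback in providers.values_mut() {
        fallback.retain(|f| f != alias);
    }
    existed
}
```

A regression test can then delete an alias both ways (plain remove versus delete_and_scrub) and assert that only the plain remove leaves dangling entries.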

Related

  • Provider-fallback work (branch feat/provider-alias-fallback) added the
    per-alias fallback field + dangling/cycle warnings + runtime warn-and-skip.
    This issue is the deferred deletion-cascade half of that work
    (the warn-and-skip net already prevents crashes; this is the cleanup story).
  • Existing dangling-ref validation: schema.rs:15510-15539 (provider refs),
    15706-15733 (workspace read_memory_from), 15789-15807 (peer-group members).
  • feat(providers): wire providers.fallback into provider resolution #6295 (providers.fallback primary-selection determinism) — separate concern.

Metadata

Labels

  • agent: Auto scope: src/agent/** changed.
  • config: Auto scope: src/config/** changed.
  • enhancement: New feature or request
  • provider: Auto scope: src/providers/** changed.
  • risk: medium: Auto risk: src/** or dependency/config changes.

Projects

Status
Backlog
