How to handle subactor UDS sock-files leaked by `hard_kill()`/SIGKILL #454

## Summary

Under fork-spawn backends (especially `main_thread_forkserver`),
`tractor.spawn._spawn.hard_kill` falls through to `proc.kill()`
(SIGKILL) whenever the graceful cancel exceeds `terminate_after=1.6s`.
The killed subactor's UDS sock-file at
`${XDG_RUNTIME_DIR}/tractor/<name>@<pid>.sock` is then never unlinked,
so orphaned sock-files accumulate on disk indefinitely.

Distinct from the discovery-client `CLOSE_WAIT` TCP fd leak in #452 —
different layer (spawn vs discovery), different transport (UDS vs
TCP), different lifecycle. Filing separately so each can be tracked
independently.

## Root cause

The subactor unlinks its own sock-file in the `finally:` block of
`tractor.ipc._server::_serve_ipc_eps`, which calls
`tractor.ipc._uds.close_listener` → `os.unlink(addr.sockpath)`. SIGKILL
cannot be caught, so the process dies without running any further
Python code: no `finally` blocks fire and the sock-file stays on disk
until something else removes it.

```
$ ls $XDG_RUNTIME_DIR/tractor/
sleeper@492837.sock      # binder pid 492837 is dead
sync_blocking_sub@492901.sock
namesucka@491847.sock
... (accumulates over a test session)
```
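The root cause can be reproduced outside tractor with the standard library alone. The sketch below is not tractor code: the child script, the `sigkill_leaves_sockfile` helper and the `sleeper@demo.sock` file name are made up for illustration. A child process binds a UDS listener and unlinks it in a `finally:` block, the parent sends SIGKILL via `Popen.kill()`, and the sock-file is still there afterwards.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child: bind a UDS listener, then block; the unlink lives in
# a `finally:` block, mirroring the cleanup layout described above.
CHILD_SRC = """
import os, socket, sys, time
path = sys.argv[1]
sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.bind(path)
sock.listen(1)
print("bound", flush=True)
try:
    time.sleep(60)
finally:
    sock.close()
    os.unlink(path)
"""


def sigkill_leaves_sockfile() -> bool:
    """Return True if the sock-file survives a SIGKILL of its owner."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "sleeper@demo.sock")
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SRC, path],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # Wait until the child has actually bound the socket.
        assert proc.stdout.readline().strip() == "bound"
        proc.kill()  # SIGKILL on POSIX: the child's `finally:` never runs
        proc.wait()
        leaked = os.path.exists(path)
    finally:
        proc.stdout.close()
        # Clean up after the demo so it does not leak a file itself.
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)
    return leaked


if __name__ == "__main__":
    print("sock-file leaked:", sigkill_leaves_sockfile())
```

Swapping `proc.kill()` for `proc.terminate()` would not help by itself either, since the default SIGTERM disposition also ends the process without unwinding Python frames; only a handled signal or a normal exit reaches the `finally:` block.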

## Reproducer

`tests/test_cancellation.py::test_cancel_while_childs_child_in_sync_sleep`
under `--tpt-proto=uds --spawn-backend=trio` reliably leaks
`sleeper@<pid>.sock`. Mechanism:

- Parent calls `Portal.cancel_actor()` → IPC cancel-req msg sent to
  `sleeper`.
- `sleeper` is blocked in sync `time.sleep(3)` → trio scheduler can't
  deliver the `Cancelled` until the sleep returns.
- `hard_kill`'s `move_on_after(1.6s)` deadline fires.
- `proc.kill()` → SIGKILL → no Python cleanup.
- `sleeper@<pid>.sock` orphaned in `$XDG_RUNTIME_DIR/tractor/`.
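The second step of the mechanism above (a sync sleep starving the scheduler) can be illustrated with a small analogy. This sketch uses stdlib `asyncio` as a stand-in for trio, so it is not tractor or trio code, and the function names are hypothetical. A 0.05s deadline is set on a task that blocks in `time.sleep(0.3)`; the cancellation can only land once the sync call returns and the task reaches its next `await`.

```python
import asyncio
import time


async def sync_blocking_child() -> None:
    # A sync sleep blocks the whole event loop: nothing else can run,
    # including the timer that is supposed to cancel this task.
    time.sleep(0.3)
    # First point at which a pending cancellation can be delivered.
    await asyncio.sleep(1)


async def measure_cancel_latency(deadline: float = 0.05) -> float:
    """Return how long it actually took for the deadline to take effect."""
    start = time.monotonic()
    try:
        await asyncio.wait_for(sync_blocking_child(), timeout=deadline)
    except asyncio.TimeoutError:
        pass
    return time.monotonic() - start


if __name__ == "__main__":
    elapsed = asyncio.run(measure_cancel_latency())
    print(f"deadline 0.05s, cancel delivered after {elapsed:.2f}s")
```

In the tractor case the deadline lives in the parent process, so it fires on time while the child is still stuck in its sync sleep, and the parent escalates to SIGKILL.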

Tests that consistently leak under `--tpt-proto=uds
--spawn-backend=trio`:

- `test_cancel_via_SIGINT_other_task[trio]` — leaks 3
  `namesucka@<pid>.sock`
- `test_cancel_while_childs_child_in_sync_sleep` — leaks
  `sleeper@<pid>.sock` (and `sync_blocking_sub@<pid>.sock` in the
  True variant)
- `test_fast_graceful_cancel_when_spawn_task_in_soft_proc_wait_for_daemon[trio]`
  — leaks `fast_boi@<pid>.sock`

## Side effects

- **fd-table pressure**: across long pytest sessions, eventually
  `EMFILE`.
- **Test-suite flakiness amplifier**: under `--tpt-proto=uds`, a
  single hard-killed subactor leaves a sock-file that a sibling
  test's `wait_for_actor`/`find_actor` discovery probes can
  accidentally hit (`FileExistsError` on rebind, or `epoll_register`
  on a half-closed peer-FIN'd fd).
- **Kernel inode accumulation**: although tractor uses
  `XDG_RUNTIME_DIR` (tmpfs on most distros), orphaned sock inodes keep
  consuming kernel resources until they are unlinked or the filesystem
  is unmounted.

## Detection (autouse fixture)

`tractor._testing._reap._track_orphaned_uds_per_test` (committed in
`1cdc7fb3`) snapshots `$XDG_RUNTIME_DIR/tractor/` before and after each
test and emits `UserWarning: UDS sock-file LEAK detected from test
(reaping)` when new orphaned sock-files appear. Per-test scoping pins
the leak on the test that caused it, which a blanket session-end sweep
cannot do.
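The before/after snapshot idea can be sketched with the standard library. This is not the committed fixture: `snapshot_sockfiles` and `track_new_sockfiles` are hypothetical names, the warning text is simplified, and the sketch skips the check that the owning pid is really dead. In a pytest suite the same body would sit inside an autouse fixture around `yield`.

```python
import contextlib
import warnings
from pathlib import Path


def snapshot_sockfiles(sockdir: Path) -> set[str]:
    """Return the names of all `*.sock` entries currently in `sockdir`."""
    if not sockdir.is_dir():
        return set()
    return {p.name for p in sockdir.glob("*.sock")}


@contextlib.contextmanager
def track_new_sockfiles(sockdir: Path, test_name: str):
    """Warn about sock-files that appear while the wrapped block runs."""
    before = snapshot_sockfiles(sockdir)
    try:
        yield
    finally:
        # Anything present now but not before was created by this block
        # and never unlinked.
        leaked = sorted(snapshot_sockfiles(sockdir) - before)
        if leaked:
            warnings.warn(
                f"UDS sock-file leak from {test_name}: {leaked}",
                UserWarning,
            )
```

Because the diff is taken per wrapped block, files leaked by an earlier test are already in `before` and are not blamed on the current one.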

Companion CLI extension: `scripts/tractor-reap --uds` / `--uds-only`
(committed in `0996a836`) for post-mortem cleanup when a session
crashed.

## Fix

`tractor.spawn._reap.unlink_uds_bind_addrs()` is invoked from
`hard_kill` unconditionally after the SIGKILL. It tries two cleanup
paths in order:

- **Explicit `bind_addrs`** — when the parent set the subactor's bind
  addrs at spawn time, unlink each UDS-flavored sockpath directly.
- **Self-assigned reconstruction** — when `bind_addrs` is empty (the
  common case: subactor picked its own random sock via
  `UDSAddress.get_random()`), reconstruct the path from
  `(subactor.aid.name, proc.pid)` using the same `<name>@<pid>.sock`
  convention. Works because the subactor uses its own `os.getpid()`
  at bind time, which equals `proc.pid` from the parent's view.

Idempotent: `FileNotFoundError` (graceful exit already-unlinked, sock
never bound under early-spawn cancel, or transport wasn't UDS this
run) is silenced; other `OSError`s log a warning but never raise.
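A minimal sketch of the two-path, idempotent unlink described above. It is an illustration under assumptions, not the code in `tractor.spawn._reap`: the function name `unlink_sockpaths`, its parameters and its return value are hypothetical, and the sock directory is passed in explicitly rather than derived from tractor's address types.

```python
import logging
from pathlib import Path
from typing import Iterable

log = logging.getLogger(__name__)


def unlink_sockpaths(
    name: str,
    pid: int,
    sockdir: Path,
    bind_sockpaths: Iterable[str] = (),
) -> list[Path]:
    """Unlink a dead subactor's sock-file(s); return what was removed."""
    # Path 1: explicit bind addrs set by the parent at spawn time.
    candidates = [Path(p) for p in bind_sockpaths]
    if not candidates:
        # Path 2: reconstruct from the `<name>@<pid>.sock` convention.
        candidates = [sockdir / f"{name}@{pid}.sock"]

    removed: list[Path] = []
    for path in candidates:
        try:
            path.unlink()
        except FileNotFoundError:
            # Already unlinked, never bound, or not a UDS run: nothing to do.
            continue
        except OSError as err:
            # Any other failure is logged but must never raise.
            log.warning("could not unlink %s: %r", path, err)
        else:
            removed.append(path)
    return removed
```

Calling it twice for the same `(name, pid)` is safe: the second call finds nothing and returns an empty list, which is what makes it reasonable to invoke unconditionally after a hard kill.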

## Future work — authoritative bind-addr tracking

The self-assigned reconstruction path above hardcodes the
`<name>@<pid>.sock` convention from `tractor.ipc._uds.UDSAddress`. If
that convention ever changes, or the subactor binds to a non-default
`bindspace`/`filedir`, the unlink will silently miss the file.

A more authoritative approach:

- Subactors register their bound UDS sockpaths in a per-process
  registry inside `tractor.ipc._uds` at `start_listener()` time.
- The subactor reports its bound sockpath(s) back to the parent over
  IPC immediately post-bind (extension to `SpawnSpec` reply / a new
  handshake msg).
- Parent caches the subactor's authoritative sockpaths.
- `unlink_uds_bind_addrs()` checks the cache FIRST, falls back to
  convention-reconstruction if the subactor died before reporting.
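The cache-first lookup with a convention fallback might look roughly like the sketch below. Everything here is hypothetical, since the proposal is not implemented: the `SockpathCache` class, its `report` and `paths_for` methods, and keying the cache by pid are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SockpathCache:
    """Parent-side record of sockpaths reported by subactors (hypothetical)."""

    sockdir: Path
    _reported: dict[int, list[Path]] = field(default_factory=dict)

    def report(self, pid: int, sockpaths: list[str]) -> None:
        # Would be called when the subactor's post-bind report arrives.
        self._reported[pid] = [Path(p) for p in sockpaths]

    def paths_for(self, name: str, pid: int) -> list[Path]:
        # Authoritative paths first.
        reported = self._reported.get(pid)
        if reported:
            return list(reported)
        # Fallback: the subactor died before reporting, so reconstruct
        # the path from the `<name>@<pid>.sock` convention.
        return [self.sockdir / f"{name}@{pid}.sock"]
```

With this shape, a subactor bound under a non-default directory is still found as long as it reported before dying, while an early death degrades to the current convention-based behaviour.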

This is documented as a TODO in `tractor.spawn._reap`'s module
docstring and can be tracked via a follow-up issue if needed.

## Related

- #452 — the discovery-client `CLOSE_WAIT` TCP fd leak. Different bug
  class (TCP/discovery layer vs UDS/spawn layer) but same broader
  theme of "fork-spawn unmasked latent cleanup gaps".
- `ai/conc-anal/trio_wakeup_socketpair_busy_loop_under_fork_issue.md`
  — different bug entirely (upstream trio `WakeupSocketpair.drain()`
  EOF busy-loop), but the patch for THAT one is what made these tests
  reliable enough to observe the UDS leaks consistently in CI.
- #379 — subint umbrella tracking issue.

