
feat(ci): xdist per-worker isolation infrastructure#110775

Merged
mchen-sentry merged 4 commits into master from mingchen/di-1712-xdist-infra
Mar 16, 2026


@mchen-sentry
Member

@mchen-sentry mchen-sentry commented Mar 16, 2026

When pytest-xdist spawns multiple workers inside a single CI shard, all workers share the same CI infrastructure (Redis, Kafka, Snuba, and Relay). Without isolation, workers corrupt each other's state: one worker's reset_snuba/flushdb() wipes data another worker is still using, Kafka events from one worker pollute another worker's topics, and workers generate identical snowflake IDs.

This PR gives each worker its own isolated resources. Everything is gated on the xdist environment variables, so as things stand this change is a no-op on master.
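The gating and the per-worker naming both hinge on resolving the worker ID once. A minimal sketch, assuming the ID comes from the `PYTEST_XDIST_WORKER` variable that pytest-xdist exports ("gw0", "gw1", ...); `resolve_worker_num`, `suffixed_topic`, and the `-gwN` topic suffix are hypothetical names for illustration, not the helpers in this PR:

```python
import os
from typing import Mapping, Optional


def resolve_worker_num(environ: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the xdist worker number, or None when not running under xdist."""
    # pytest-xdist exports PYTEST_XDIST_WORKER ("gw0", "gw1", ...) in each worker.
    worker = environ.get("PYTEST_XDIST_WORKER")
    if not worker:
        return None
    number = worker[2:]
    if not worker.startswith("gw") or not number.isdigit():
        raise ValueError(f"unexpected PYTEST_XDIST_WORKER value: {worker!r}")
    return int(number)


# Resolved once at import time, mirroring the module-level resolution in the PR.
WORKER_NUM = resolve_worker_num()


def suffixed_topic(base: str, worker_num: Optional[int] = WORKER_NUM) -> str:
    """Return a per-worker Kafka topic name; unchanged outside xdist."""
    if worker_num is None:
        return base
    # Hypothetical suffix format; the PR only says names are suffixed with the worker ID.
    return f"{base}-gw{worker_num}"
```

Returning `None` (rather than 0) outside xdist is what keeps the change a no-op: every helper can leave names untouched when there is no worker.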

Changes

  • xdist.py - resolves the worker ID at module level and provides helpers (e.g. get_redis_db())
  • per-worker Redis (each worker gets its own Redis DB), per-worker Kafka topics (topic names suffixed with the worker ID), per-worker Snuba (per-worker ports), and per-worker Relay (per-worker ports, unique containers)
  • deterministic region name, so that all workers generate the same region name (required because xdist needs every worker to collect identical tests)
  • per-worker snowflake IDs to avoid collisions
  • disable crash recovery in pytest-rerunfailures, which is broken under xdist because of Sentry's global socket.setdefaulttimeout(5); normal --reruns is unaffected
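Why the region name must be deterministic: xdist aborts when workers collect different tests, so any randomly generated value that leaks into collection must come out the same in every process. A minimal sketch, assuming the name is drawn from a private `random.Random` with a fixed seed; `make_region_name`, the seed string, and the `region-` prefix are hypothetical:

```python
import random
import string


def make_region_name(seed: str = "xdist-region") -> str:
    """Generate a region name that is identical in every worker process."""
    # A str seed is converted to an int deterministically (not via hash()),
    # so PYTHONHASHSEED does not make workers disagree.
    rng = random.Random(seed)
    # Using a private Random instance leaves the global random state untouched.
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(6))
    return f"region-{suffix}"
```

A private generator is preferable to calling `random.seed()` globally, which would make every other "random" value in the test run repeat across workers too.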

@linear-code

linear-code bot commented Mar 16, 2026

DI-1712 xdist infra

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 16, 2026
Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Falsy-zero check masks xdist worker 0 identity
    • Changed 'xdist._worker_num or 0' to 'xdist._worker_num if xdist._worker_num is not None else 0' in relay.py to properly distinguish worker 0 from non-xdist case, matching the pattern used throughout the codebase.
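The two expressions differ only when the fallback is not 0: `or` falls back on any falsy value, including worker 0, while the `is not None` check falls back only when there is no worker. With the fallback of 0 used in relay.py, both forms yield offset 0 for worker 0 and for the non-xdist case, so the change makes the intent explicit rather than altering the computed port. A small comparison (the function names are illustrative):

```python
from typing import Optional


def with_or(worker_num: Optional[int], default: int) -> int:
    # Falls back whenever worker_num is falsy, which includes worker 0.
    return worker_num or default


def with_none_check(worker_num: Optional[int], default: int) -> int:
    # Falls back only when worker_num is absent (not running under xdist).
    return worker_num if worker_num is not None else default
```

For example, `with_or(0, -1)` returns -1 while `with_none_check(0, -1)` returns 0.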

Or push these changes by commenting:

@cursor push 99ad560e14
Preview (99ad560e14)
diff --git a/src/sentry/testutils/pytest/relay.py b/src/sentry/testutils/pytest/relay.py
--- a/src/sentry/testutils/pytest/relay.py
+++ b/src/sentry/testutils/pytest/relay.py
@@ -68,7 +68,7 @@
     template_path = _get_template_dir()
     sources = ["config.yml", "credentials.json"]
 
-    worker_num = xdist._worker_num or 0
+    worker_num = xdist._worker_num if xdist._worker_num is not None else 0
     relay_port = ephemeral_port_reserve.reserve(ip="127.0.0.1", port=33331 + worker_num * 100)
 
     redis_db = xdist.get_redis_db()
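The port arithmetic in the diff spaces each worker's preferred Relay port 100 apart from a base of 33331. A minimal sketch of that offset plus a per-worker container name; the base port and stride come from the diff, while `relay_port_hint`, `relay_container_name`, and the `relay_test_gwN` naming scheme are hypothetical:

```python
from typing import Optional

RELAY_BASE_PORT = 33331  # base port from the relay.py diff
PORT_STRIDE = 100  # per-worker spacing, from `worker_num * 100`


def relay_port_hint(worker_num: Optional[int]) -> int:
    """Preferred Relay port for a worker; worker 0 and non-xdist share the base."""
    offset = worker_num if worker_num is not None else 0
    return RELAY_BASE_PORT + offset * PORT_STRIDE


def relay_container_name(worker_num: Optional[int], base: str = "relay_test") -> str:
    """Unique container name per worker (hypothetical naming scheme)."""
    if worker_num is None:
        return base
    return f"{base}_gw{worker_num}"
```

In relay.py the computed value is passed as the `port` argument of `ephemeral_port_reserve.reserve`.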


@mchen-sentry mchen-sentry force-pushed the mingchen/di-1712-xdist-infra branch from 91a5699 to e4a26bb March 16, 2026 18:53
Give each xdist worker its own Redis DB, Kafka topic names, Snuba
instance, Relay container, and snowflake ID range. All gated on xdist
env vars — no-ops without them.

- New xdist.py module with get_redis_db(), get_kafka_topic(), get_snuba_url()
- Per-worker Relay container names and port offsets
- Deterministic region name seeding across workers
- Per-worker snowflake IDs to avoid IntegrityError collisions
- Disable pytest-rerunfailures crash recovery (broken under xdist due
  to Sentry's global socket timeout)
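Snowflake IDs collide across processes when every worker writes the same value into the generator's machine field: two workers that hit the same millisecond and sequence number then produce the same ID, and the second insert fails with an IntegrityError. Giving each worker a distinct value in that field keeps IDs unique. A minimal sketch with a hypothetical generic layout (41-bit milliseconds, 10-bit worker, 12-bit sequence); Sentry's actual bit allocation is not shown in this PR:

```python
# Hypothetical layout, not Sentry's real one:
# [timestamp_ms : 41 bits][worker_id : 10 bits][sequence : 12 bits]
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def make_snowflake(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Pack timestamp, worker ID, and sequence into one integer ID."""
    if not 0 <= worker_id <= MAX_WORKER_ID:
        raise ValueError("worker_id out of range")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (WORKER_BITS + SEQUENCE_BITS))
        | (worker_id << SEQUENCE_BITS)
        | sequence
    )
```

With this layout, `make_snowflake(1000, 0, 0)` and `make_snowflake(1000, 1, 0)` differ only in the worker bits.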
The Span NamedTuple dropped end_timestamp in e40ec24 but this
test callsite was missed, causing mypy to fail.
…t workers

Redis defaults to 16 databases (0-15). With a base DB of 9, we can
support at most 7 workers before hitting an out-of-range error. Fail
fast with a clear message instead of a cryptic Redis error.
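The limit follows from simple arithmetic: with DBs 0-15 and a base of 9, workers 0-6 map to DBs 9-15, which gives 16 - 9 = 7 workers. A minimal sketch of the fail-fast check, assuming non-xdist runs use the base DB; `redis_db_for_worker` and the message wording are hypothetical:

```python
from typing import Optional

REDIS_NUM_DATABASES = 16  # Redis default: DBs 0-15
BASE_REDIS_DB = 9  # base DB stated in the commit message


def redis_db_for_worker(worker_num: Optional[int]) -> int:
    """Map an xdist worker to its own Redis DB, failing fast when out of range."""
    if worker_num is None:
        return BASE_REDIS_DB  # assumption: non-xdist runs use the base DB
    db = BASE_REDIS_DB + worker_num
    if db >= REDIS_NUM_DATABASES:
        max_workers = REDIS_NUM_DATABASES - BASE_REDIS_DB
        raise RuntimeError(
            f"xdist worker gw{worker_num} needs Redis DB {db}, but only DBs "
            f"0-{REDIS_NUM_DATABASES - 1} exist; use at most {max_workers} workers"
        )
    return db
```

Raising during setup surfaces the misconfiguration immediately, instead of as a Redis "DB index is out of range" error deep inside an unrelated test.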
@mchen-sentry mchen-sentry merged commit 3cd9512 into master Mar 16, 2026
78 checks passed
@mchen-sentry mchen-sentry deleted the mingchen/di-1712-xdist-infra branch March 16, 2026 20:49

2 participants