fix(lw-deletions): Query system.parts on storage node instead of query node #7772
Conversation
fix(lw-deletions): Query system.parts on storage node instead of query node

Query nodes only have distributed tables (_dist), so system.parts returns no rows for _local tables, causing partition-split deletes to always fall back to un-split. Connect to a storage node via get_local_nodes() instead.

Also addresses PR #7766 review feedback:
- Move Redis client initialization to __init__
- Use relative week offset for partition metrics to reduce tag cardinality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In the distributed test suite, get_local_nodes() queries system.clusters, which isn't available. Mock it to return a dummy node alongside the existing get_node_connection mock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
assert isinstance(schema, TableSchema)
partition_format = schema.get_partition_format()
assert partition_format is not None
parts = [decode_part_str(part, partition_format) for (part,) in response.results]
Duplicated get_active_partitions logic instead of reusing existing function
Medium Severity
The new inline code in _get_partition_dates (the query, schema lookup, decode_part_str call) is an exact copy of get_active_partitions from snuba/cleanup.py. Since get_active_partitions accepts a ClickhousePool as its first parameter and get_node_connection returns a ClickhousePool, the fix only needed to change the connection passed to the existing function — not duplicate all 12+ lines. Maintaining two identical copies risks divergent bug fixes.
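The shape of the suggested reuse can be sketched as follows. FakePool and the stripped-down get_active_partitions below are hypothetical stand-ins for snuba's ClickhousePool and snuba.cleanup.get_active_partitions (whose real signature takes more arguments); the point is only that, because the connection is a parameter, the fix reduces to passing a different pool rather than copying the query logic:

```python
from typing import List, Tuple


class FakePool:
    """Hypothetical stand-in for ClickhousePool: returns canned query results."""

    def __init__(self, rows: List[Tuple[str]]) -> None:
        self.rows = rows

    def execute(self, query: str) -> List[Tuple[str]]:
        return self.rows


def get_active_partitions(pool: FakePool, table: str) -> List[str]:
    # Stand-in for snuba.cleanup.get_active_partitions: one system.parts
    # query plus decoding. The same function works against either node type
    # because the pool is injected.
    query = f"SELECT DISTINCT partition FROM system.parts WHERE table = '{table}' AND active"
    return [part for (part,) in pool.execute(query)]


# Query node: *_local tables don't exist there, so system.parts has no rows.
query_pool = FakePool([])
# Storage node: the local table's parts are visible.
storage_pool = FakePool([("('2024-06-01')",)])

print(get_active_partitions(query_pool, "errors_local"))    # []
print(get_active_partitions(storage_pool, "errors_local"))  # ["('2024-06-01')"]
```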
member = f"{table}:{partition_date}"
if redis_client.sismember(tracking_key, member):
    days_delta = (datetime.strptime(partition_date, "%Y-%m-%d") - datetime.now()).days
    partition_week = str(days_delta // 7)
Week offset miscalculated due to time-of-day component
Low Severity
datetime.strptime(partition_date, "%Y-%m-%d") produces midnight, while datetime.now() includes the current time. The .days attribute of the resulting timedelta is systematically one less than the actual calendar-day difference (e.g., today's partition yields .days = -1 instead of 0). This shifts all partition_week tags by roughly one day, so a partition from today gets week "-1" rather than the expected "0".
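The off-by-one is reproducible with nothing but the stdlib. The date-only comparison below is one possible fix, not necessarily the one this PR adopts:

```python
from datetime import date, datetime

today = date.today()
partition_date = today.isoformat()  # today's partition, e.g. "2024-06-15"

# As written in the PR: midnight minus a mid-day "now" is a negative
# timedelta, and .days floors toward negative infinity, so today's
# partition yields -1 rather than 0 (except exactly at midnight).
buggy = (datetime.strptime(partition_date, "%Y-%m-%d") - datetime.now()).days

# Comparing calendar dates only removes the time-of-day component.
fixed = (datetime.strptime(partition_date, "%Y-%m-%d").date() - today).days

print(buggy, fixed)  # typically: -1 0
```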
member = f"{table}:{partition_date}"
if redis_client.sismember(tracking_key, member):
    days_delta = (datetime.strptime(partition_date, "%Y-%m-%d") - datetime.now()).days
    partition_week = str(days_delta // 7)
I was thinking this would be the week number in the year, like datetime.now().isocalendar().week; not sure if that makes more or less sense than what you have here
that's actually what the AI did first, but I think it makes more sense to normalize it the way we do weeks_ago in some existing metrics
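For illustration, a relative offset normalized with floor division behaves like this. This is a sketch of the weeks_ago-style normalization described in the comment above, not the PR's exact code:

```python
from datetime import date, timedelta


def partition_week(partition: date, today: date) -> str:
    # Floor division buckets calendar-day deltas into 7-day windows around
    # "today", keeping the tag set small and stable over time. An isocalendar
    # week number would instead cycle through 1-53 and reset each year.
    return str((partition - today).days // 7)


today = date(2024, 6, 15)
print(partition_week(today, today))                      # "0"
print(partition_week(today - timedelta(days=3), today))  # "-1" (floor rounds toward -inf)
print(partition_week(today + timedelta(days=14), today)) # "2"
```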


_get_partition_dates was querying system.parts via cluster.get_query_connection(), which connects to a query node. In our cluster topology, query nodes only have distributed tables (_dist), not the local tables (_local) listed in DeletionSettings.tables. This meant system.parts always returned zero rows, causing every partition-split delete to fall back to un-split.

The fix connects to a storage node via cluster.get_local_nodes()[0] instead, where the local tables and their system.parts metadata actually live. This mirrors how the optimize CLI handles the same problem (it requires an explicit --clickhouse-host pointing at a storage node).

Also incorporates review feedback from #7766:
- Redis client is initialized in __init__ instead of on every _execute_delete_by_partition call
- Partition metric tags use relative week offsets (-4, 0, +2) instead of full dates to reduce cardinality
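The connection change itself can be sketched in miniature. FakeCluster and FakeNode below are hypothetical stand-ins, not snuba's real cluster API; they only show the query-node-to-storage-node swap:

```python
class FakeNode:
    """Hypothetical stand-in for a ClickHouse node handle."""

    def __init__(self, host: str) -> None:
        self.host = host


class FakeCluster:
    """Hypothetical stand-in for snuba's cluster object."""

    def get_query_connection(self) -> FakeNode:
        # Query node: only *_dist tables exist here, so system.parts
        # has no rows for the *_local tables being deleted from.
        return FakeNode("query-1")

    def get_local_nodes(self) -> list:
        # Storage nodes: the *_local tables and their parts live here.
        return [FakeNode("storage-1"), FakeNode("storage-2")]


cluster = FakeCluster()
# Before: cluster.get_query_connection() -> "query-1" (zero system.parts rows)
# After: the first storage node, mirroring cluster.get_local_nodes()[0]
node = cluster.get_local_nodes()[0]
print(node.host)  # storage-1
```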
Fixes the partition-split fallback observed in production.