perf(redis_admin): lazy clear-by-filter preview (count off the request thread) by thomasrockhu-codecov · Pull Request #907 · codecov/umbrella

thomasrockhu-codecov · 2026-05-01T19:59:53Z

Summary

Makes the /admin/redis_admin/celerybrokerqueue/clear-by-filter/ preview page render instantly even on million-message queues. The previous synchronous GET path called streaming_celery_count (full LLEN(queue) walk) + materialised up to CELERY_BROKER_DISPLAY_LIMIT messages from the queryset before painting anything; on a 1M-deep queue that's 30s+ of blocking work in the request thread.

Four tiers, layered:

Lazy preview load. GET handler now renders only the filter summary + skeleton placeholders in < 200ms. A new JSON endpoint (clear-by-filter/preview/?queue_name=…) runs the count + sample on demand. JS fills them in once they arrive; submit buttons start disabled and unlock on count completion.
Background count job for huge queues. Above REDIS_ADMIN_CLEAR_BY_FILTER_PREVIEW_INLINE_LIMIT (default 200k), the preview endpoint kicks off a dry_run=True chunked-clear-job and returns {job_id, status_url} instead of the count. JS polls the existing clear-job status hash; on status=completed it reads matched as the match count. Reuses feat(redis_admin): chunked celery broker clear with progress bar + wildcard sentinel fix #904 machinery (start_celery_broker_clear_job, clear_by_filter_status_view) — no new audit modes, no new infrastructure.
60s count cache. Synchronous-mode counts are cached in cache Redis under redis_admin:preview_count:<sha256(queue|task|repoid|commitid)> with a REDIS_ADMIN_CLEAR_BY_FILTER_PREVIEW_CACHE_TTL_SECONDS TTL (default 60). Refresh / Back-Forward returns the cached count instantly with a cached_at field. ?bust=1 skips the cache.
Skip-count escape hatch. Muted-italic "Trust the filter, skip the count →" link on the preview, visible from page-load. Routes through a tiny clear-by-filter/skip-count/ view that 302s back to the preview page with count_skipped=1 set; the GET handler then suppresses the count card and unlocks the submit buttons.

The four tiers stack: a deep queue with a warm cache hits Tier 3 first; a deep queue with a cold cache + inline-limit-busting depth hits Tier 2 and polls; a typical queue hits Tier 1 inline; the operator can fall back to Tier 4 at any point if the count surface is broken or just too slow.

Test plan

Sandbox results: 189 passed, 30 deselected, 90 errors in redis_admin/tests/ (errors are all the documented "could not translate host name 'postgres'" failures for @pytest.mark.django_db tests — same shape as on main; the +12 new tests all pass and bump the green count from 177 → 189).

RUN_ENV=TESTING DJANGO_SETTINGS_MODULE=codecov.settings_test uv run pytest \
    -m 'not django_db' redis_admin/tests/test_clear_by_filter_preview.py -v
============================== 12 passed in 1.63s ==============================

Settings introduced

Setting	Default	Purpose
`REDIS_ADMIN_CLEAR_BY_FILTER_PREVIEW_INLINE_LIMIT`	`200_000`	`LLEN(queue)` threshold above which the preview endpoint switches to background-job mode.
`REDIS_ADMIN_CLEAR_BY_FILTER_PREVIEW_CACHE_TTL_SECONDS`	`60`	TTL for the synchronous-mode count cache.

Files

 apps/codecov-api/redis_admin/README.md             |  70 ++-
 apps/codecov-api/redis_admin/admin.py              | 624 +++++++++++++++------
 apps/codecov-api/redis_admin/services.py           | 207 +++++++
 apps/codecov-api/redis_admin/settings.py           |  26 +
 .../css/celery_clear_by_filter_preview.css         | 173 ++++++
 .../js/celery_clear_by_filter_preview.js           | 360 ++++++++++++
 .../celerybrokerqueue/clear_by_filter.html         | 237 +++++---
 .../tests/test_clear_by_filter_preview.py          | 554 ++++++++++++++++++
 8 files changed, 2017 insertions(+), 234 deletions(-)

Notes for review

The new JS / CSS load via {% static %} + defer from clear_by_filter.html, with no inline <script> — same CSP-compatible pattern as celery_clear_progress.{js,css} from feat(redis_admin): chunked celery broker clear with progress bar + wildcard sentinel fix #904.
_parse_clear_by_filter_params is the single chokepoint for filter validation; both the HTML and JSON entrypoints route through it so a future filter rule (e.g. tightening commitid length) lands in one place.
Job-mode response carries the synchronously-fetched sample (LRANGE 0 N is fast even on million-message queues), so the preview table renders immediately even while the count job is still running.
The destructive POST path still computes match_count synchronously for the safety checks (match_count == 0 early-return, clear_keep_one single-match guard) and the dry-run banner. That's intentional: those checks need the live count, and start_celery_broker_clear_job itself re-snapshots LLEN(queue) so the progress page total isn't tied to the preview's count.
No conflict with chore(redis_admin): lower frequency-chart sample limit to 20k #906 (already merged).

Made with Cursor

Note

Medium Risk
Touches superuser-only admin flows that can trigger destructive queue-clears, and adds new JSON/job orchestration + caching that could affect operator UX and Redis load if incorrect.

Overview
Speeds up the celerybrokerqueue/clear-by-filter/ preview by removing synchronous Redis scans from the GET handler and replacing them with a skeleton page populated by new lazy-preview JS.

Adds a superuser-only clear-by-filter/preview/ JSON endpoint that returns either an inline count (with a short-lived Redis cache and ?bust=1) or a dry_run chunked-job id for deep queues (with job-id deduping), plus a clear-by-filter/skip-count/ escape hatch that unlocks destructive actions without waiting for counts. Updates the template/UI accordingly, introduces new settings knobs for thresholds/TTLs, and adds/adjusts tests to pin the new response shapes and failure-mode behavior.

^{Reviewed by Cursor Bugbot for commit 308ea62. Bugbot is set up for automated code reviews on this repo. Configure here.}

…t thread) The clear-by-filter preview page used to call streaming_celery_count + materialise up to CELERY_BROKER_DISPLAY_LIMIT messages inside the GET handler before painting anything; on a 1M-deep queue that's tens of seconds of blocking work in the gunicorn request thread. Four-tier fix, layered: 1. Lazy preview load. GET handler renders the form + skeleton in <200ms, no Redis scans. New JSON endpoint `clear-by-filter/preview/` computes the count + sample on demand; `celery_clear_by_filter_preview.{js,css}` fills them in once they arrive. Submit buttons start disabled and unlock on count completion. 2. Background count job for huge queues. Above REDIS_ADMIN_CLEAR_BY_FILTER_PREVIEW_INLINE_LIMIT (default 200_000) the preview endpoint kicks off a `dry_run=True` chunked-clear job (existing start_celery_broker_clear_job machinery — no new infrastructure) and returns {job_id, status_url}; JS polls the existing status hash and reads `matched` as the count on completion. 3. 60s synchronous-mode count cache in cache Redis under `redis_admin:preview_count:<sha256(queue|task|repoid|commitid)>`. Refresh / Back-Forward / multi-tab return the cached count instantly with a `cached_at` field. `?bust=1` skips the cache. 4. Skip-count escape hatch. Muted-italic link routes to a tiny re-render of the preview page with `count_skipped=1` set; the operator types the queue name and proceeds without waiting for the count. Coverage: 12 new tests in `redis_admin/tests/test_clear_by_filter_preview.py`, all under `-m 'not django_db'`. No `@django_db` tests added — the audit-log integration for the job-mode dry-run is already covered by `test_celery_broker_clear_job.py`'s chunked-mode audit test, which runs in CI.

PR #907 made the clear-by-filter confirmation page render a skeleton on initial GET; the count + sample now load via the JSON preview endpoint that the JS hits on DOMContentLoaded. The four failing tests were still asserting the old eager-render "Matched: N message(s)" string against the GET body. Update them to: * pin the skeleton markers (#celery-clear-by-filter-preview-root, .celery-clear-preview-loader, "Counting matches…") * pin the dry-run flash text ("would clear N of M matching message(s)") which still surfaces synchronously via the Django messages framework on the dry-run response * pin the keep-one button's server-rendered hidden state (.celery-clear-keep-one.celery-clear-preview-hidden) — the JS-driven match_count<=1 invariant moved to the preview JSON endpoint's tests

sentry · 2026-05-02T02:26:47Z

Codecov Report

❌ Patch coverage is 92.53731% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.90%. Comparing base (6fb832f) to head (308ea62).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
apps/codecov-api/redis_admin/admin.py	92.91%	9 Missing ⚠️
apps/codecov-api/redis_admin/services.py	91.54%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #907      +/-   ##
==========================================
+ Coverage   91.89%   91.90%   +0.01%     
==========================================
  Files        1316     1316              
  Lines       50586    50741     +155     
  Branches     1625     1625              
==========================================
+ Hits        46485    46633     +148     
- Misses       3795     3802       +7     
  Partials      306      306

Flag	Coverage Δ
apiunit	`94.90% <92.53%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

codecov-notifications · 2026-05-02T02:26:51Z

Codecov Report

❌ Patch coverage is 92.53731% with 15 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
apps/codecov-api/redis_admin/admin.py	92.91%	9 Missing ⚠️
apps/codecov-api/redis_admin/services.py	91.54%	6 Missing ⚠️

📢 Thoughts on this report? Let us know!

Bugbot caught a HIGH severity bug in the lazy clear-by-filter preview commit (46b6e5f): when `streaming_celery_count` raised on the destructive POST path, the fallback set `match_count = 0`, which then triggered the very next `match_count == 0` safety check and silently 302'd to "No messages match the filter — nothing to clear." A transient Redis blip during the count probe would convert a typed-confirmed `clear_all` / `clear_keep_one` into a misleading no-op redirect, with no signal to the operator that anything had gone wrong. Track count failure with an explicit `count_failed` flag and short-circuit the count-based safety checks (`match_count == 0` early-return; `clear_keep_one` single-match guard) when the count probe raised. The destructive path keeps gating on the typed- confirmation, and `start_celery_broker_clear_job` re-snapshots `LLEN(queue)` itself so the progress page total is still accurate. The dry-run banner gets a count-failed variant that drops the "of {match_count}" denominator and surfaces "(count probe failed; ...)" so the operator knows why the matched-total is missing. Two new TestCase tests pin the behavior: * `test_clear_all_proceeds_when_count_probe_raises` — POST `clear_all` with a typed confirmation must redirect to the background-job progress page even when `streaming_celery_count` raises, NOT to the changelist with a "nothing to clear" flash. * `test_dry_run_surfaces_count_failed_banner_when_count_raises` — dry-run banner drops the "of N" denominator and includes "count probe failed" when the count probe raised.

The new `test_clear_all_proceeds_when_count_probe_raises` test (added in 87af8f5) asserted the destructive POST redirect landed under `/clear-by-filter/progress/<uuid>/`, but the actual URL pattern wired up in `CeleryBrokerQueueAdmin.get_urls` is `/clear-by-filter/job/<uuid>/` (the URL name is `_clear_by_filter_progress` but the path segment is `job/`). CI run 25241620362 surfaced the mismatch — the redirect Location was `/admin/redis_admin/celerybrokerqueue/clear-by-filter/job/ 044680cf-…/`, which proves the bug fix itself works (the destructive action proceeded past the count failure and started the background job) but my assertion substring was wrong. Update the assertion to pin `/clear-by-filter/job/` and add a guardrail that the location is NOT the `?queue_name__exact=…` changelist redirect that the original (buggy) commit produced.

Three Bugbot findings on commit 87af8f5 — addressing all of them in one focused commit since they all touch `clear_by_filter_preview_view`: * **Medium — sample materialise bubbles HTML 500 from a JSON endpoint.** `_materialise_sample_targets` was unwrapped, so a broker outage during the sample LRANGE raised through to Django's HTML 500 page from a JSON endpoint — defeating the carefully-shaped `{"error": ...}` JSON the JS client unmarshals from the count / LLEN paths. Wrap it in try/except mirroring the LLEN fallback pattern so a flaky sample lookup degrades to an empty sample table while the count payload still ships. * **Medium — deep-queue preview spawns duplicate background jobs on refresh.** `start_celery_broker_clear_job` was called unconditionally on every preview request above the inline threshold, so a page refresh / second tab / JS retry kicked off N parallel full-queue scans for the same filter. Add `get_or_start_clear_by_filter_preview_job` in `services.py` — best-effort dedup keyed on a SHA-256 of the filter tuple (mirrors `_clear_by_filter_preview_cache_key` for operator- mental-model consistency), with a 5-minute TTL (new `CLEAR_BY_FILTER_PREVIEW_JOB_LOCK_TTL_SECONDS` setting). Stale-id guard: if the cached `job_id` no longer resolves to a live job hash, fall through to spawn a fresh job rather than hand the JS a dead id. Cache outage falls through to always-spawn so the preview never returns a 500 because of a flaky lock. * **Low — duplicated URL query-building across preview / skip-count.** The `urlencode({k: v for k, v in {…}.items() if v != ""})` comprehension was copy-pasted across three call sites. Extract `CeleryBrokerQueueAdmin._build_clear_by_filter_querystring` with an `extras={...}` slot so `clear_by_filter_skip_count_view` can add `count_skipped=1` without re-spelling the filter dict. Pure refactor — output is byte-identical to the old comprehension. Three new pytest tests pin the behaviour: * `test_preview_endpoint_dedups_job_for_same_filter_within_lock_ttl` — two preview requests for the same filter return the same `job_id` and `start_celery_broker_clear_job` is called exactly once; a third request with a different `repoid` spawns a fresh job (the lock is keyed on the full filter tuple). * `test_preview_endpoint_dedup_falls_through_when_cached_job_is_stale` — pin the stale-id guard: deleting the job hash between two preview calls forces the second call to spawn a fresh job rather than hand the JS a dead `job_id`. * `test_preview_endpoint_returns_empty_sample_when_materialise_raises` — JSON endpoint contract preserved (200 + `application/json`) when `_materialise_sample_targets` raises; sample degrades to `[]` and the count still ships.

Two Bugbot Low-severity findings on commit dd8ad8d: * **Sample-table heading is hardcoded "20".** PR #907 replaced the pre-PR dynamic `<p>The first {{ sample_targets|length }} matching message(s):</p>` with a static `"The first 20"` — misleading when fewer than 20 messages match (the table below it would show only e.g. 3 rows). The pre-PR template also had a `"Showing first X of Y"` note when the sample window was narrower than the match set, lost in the rewrite. Make the heading JS-rewritten on the synchronous-mode preview response. The new `renderSampleHeader(sampleLength, matchCount)` helper produces: - `"Showing first {sample} of {match} matching message(s):"` when the sample window is narrower than the match set, - `"The first {sample} matching message(s):"` when the sample covers the whole match set, - the harmless fallback `"Matching messages:"` when the JS preview never lands (Tier 4 skip-count render path, which intentionally ships an empty sample). * **Duplicated SHA-256 logic between count cache + job lock.** `_clear_by_filter_preview_cache_key` and the new `_clear_by_filter_preview_job_lock_key` had identical bodies — same canonical `"|".join(...)` shape, same `hashlib.sha256(...)` call. Extract `_clear_by_filter_filter_digest` as the single chokepoint so the two key families can never accidentally diverge in their hashing logic. Pure refactor — both keys produce the same bytes as before.

Bugbot flagged a HIGH-severity false positive on commit ea75f1b ("Job-mode match count always reads as zero"), claiming `snapshot.matched` arrives at the JS as a string. It does not — `_job_progress_context._int("matched")` casts it to int before the `JsonResponse`, mirroring `_int("processed")` and `_int("total_estimated")`. The existing `test_clear_job_status_view_returns_json_for_running_job` already pins `processed` / `total_estimated` as JSON ints; extend it to also pin `matched` / `drifted` / `passes_run` so a future regression that drops the `_int(...)` cast lands here loudly instead of silently re-rendering "0 matching messages" on the operator's preview page. (Reply on the inline review comment explains the analysis.)

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit ca453c2. Configure here.}

When the lazy clear-by-filter preview enters Tier 2 deep-queue mode it calls `show()` on `#celery-clear-preview-job-card` to surface the chunked-job progress bar (line ~367 of celery_clear_by_filter_preview.js). On any terminal state the polling loop hands off to either `fillSynchronousResult` (on `completed`) or `showError` (on `cancelled` / `failed`). Neither helper hid the job card, so the operator saw the now-stale progress card stacked alongside the resolved count card or the error card. Hide the job card from both helpers so the terminal-state UI replaces the progress card cleanly, instead of layering on top of it. No-op for the synchronous tier (the job card was never shown there). Bug surfaced by Cursor Bugbot on PR #907 (BUGBOT_BUG_ID 2115197d-d079-4f62-a9ed-7f1be226af88, MEDIUM).

thomasrockhu-codecov · 2026-05-02T11:01:45Z

Folded into #905; keeping this open as the per-feature review record.

cursor Bot reviewed May 1, 2026

View reviewed changes

Comment thread apps/codecov-api/redis_admin/admin.py

thomasrockhu-codecov mentioned this pull request May 1, 2026

perf(redis_admin): drop clear-by-filter preview load; reuse chart numbers #909

Open

4 tasks

cursor Bot reviewed May 2, 2026

View reviewed changes

Comment thread apps/codecov-api/redis_admin/admin.py Outdated

Comment thread apps/codecov-api/redis_admin/admin.py

Comment thread apps/codecov-api/redis_admin/admin.py Outdated

thomasrockhu-codecov added 2 commits May 2, 2026 11:44

cursor Bot reviewed May 2, 2026

View reviewed changes

Comment thread apps/codecov-api/redis_admin/templates/admin/redis_admin/celerybrokerqueue/clear_by_filter.html Outdated

Comment thread apps/codecov-api/redis_admin/services.py

cursor Bot reviewed May 2, 2026

View reviewed changes

Comment thread apps/codecov-api/redis_admin/static/redis_admin/js/celery_clear_by_filter_preview.js

cursor Bot reviewed May 2, 2026

View reviewed changes

Comment thread apps/codecov-api/redis_admin/static/redis_admin/js/celery_clear_by_filter_preview.js

thomasrockhu-codecov mentioned this pull request May 2, 2026

feat(redis_admin): celery broker clear hardening (consolidates #907/#908/#909) #905

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(redis_admin): lazy clear-by-filter preview (count off the request thread)#907

perf(redis_admin): lazy clear-by-filter preview (count off the request thread)#907
thomasrockhu-codecov wants to merge 8 commits intomainfrom
redis-admin/lazy-clear-by-filter-preview

thomasrockhu-codecov commented May 1, 2026 •

edited by cursor Bot

Loading

Uh oh!

Uh oh!

sentry Bot commented May 2, 2026 •

edited

Loading

Uh oh!

codecov-notifications Bot commented May 2, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

thomasrockhu-codecov commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

thomasrockhu-codecov commented May 1, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Settings introduced

Files

Notes for review

Uh oh!

Uh oh!

sentry Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

codecov-notifications Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

thomasrockhu-codecov commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

thomasrockhu-codecov commented May 1, 2026 •

edited by cursor Bot

Loading

sentry Bot commented May 2, 2026 •

edited

Loading

codecov-notifications Bot commented May 2, 2026 •

edited

Loading