Script to Remediate Over Cap Visits by zandre-eng · Pull Request #1152 · dimagi/commcare-connect

zandre-eng · 2026-04-28T13:45:28Z

Product Description

No user-facing changes. This PR will not be merged — it exists solely to give reviewers a clean diff to comment on for the one-off remediation script that will be executed manually via ./manage.py shell in production. The script targets UserVisit rows that were silently auto-approved past their per-worker cap by the bug fixed in ze/fix-cap-bypass-when-duplicate-flag-off (CI-639), flipping them to over_limit, propagating the status to their CompletedWork, and recomputing payment_accrued for the affected workers.

Technical Summary

Link to ticket here — companion to the code fix on ze/fix-cap-bypass-when-duplicate-flag-off.

This is a release path 1 feature — Improvements to existing features & quick wins

The script reproduces the cap-check the buggy submission path failed to honour: for each OpportunityClaimLimit, it loads active visits (status ∉ {over_limit, trial}) for that (opportunity_access, payment_unit) ordered by date_created, treats the first max_visits as legitimate, and treats the tail as the over-cap set that requires correction. The default scope is a single opportunity (set via OPP_UUID); pass None for a global scan once the targeted run is verified. DRY_RUN = True is the default so a reviewer can confirm the breakdown before any write happens.
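As a rough illustration of the identification rule described above, the sketch below works on plain in-memory records instead of the Django ORM. `Visit`, `find_over_cap_visit_ids`, and the field layout are hypothetical stand-ins chosen for the example, not the script's actual code.

```python
from dataclasses import dataclass
from datetime import datetime

# Statuses that do not count toward the cap (mirrors "status not in {over_limit, trial}").
EXCLUDED_STATUSES = {"over_limit", "trial"}


@dataclass
class Visit:
    id: int
    status: str
    date_created: datetime


def find_over_cap_visit_ids(visits, max_visits):
    """Return ids of active visits beyond the first `max_visits`, oldest first."""
    active = sorted(
        (v for v in visits if v.status not in EXCLUDED_STATUSES),
        key=lambda v: v.date_created,
    )
    # The first `max_visits` active visits are legitimate; the tail is over cap.
    return [v.id for v in active[max_visits:]]


visits = [
    Visit(1, "approved", datetime(2026, 4, 1)),
    Visit(2, "approved", datetime(2026, 4, 2)),
    Visit(3, "trial", datetime(2026, 4, 3)),
    Visit(4, "approved", datetime(2026, 4, 4)),
]
over_cap = find_over_cap_visit_ids(visits, max_visits=2)
```

With a cap of 2, only visit 4 lands in the tail. The trial visit is never counted, and once visit 4 is marked over_limit a second call returns an empty list.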

Identification (_find_over_cap_visit_ids): ordering by date_created (server-received timestamp) ensures that earlier in-cap submissions are kept and only the chronological tail is flipped. Excludes already-over_limit and trial rows so re-runs are idempotent.
CompletedWork safety (_completed_works_safe_to_flip): a CompletedWork is only flipped to over_limit if every linked UserVisit is in the over-cap set. Mixed completed-works (linked to both in-cap and over-cap visits) are not touched and are reported as a warning for manual review — this matches the in-flight behaviour at processor.py:427-429, which also gates the completed-work flip per visit.
Status mutation: visits are flipped via Python attribute assignment (v.status = over_limit) so the UserVisit.__setattr__ override updates status_modified_date automatically. bulk_update then writes both fields in batches of BATCH_SIZE = 500 inside a single transaction.atomic() block.
Locking: select_for_update() per batch protects against concurrent submissions touching the same rows mid-script.
Payment recompute: after the status writes commit, update_payment_accrued_for_user(access, incremental=False) is called for each affected access. The incremental=False recompute is required because flipping a visit down from approved to over_limit is the opposite of what the incremental path is designed for (it skips already-approved completed-works).
Output: dry-run prints a per-access count of visits that would flip; non-dry-run prints final counts of visits/completed-works/accesses touched.
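The CompletedWork safety rule in the list above can be sketched with plain sets. This is a minimal illustration under assumed inputs: `completed_works_safe_to_flip` here is a hypothetical pure-Python function taking a visit-to-completed-work mapping, not the script's ORM-based helper.

```python
from collections import defaultdict


def completed_works_safe_to_flip(visit_to_cw, over_cap_visit_ids):
    """Split completed-work ids into (safe, mixed).

    `visit_to_cw` maps every visit id to its completed-work id (or None).
    """
    over_cap = set(over_cap_visit_ids)
    visits_by_cw = defaultdict(set)
    for visit_id, cw_id in visit_to_cw.items():
        if cw_id is not None:
            visits_by_cw[cw_id].add(visit_id)

    safe, mixed = set(), set()
    for cw_id, visit_ids in visits_by_cw.items():
        if visit_ids <= over_cap:
            safe.add(cw_id)   # every linked visit is over cap
        elif visit_ids & over_cap:
            mixed.add(cw_id)  # in-cap and over-cap visits: leave for manual review
    return safe, mixed


safe, mixed = completed_works_safe_to_flip(
    {1: 10, 2: 10, 3: 11, 4: 12}, over_cap_visit_ids=[2, 3]
)
```

Here completed work 11 is safe to flip, completed work 10 is mixed and only reported, and completed work 12 is untouched because none of its visits are over cap.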

Safety Assurance

Safety story

Will not be merged. This PR is a review surface only; the script is intended to be invoked manually (./manage.py shell < scripts/remediate_over_cap_visits.py) during a pre-arranged production window. No CI deploy, no scheduled job.
Default is dry-run. DRY_RUN = True at the top means the first invocation prints the plan without writing. A reviewer should require the actual production run to be witnessed (or executed) by a second operator after the dry-run output has been sanity-checked.
Atomic. All status writes happen inside a single transaction.atomic() block; if the run is killed mid-flight or any constraint is violated, nothing commits.
Reversible at the row level. A UserVisit flipped from approved to over_limit can be flipped back via the same import/admin paths that already exist (bulk_update_visit_status in visit_import.py). The status_modified_date is updated, so the audit trail records when the remediation ran.
Idempotent. Re-running excludes rows already at over_limit/trial from the "active" count, so a second run finds nothing to do.
Conservative on CompletedWork. Mixed completed-works are never touched automatically; the warning lists them by id for human review. This avoids the failure mode where a completed-work shared between an in-cap and out-of-cap visit would have its status incorrectly flipped.
Money already paid out is not reversed. The payment_accrued recompute reduces the future accrual but does not undo prior payouts. Operator must confirm the agreed financial treatment with finance (absorb / deduct / adjust) before the non-dry-run execution.
Suspended workers are not skipped. The per-access call to update_payment_accrued_for_user does not gate on access.suspended. If the production data has any suspended-but-affected workers, decide before running whether to skip them or include them.

Automated test coverage

No automated tests are added in this PR. The script is a one-shot remediation — its correctness is validated by:

The companion code fix's regression test test_over_limit_status_preserved_when_duplicate_flag_disabled (in ze/fix-cap-bypass-when-duplicate-flag-off), which guarantees the bug cannot reproduce after deploy.
The dry-run output, reviewed before each non-dry-run execution.

If reviewers want it as a defensible test, the suggested approach would be: build a fixture with one access at the cap and one access two visits over, run the script in-process with DRY_RUN = False, and assert (a) the in-cap access is untouched, (b) the over-cap access has exactly two visits flipped to over_limit, (c) the completed-works are flipped where every linked visit is over-cap, (d) payment_accrued is recomputed downward. Happy to add this if desired.
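A possible shape for that test, sketched against an in-memory model so it runs without the project's fixtures. Everything here is an assumption for illustration: `Visit`, `remediate`, and `payment_accrued` are hypothetical, visit id order stands in for date_created, and assertion (c) on completed works is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    id: int
    access_id: int
    status: str = "approved"
    amount: int = 10  # hypothetical per-visit payment


def remediate(visits, max_visits, dry_run=True):
    """Flip each access's visits beyond `max_visits` (in id order) to over_limit."""
    seen, flipped = {}, []
    for v in sorted(visits, key=lambda v: v.id):
        if v.status in {"over_limit", "trial"}:
            continue
        seen[v.access_id] = seen.get(v.access_id, 0) + 1
        if seen[v.access_id] > max_visits:
            flipped.append(v.id)
            if not dry_run:
                v.status = "over_limit"
    return flipped


def payment_accrued(visits, access_id):
    """Sum the amounts of approved visits for one access."""
    return sum(
        v.amount for v in visits if v.access_id == access_id and v.status == "approved"
    )


def test_remediation():
    at_cap = [Visit(i, access_id=1) for i in range(1, 4)]  # 3 visits, cap 3
    over = [Visit(i, access_id=2) for i in range(4, 9)]    # 5 visits, cap 3
    visits = at_cap + over
    before = payment_accrued(visits, 2)

    assert remediate(visits, max_visits=3, dry_run=True) == [7, 8]
    assert all(v.status == "approved" for v in visits)      # dry run writes nothing

    assert remediate(visits, max_visits=3, dry_run=False) == [7, 8]
    assert all(v.status == "approved" for v in at_cap)      # (a) in-cap access untouched
    assert [v.id for v in over if v.status == "over_limit"] == [7, 8]  # (b)
    assert payment_accrued(visits, 2) < before              # (d) accrual drops
    assert remediate(visits, max_visits=3, dry_run=True) == []  # second run is a no-op


test_remediation()
```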

QA Plan

QA will not be performed for this change. Below is the testing plan for reference:

Reviewer reads the script end-to-end, confirms the identification logic in _find_over_cap_visit_ids matches the intent (chronological tail beyond max_visits).
Reviewer confirms _completed_works_safe_to_flip correctly leaves mixed completed-works alone.
On staging (or a prod replica), run with OPP_UUID = "21bca9f7-00f9-4804-a7c0-e77c6139e579" and DRY_RUN = True. Confirm the output reports exactly two over-cap visits across two distinct accesses, matching the prod-data we already audited.
On staging, set DRY_RUN = False and re-run. Verify the two UserVisit rows are now over_limit, the two CompletedWork rows are over_limit, and the two affected workers' payment_accrued reflects the recompute.
Re-run with DRY_RUN = True again — should report "Nothing to remediate" (idempotency check).
Production execution gate: confirm with finance how prior payouts to the over-cap workers will be reconciled before running the non-dry-run script in prod.
Production execution: run with OPP_UUID set to the affected opportunity, DRY_RUN = True first, then DRY_RUN = False only after the dry-run is reviewed. Capture the output.
Optional follow-up: set OPP_UUID = None and DRY_RUN = True to scan for any other opportunities silently affected by the same bug. Decide per-opportunity remediation with finance.

Labels & Review

The set of people pinged as reviewers is appropriate for the level of risk of the change

mkangia · 2026-04-28T15:53:08Z

@calellowitz @sravfeyn

would appreciate your review on this one please.

calellowitz

I think the logic is not quite right here, but I have some higher level concerns that I will also leave in the ticket. I suspect what is happening here is users overriding the overlimit status of visits that had been correctly flagged, rather than the system missing them. That has happened both intentionally and unintentionally in the past, and I would be concerned about overwriting those in an automated way, without talking to the individual PMs for each affected intervention, especially if some have already been paid.

calellowitz · 2026-04-28T21:23:08Z

+            .order_by("date_created")
+            .values_list("id", flat=True)
+        )
+        if len(active_visit_ids) > cl.max_visits:


I don't think this is a correct comparison. ClaimLimits are per payment unit, and some payment units contain multiple deliver units, but this is counting UserVisits/deliver units. So if a payment unit requires a registration and a service delivery form, which I think is a common setup even for single-visit interventions, this will start to exclude visits only halfway to the limit. There will be 100 UserVisits when only 50 payments have been earned.
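The arithmetic behind this concern, as a small sketch. The cap of 100 is a hypothetical value, and `earned_payment_units` assumes each payment unit needs exactly one form per deliver unit.

```python
def earned_payment_units(user_visit_count, deliver_units_per_payment_unit):
    """Payment units earned, assuming every payment unit needs all its deliver units."""
    return user_visit_count // deliver_units_per_payment_unit


max_visits = 100   # cap on payment units (hypothetical value)
user_visits = 100  # raw UserVisit rows submitted so far
deliver_units = 2  # e.g. registration + service delivery

earned = earned_payment_units(user_visits, deliver_units)
visit_count_says_at_cap = user_visits >= max_visits  # what the raw visit count suggests
payment_units_say_at_cap = earned >= max_visits      # what the cap actually measures
```

Counting raw visits reports the worker as already at the cap, while only 50 of the 100 allowed payment units have been earned.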

Good catch, I've switched to counting CompletedWork rows instead (one per payment unit), and then flip each over-cap CW + all its visits as a unit.

Addressed in 4b5bbc3.

sravfeyn · 2026-04-29T07:07:34Z

I have the same feeling as Cal on this: it's probably best to do this only on opportunities where it is explicitly asked for, rather than on all opportunities, for the same reasons he highlights.

The cap (claim_limit.max_visits) is on earned payment units, not raw UserVisits. A payment unit can have multiple deliver units (e.g. a registration + a service-delivery form), so counting visits across deliver units double-counts: 100 UserVisits = 50 earned payment units when there are 2 deliver units per payment unit, but the previous identification flagged that as already at the cap. Switch to counting CompletedWork rows: each CW is unique per (access, entity_id, payment_unit) and represents one earned payment unit regardless of how many deliver-unit forms it took to satisfy. For each over-cap CW we now flip the CW *and all its UserVisits* to over_limit as a unit, removing the previous "mixed completed-work" warning case (no longer possible by construction). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
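A minimal in-memory sketch of the CompletedWork-based approach this commit message describes. The dataclass, the integer `date_created` ordering key, and both function names are assumptions made for the example; the commit message does not state which field orders the completed works.

```python
from dataclasses import dataclass, field


@dataclass
class CompletedWork:
    id: int
    date_created: int  # stand-in for a timestamp (assumed ordering key)
    status: str = "approved"
    visit_ids: list = field(default_factory=list)


def over_cap_completed_works(completed_works, max_visits):
    """Completed works beyond the first `max_visits`, oldest first."""
    active = sorted(
        (cw for cw in completed_works if cw.status != "over_limit"),
        key=lambda cw: cw.date_created,
    )
    return active[max_visits:]


def flip_as_unit(over_cap, visit_status):
    """Mark each over-cap completed work and all of its visits over_limit."""
    for cw in over_cap:
        cw.status = "over_limit"
        for visit_id in cw.visit_ids:
            visit_status[visit_id] = "over_limit"


cws = [
    CompletedWork(1, date_created=1, visit_ids=[101, 102]),
    CompletedWork(2, date_created=2, visit_ids=[103, 104]),
    CompletedWork(3, date_created=3, visit_ids=[105, 106]),
]
visit_status = {vid: "approved" for cw in cws for vid in cw.visit_ids}
flip_as_unit(over_cap_completed_works(cws, max_visits=2), visit_status)
```

Six visits across three completed works with a cap of 2 leaves the first two completed works alone and flips the third together with both of its visits, so no completed work ends up partially flipped.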

Before applying the fix in prod the operator wants to know how much worker and org accrual will drop. Aggregate saved_payment_accrued and saved_org_payment_accrued across the over-cap CompletedWork rows, then print per-access and total deltas in the DRY_RUN branch. Also include the projected reduction in the final post-apply summary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
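The aggregation described here might look roughly like the following. The row tuples and amounts are made up for illustration, and `projected_reduction` is a hypothetical helper; only the two field names come from the commit message.

```python
from collections import defaultdict

# Hypothetical rows: (access_id, saved_payment_accrued, saved_org_payment_accrued)
over_cap_rows = [
    (7, 50, 5),
    (7, 50, 5),
    (9, 30, 3),
]


def projected_reduction(rows):
    """Sum worker and org accrual per access, plus a grand total."""
    per_access = defaultdict(lambda: {"worker": 0, "org": 0})
    for access_id, worker_amount, org_amount in rows:
        per_access[access_id]["worker"] += worker_amount
        per_access[access_id]["org"] += org_amount
    total = {
        "worker": sum(d["worker"] for d in per_access.values()),
        "org": sum(d["org"] for d in per_access.values()),
    }
    return dict(per_access), total


per_access, total = projected_reduction(over_cap_rows)
for access_id, delta in sorted(per_access.items()):
    print(f"access {access_id}: worker -{delta['worker']}, org -{delta['org']}")
print(f"total: worker -{total['worker']}, org -{total['org']}")
```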

The recompute loop runs outside the atomic block (update_payment_accrued_for_user takes its own Redis lock per access), so a Ctrl-C or transient failure mid-loop leaves the status flips committed but payment_accrued stale for the unfinished accesses. Wrap the loop in try/finally and print a WARNING listing the access ids that still need a manual recompute. Also pre-fetch accesses with in_bulk instead of N .get() calls and derive affected_access_ids from the plan upfront. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
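The try/finally pattern in this commit message can be sketched as below. `recompute_all` and `flaky_recompute` are hypothetical; the latter stands in for update_payment_accrued_for_user and fails on one access to show what the warning reports.

```python
def recompute_all(access_ids, recompute, warn=print):
    """Call `recompute` per access; report whatever was left undone on exit."""
    pending = list(access_ids)
    try:
        for access_id in list(pending):
            recompute(access_id)
            pending.remove(access_id)
    finally:
        # Runs on normal exit, on exceptions, and on Ctrl-C (KeyboardInterrupt).
        if pending:
            warn(f"WARNING: manual recompute still needed for accesses: {pending}")
    return pending


def flaky_recompute(access_id):
    # Hypothetical stand-in for the real recompute; fails on access 3.
    if access_id == 3:
        raise RuntimeError("transient failure")


try:
    recompute_all([1, 2, 3, 4], flaky_recompute)
except RuntimeError:
    pass
```

Because the status flips are already committed at this point, the warning is the operator's only record of which accesses still carry a stale payment_accrued.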

zandre-eng · 2026-04-29T10:18:09Z

@sravfeyn @calellowitz Thanks for the feedback on this. I agree with the points raised and also think it would be risky to apply this generally to all opps in prod. My plan is to just run this script for the affected opp in the linked ticket as they've explicitly requested it. Let me know if there are any concerns with this approach.

I've made a few refactors to the code, please could I get another review pass?

sravfeyn · 2026-04-29T10:56:22Z

Since we know exactly which 4 visits should have been marked as overlimit, why not just do a simple update like below?

(Not exactly, but along the lines of)

  XFORM_IDS = ["<id1>", "<id2>"]

  visits = UserVisit.objects.filter(xform_id__in=XFORM_IDS).select_related("completed_work")
  user_ids = list(visits.values_list("user_id", flat=True))
  cw_ids = [v.completed_work_id for v in visits if v.completed_work_id]

  UserVisit.objects.filter(xform_id__in=XFORM_IDS).update(status=VisitValidationStatus.over_limit, review_status="")
  CompletedWork.objects.filter(id__in=cw_ids).update(status=CompletedWorkStatus.over_limit)

  bulk_update_payment_accrued.delay(visits.first().opportunity_id, user_ids)

sravfeyn

Thanks. LGTM.

calellowitz

thanks, agree seems fine if only for the affected xforms/opps

implement script to remediate over cap visits

1a778a4

zandre-eng requested review from ajeety4, calellowitz, hemant10yadav, mkangia, pxwxnvermx and sravfeyn April 28, 2026 13:45

zandre-eng mentioned this pull request Apr 28, 2026

Preserve Over Limit Status #1151

Merged

8 tasks

filter out inactive and test opps

46721f8

calellowitz requested changes Apr 28, 2026

zandre-eng and others added 3 commits April 29, 2026 11:56

simplify script and add unit tests for verification

62bfb3d

sravfeyn reviewed Apr 29, 2026

Comment thread docs/plans/script.py Outdated

sravfeyn approved these changes Apr 29, 2026

call update sync

c36b843

calellowitz approved these changes Apr 30, 2026

sravfeyn approved these changes Apr 30, 2026

zandre-eng wants to merge 7 commits into main from ze/remediate-over-cap-visits


4 participants
