feat(account-link): Phase 2 — instance metering + daily usage sync#6839
ConnorYoh wants to merge 22 commits into
…red, no behaviour change)
Phase 2, PR #1 (slice 1/3). Move the pure doc-unit calculation out of the SaaS DefaultDocumentClassifier into a new stirling.software.proprietary.billing package (UnitCalcPolicy value object + DocumentUnitCalculator) so a linked self-hosted instance can cost operations with identical logic. :saas depends on :proprietary so the SaaS classifier delegates; the community core build (which excludes :proprietary) never ships EE billing logic. PricingPolicy (JPA) + the jpdfium/IO inspection stay in :saas. Behaviour-preserving: existing DefaultDocumentClassifier{,More}Test pass.
…e covers API-key calls
Phase 2 (foundation). Add proprietary.billing.BillingCategory + BillingCategoryClassifier (pure AUTOMATION → AI → API → BYPASSED precedence, shared by the instance gate + upcoming meter). BillableOperationClassifier now returns a BillingCategory via categorize(request, apiKey) instead of a coarse boolean, and InstanceEntitlementInterceptor resolves the API-key principal and blocks any non-BYPASSED category, closing a gap where plain API-key tool calls were neither gated nor counted (the feature meters API/Automation/AI). Deliberately did NOT move the SaaS payg.model.BillingCategory enum into :proprietary: that would churn ~19 in-cloud hot-path files for what is JSON-string metadata on the sync wire. The instance reports category names; SaaS maps them. Flag-gated, @Profile("!saas"). +tests.
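The precedence rule in this commit can be illustrated with a minimal sketch. The three boolean signals are hypothetical stand-ins (the commit does not say how the real BillingCategoryClassifier detects automation, AI, or API-key calls); only the AUTOMATION → AI → API → BYPASSED order comes from the commit message.

```java
// Sketch only: the three boolean signals are hypothetical stand-ins for whatever
// the real classifier derives from the request and the API-key principal.
final class CategoryPrecedenceSketch {
    enum BillingCategory { AUTOMATION, AI, API, BYPASSED }

    /** First matching signal wins: AUTOMATION, then AI, then API, otherwise BYPASSED. */
    static BillingCategory categorize(boolean automation, boolean ai, boolean apiKey) {
        if (automation) return BillingCategory.AUTOMATION;
        if (ai) return BillingCategory.AI;
        if (apiKey) return BillingCategory.API;
        return BillingCategory.BYPASSED;
    }

    /** Only BYPASSED requests skip the gate (and, later, the meter). */
    static boolean isGated(BillingCategory category) {
        return category != BillingCategory.BYPASSED;
    }
}
```

With this shape, a plain API-key tool call (`categorize(false, false, true)`) yields `API` and is therefore gated, which is the gap the commit describes closing.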
…ng period
Phase 2, step 1a. The SaaS GET /api/v1/instance/entitlement response now includes the team's UnitCalcPolicy (doc-unit knobs) and period start/end, so a linked instance can compute units locally (via the shared DocumentUnitCalculator) and key its per-period cumulative counters on the [periodStart, periodEnd) boundary. InstanceEntitlement gains the three metering fields plus a 5-arg gate-only constructor, so the revoked sentinel + gate tests are unchanged. AccountLinkClient parses them tolerantly: a missing or invalid policy/period degrades to null rather than failing the whole entitlement parse. Flag-gated both sides. +tests (SaaS emits, asserted).
…store
Phase 2, step 1b. Add the dedicated billing switch `stirling.billing.account-link.metering.enabled` (separate from the link master flag, so linking can be enabled, e.g. to test it, without ever turning on real metering / billing) plus `metering.sync-interval-hours` (24) and `metering.grace-days` (3). Add the per-(period, category) UsageCounter entity (auto-created by Hibernate on self-hosted; inert empty table until metering writes it), its repository with an atomic SQL increment, and UsageMeterService.accrue(): race-safe increment-or-insert, gated on metering.enabled. The cumulative model is idempotent + tamper-evident for the daily sync. +tests. Wiring this into the gate interceptor's success path (compute units, accrue) is the next slice.
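As a rough illustration of the increment-or-insert idea, here is an in-memory analogue. It is not the PR's implementation: the real service uses a JPA entity and an atomic SQL increment, whereas this sketch swaps in `ConcurrentHashMap.merge`, which gives the same "create the row or add to it, atomically" behaviour per key. Class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for the per-(period, category) counter table (assumption:
// the real store is a database table with an atomic SQL increment).
final class UsageAccrualSketch {
    enum Category { AUTOMATION, AI, API }

    private final boolean meteringEnabled;
    private final ConcurrentMap<String, Long> counters = new ConcurrentHashMap<>();

    UsageAccrualSketch(boolean meteringEnabled) {
        this.meteringEnabled = meteringEnabled;
    }

    /** Adds units to the counter, creating it on first use; no-op when metering is off. */
    void accrue(Instant periodStart, Category category, long units) {
        if (!meteringEnabled || units <= 0) return;
        // merge() is atomic per key: insert if absent, otherwise add to the existing total.
        counters.merge(key(periodStart, category), units, Long::sum);
    }

    long cumulative(Instant periodStart, Category category) {
        return counters.getOrDefault(key(periodStart, category), 0L);
    }

    private static String key(Instant periodStart, Category category) {
        return periodStart + "|" + category;
    }
}
```

Because the counter only ever grows within a period, the value it holds is the monotonic cumulative total that the daily sync later reports.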
… counter
Phase 2, step 1c. On afterCompletion (success only), InstanceEntitlementInterceptor computes doc-units for the request via the shared DocumentUnitCalculator + the synced UnitCalcPolicy and accrues them to the per-period UsageCounter for the request's BillingCategory. The category is classified once in preHandle (for the gate) and reused. Metering is gated behind metering.enabled via ObjectProvider<UsageMeterService> (absent ⇒ no accrual, gate still works) and skipped until the entitlement carries a policy + period. KNOWN GAP (pre-flag-on): units are the bytes axis only; PDF page-counting (materialising uploads + jpdfium) is a follow-up, so page-heavy small PDFs currently under-count vs SaaS. +tests.
4921049 to 7eea0b1
…lta via chargeStandalone
The instance reports monotonic cumulative units per BillingCategory each day; SaaS bills only the delta since the last sync. Cumulative+seq make it idempotent (resend → delta 0) and tamper-evident (a backwards total is refused, not credited). Reuses JobChargeService.chargeStandalone for the money path (free-grant split, wallet_ledger DEBIT, Stripe meter, idempotency); there is no separate billing logic.
- V25 payg_instance_usage: last-seen cumulative + sync_seq per (team, period, category)
- PaygInstanceUsage entity + repository
- InstanceUsageIngestService: delta/replay/regression/null-actor handling
- InstanceController POST /sync: resolves actor from linked_instance.created_by_user_id, ingests, returns fresh entitlement (one round-trip reports + refreshes the gate)
- JobSource.LINKED_INSTANCE + ReferenceType.INSTANCE_SYNC
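The delta rule in this commit can be sketched for a single (team, period, category) row as follows. This is an assumption-laden illustration, not InstanceUsageIngestService itself: the outcome names are invented, and whether a refused regression also advances the stored sequence is not stated in the commit (here it does not).

```java
// Sketch of cumulative-delta ingest for one (team, period, category) row.
final class DeltaIngestSketch {
    enum Outcome { CHARGED, NO_CHANGE, REPLAY, REGRESSION_REFUSED }

    private long lastCumulative; // last-seen cumulative total
    private long lastSeq;        // last accepted sync sequence
    private long billedUnits;    // stands in for what the charge path would bill

    Outcome ingest(long syncSeq, long cumulative) {
        if (syncSeq <= lastSeq) return Outcome.REPLAY;              // duplicate delivery
        if (cumulative < lastCumulative) return Outcome.REGRESSION_REFUSED; // never credited
        lastSeq = syncSeq;
        long delta = cumulative - lastCumulative;
        if (delta == 0) return Outcome.NO_CHANGE;                    // same total, nothing to bill
        billedUnits += delta;                                        // bill only the new units
        lastCumulative = cumulative;
        return Outcome.CHARGED;
    }

    long billedUnits() {
        return billedUnits;
    }
}
```

Reporting totals 10, 10, then 15 under increasing sequence numbers bills 10, then 0, then 5, so the billed sum always equals the highest accepted total.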
…o SaaS
Counterpart to the SaaS ingest: a flag-gated @Scheduled sender on the self-hosted side reports each period's cumulative per-category units to POST /api/v1/instance/sync and adopts the refreshed entitlement from the reply.
Resilience by design:
- sync seq reserved + persisted BEFORE the report (strictly monotonic across restarts/failures; SaaS dedups replays on it)
- transport failure leaves lastSyncedUnits markers untouched → usage rolls into the next sync; the burned seq is harmless
- reports ALL periods with unsynced usage, not just the current one, so end-of-period usage isn't stranded on period rollover
- revoked credential aborts reporting; the entitlement cache blocks on its own refresh

- UsageCounter.lastSyncedUnits (+ markSynced / findPeriodsWithUnsyncedUsage)
- AccountLinkSyncState singleton (seq + lastSuccessAt) + repository
- AccountLinkClient.reportUsage (mirrors fetchEntitlement outcomes)
- EntitlementCache.accept (seed cache from the sync reply, no redundant fetch)
- UsageSyncService scheduler (interval from metering.sync-interval-hours)
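A minimal sketch of the "reserve the sequence before reporting" behaviour, for a single counter. The `Transport` interface and all field names are hypothetical; persistence is simulated with plain fields, and skipping the send when nothing is unsynced is an assumption about this sketch rather than a statement about UsageSyncService.

```java
// Sketch of the sender side: burn a seq first, advance the synced marker only on success.
final class UsageSenderSketch {
    /** Hypothetical transport: returns true when SaaS accepted the report. */
    interface Transport {
        boolean report(long syncSeq, long cumulativeUnits);
    }

    private long seq;             // stands in for the persisted sync state
    private long cumulativeUnits; // local monotonic counter
    private long lastSyncedUnits; // marker advanced only after a successful report

    void accrue(long units) {
        cumulativeUnits += units;
    }

    boolean syncOnce(Transport transport) {
        if (cumulativeUnits == lastSyncedUnits) return false; // nothing unsynced
        long reserved = ++seq;           // reserved (and, in reality, persisted) BEFORE sending
        long snapshot = cumulativeUnits; // report a stable total
        boolean accepted;
        try {
            accepted = transport.report(reserved, snapshot);
        } catch (RuntimeException transportFailure) {
            accepted = false;            // marker untouched: usage rolls into the next sync
        }
        if (accepted) lastSyncedUnits = snapshot;
        return accepted;
    }

    long seq() {
        return seq;
    }

    long unsyncedUnits() {
        return cumulativeUnits - lastSyncedUnits;
    }
}
```

A failed attempt leaves `unsyncedUnits()` unchanged but still consumes a sequence number, which matches the commit's note that a burned seq is harmless.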
…synced
Backend: GET /api/v1/account-link/usage (.local, admin-only) exposes this instance's locally-accrued but not-yet-synced usage per category for the current period (cumulative − lastSyncedUnits, floored at 0; scoped to the entitlement's period so prior-period leftovers don't inflate it). LocalUsageService reads the counters; zeros when metering is off or the period is unknown.
FE: the portal fetches it alongside the wallet (best-effort) and folds it into the "PDFs processed this period" card: headline + category split show synced + unsynced as one current-usage figure, with a "+N pending sync" note. Spend/cap cards stay on the Stripe-authoritative figures.
- proprietary: LocalUsageService + GET /usage + UsageCounter 5-arg ctor + tests
- portal: fetchLocalUsage + LocalUsage type + MSW handler/fixture + PdfsProcessedCard combine + story + link.test + en-US pendingSync copy
…ly bound
Page-counting (proprietary): the metering interceptor now counts PDF pages (PDFBox, bounded to 50MB, malformed/encrypted → bytes-only fallback) instead of the bytes axis alone. The instance is authoritative for units (SaaS bills the delta of what we report and never sees the file), so a page-heavy but small PDF was under-billed before. Test builds a real 5-page PDF and asserts 5 units, not 1.
Anomaly bound (SaaS ingest): a per-category, per-sync ceiling (max-units-per-sync, default 100000) refuses an implausible delta (likely a runaway instance bug) rather than silently over-charging; it's logged and not advanced so a corrected resend can reconcile. Complements the existing regression-refusal and monotonic-seq replay guard.
(HMAC payload signing intentionally NOT added; see follow-up note: with the device secret riding in the same request over TLS it adds nothing; real payload integrity needs sign-don't-transmit, e.g. Ed25519, a separate auth-model change.)
Drop the max-units-per-sync anomaly bound (the right limit is the customer's cap, enforced at the gate; an arbitrary per-sync ceiling both stranded the usage permanently and didn't actually stop a catastrophic bill). Removing it fixes the stranding bug the review found: SaaS refused an over-bound delta without advancing, but /sync still returned 200 so the instance marked it synced and never re-reported → usage lost forever.
Review fixes:
- Drop anomaly bound; document that cap enforcement is the gate's job, not the ingest's (mirrors in-cloud EntitlementGuard vs JobChargeService), + intent test
- Pessimistic row lock on the ingest read-modify-write so a duplicate sync delivery can't double-charge the delta (mirrors the free-grant findByIdForUpdate)
- Validate the request's periodStart against the authoritative snapshot period (reject a fabricated value that would reset the dedup partition) + reject test
- Bind the sync interval in code via SchedulingConfigurer instead of a @Scheduled SpEL string (was only evaluated at flags-on boot, zero CI coverage) + interval test
- columnDefinition default 0 on the durable instance counters so ddl-auto ADD COLUMN is safe on a populated external Postgres
- Document minChargeUnits per-sync-delta semantics (vs per-op in-cloud)
- V25: add the "-- Twin of" Supabase header (twin file lands in the SaaS repo)
…bjectMapper body
Addresses three findings from the external PR review (all behind the off-by-default flags):
- F1 (MED): wire the grace window the config always promised but nothing read. InstanceEntitlementGate now blocks (GRACE_EXPIRED) instead of failing open forever when linked + metering on + SaaS unreachable past graceDays. Reference is the persisted last-successful sync (survives restart), falling back to linkedAt for a never-synced instance; metering off or graceDays<=0 disables the backstop (so it stays inert in release where the whole model is off). This makes the "the gate is the backstop" rationale (used to drop the anomaly bound) real.
- F4 (LOW): add a chargeStandalone minChargeUnits>1 regression test pinning the per-sync-delta floor. (The review's "over-charge" framing is inverted: per-delta flooring is always <= per-op, i.e. under-bills or equals, never over-charges.)
- F5d (nit): build the /sync body with ObjectMapper (ObjectNode) not string concat.
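The F1 grace-window rule can be sketched as a pure function. Parameter names are hypothetical, and two details are assumptions of this sketch rather than facts from the commit: a missing reference time is treated as "not expired", and the window is treated as still open at exactly `graceDays` after the reference.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the grace backstop: how long may the instance run while SaaS is unreachable?
final class GraceWindowSketch {
    /**
     * @param lastSuccessAt persisted time of the last successful sync, or null if never synced
     * @param linkedAt      fallback reference for a never-synced instance
     */
    static boolean graceExpired(boolean meteringEnabled, int graceDays,
                                Instant lastSuccessAt, Instant linkedAt, Instant now) {
        if (!meteringEnabled || graceDays <= 0) return false; // backstop disabled
        Instant reference = lastSuccessAt != null ? lastSuccessAt : linkedAt;
        if (reference == null) return false;                  // assumption: nothing to measure from
        return now.isAfter(reference.plus(Duration.ofDays(graceDays)));
    }
}
```

With `graceDays = 3`, an instance linked on 1 January that has never synced would be reported as expired on 5 January, while one that synced on 3 January would not.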
…lineage dedup
F2 (MED): the instance now counts PDF pages with jpdfium (parser-identical to the SaaS classifier) instead of PDFBox, so the page axis can't diverge between instance and cloud on encrypted/malformed PDFs. Inputs are materialised once to a temp file and read for both the page count and the content hash; streaming materialisation also retires the 50MB heap-bound page-count cap and the consumed-getBytes risk.
F3 (MED): lineage dedup now runs on the instance (reusing SaaS's hashing, adapted to the local DB), so a re-submitted identical input set isn't re-charged, matching the in-cloud lineage join so the same op costs the same wherever it runs.
- ContentHasher (shared, :proprietary): the SHA-256 algorithm; SaaS ByteHashSignatureExtractor now delegates to it (single-sourced, no drift).
- MeteredInputSignature + repo: per-(period, input-set signature) local dedup store.
- UsageMeterService.accrue(..., opSignature): claims the signature (insert-as-claim, race-safe) before incrementing; a claimed signature skips re-accrual. Claim-first means the rare failure is a missed accrual (customer-favourable), never a double-charge.
- Interceptor builds the order-independent op signature from sorted per-file hashes; falls back to no-dedup if any input can't be hashed (never a wrong match).
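One way to build an order-independent signature from sorted per-file hashes is sketched below. Only "SHA-256 per file, sorted" comes from the commit; how the sorted hashes are combined (here: joined with newlines and hashed again) is an assumption, and the class is not the PR's ContentHasher.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: the same set of inputs yields the same signature regardless of upload order.
final class OpSignatureSketch {
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
        }
    }

    /** Hash each input, sort the hashes, then hash the sorted list (combining step assumed). */
    static String opSignature(List<byte[]> inputs) {
        List<String> hashes = new ArrayList<>();
        for (byte[] input : inputs) hashes.add(sha256Hex(input));
        Collections.sort(hashes);
        return sha256Hex(String.join("\n", hashes).getBytes(StandardCharsets.UTF_8));
    }
}
```

Sorting before combining is what makes `[a.pdf, b.pdf]` and `[b.pdf, a.pdf]` collide on purpose, while any change to a file's bytes changes the signature.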
… + metering debug logs
- Portal: remove the "+N processed locally, pending sync" note: the customer doesn't need the synced-vs-pending distinction; the card still shows the combined current-usage total. Dropped the now-unused i18n keys.
- New admin endpoint POST /api/v1/account-link/sync-now (.local, ObjectProvider-gated) forces an immediate usage sync, an ops "reconcile now" action + test aid so you don't wait on the scheduler. 204 on run, 409 when metering is off.
- Debug logging on the metering path (interceptor + dedup claim) so a single re-run shows category / multipart / hashed-files / op-signature / dedup hit-or-miss, to diagnose why an op did or didn't dedup.
…check (V26)
The instance daily-sync charge writes a payg_shadow_charge row with job_source=LINKED_INSTANCE via chargeStandalone, but that table's job_source CHECK constraint (added Supabase-side; the main-repo V16 added the column with no check) predates the value → the insert failed the check and 500'd POST /api/v1/instance/sync.
- V26 widens the constraint to include LINKED_INSTANCE (idempotent DROP IF EXISTS + ADD, additive superset of the JobSource enum). Supabase twin owed in the SaaS repo.
- Drop the unused ReferenceType.INSTANCE_SYNC: the ledger writes ReferenceType.JOB, so it was dead and would be the same constraint landmine if ever written.
… gate (account-link Mode A)
Tighten the combined-billing "Mode A" loop so a linked instance reflects billing changes promptly instead of lagging cache TTLs, and enforce the free grant locally in real time.
Instance (proprietary):
- Gate depletes the free grant by locally-accrued unsynced usage, blocking AT the grant in real time instead of overshooting until the next sync charges the backlog (InstanceEntitlementGate + tests).
- Idle manual sync now forces an entitlement refresh so a just-subscribed instance unblocks immediately rather than waiting out the ~5-min cache TTL (UsageSyncService).
- Strip verbose metering debug logs; keep the error-path logs.
SaaS:
- POST /instance/sync and GET /instance/entitlement invalidate the team snapshot so the instance sees fresh subscription/spend at once; new POST /payg/wallet/refresh lets the portal drop its own cache post-checkout (InstanceController, PaygWalletController + tests).
Portal:
- Fold instance-local unsynced usage into the free trial meter (WalletMeter).
- Checkout modal stays open through activation (finalize + poll), matches the SaaS plan modal width so Stripe renders its desktop layout, and requests redirect_on_completion:"never" so onComplete fires in-page (no reload); nudges the local instance to refresh on completion.
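The real-time grant depletion this commit describes for the instance gate amounts to subtracting locally pending usage from the last synced free balance. The sketch below uses hypothetical names and assumes "allowed" means a strictly positive remainder; the actual InstanceEntitlementGate may draw the boundary differently.

```java
// Sketch: gate an unsubscribed team on the free grant minus usage SaaS has not seen yet.
final class GrantGateSketch {
    /** Locally accrued units not yet reported to SaaS, never negative. */
    static long pendingUnsynced(long cumulativeUnits, long lastSyncedUnits) {
        return Math.max(0L, cumulativeUnits - lastSyncedUnits);
    }

    /** Allowed only while the synced free balance, less pending local usage, is positive. */
    static boolean allowedOnFreeGrant(long syncedFreeBalance, long pendingUnsynced) {
        return syncedFreeBalance - pendingUnsynced > 0;
    }
}
```

Without the subtraction, a team with 3 free units left could keep running work until the next sync; with it, the gate closes as soon as 3 local units have accrued.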
…ering
# Conflicts:
# frontend/portal/src/components/billing/StripeCheckoutModal.tsx
# frontend/portal/src/components/billing/WalletMeter.tsx
# frontend/portal/src/views/Usage.tsx
The manual conflict resolution left 3 files (FreePlanView, StripeCheckoutModal, Usage) with formatting Prettier disagreed with, failing frontend-validation's frontend:format:check. Pure formatting; no behaviour change. tsc + eslint clean.
Removing the old "finalizing" banner from Usage.tsx (the checkout modal owns that state now) left usage.finalizing.title/body unused, which fails the unused-translation guard in frontend:test:editor. The used billing.checkout.finalizing keys are untouched.
title = "Activating your Processor plan…"
body = "Your payment went through. We're switching on metered processing across your linked instances — this usually takes a few seconds."
- title = "Activating your Processor plan…"
- body = "Your payment went through. We're switching on metered processing across your linked instances — this usually takes a few seconds."
+ title = "Activating your Processor plan..."
+ body = "Your payment went through. We're switching on metered processing across your linked instances - this usually takes a few seconds."
Would be nice to fix the silly unicode chars in user-facing strings
Good spot — done in f5603fb. Applied the same ASCII fix (… → ..., — → -) to the sibling activationSlow copy and the mirrored fallback strings in StripeCheckoutModal too, so it's consistent. Left the pre-existing displaySub em-dash alone since it's not part of this PR.
Replace the unicode ellipsis (…) and em-dash (—) in the finalizing + activationSlow strings with ... and - , in both the en-US toml and the mirrored fallbacks in StripeCheckoutModal. Keys/behaviour unchanged.
…ering
# Conflicts:
# frontend/editor/src/portal/components/billing/StripeCheckoutModal.tsx
# frontend/editor/src/portal/views/Usage.tsx
# frontend/portal/public/locales/en-US/translation.toml
jbrunton96 left a comment
Looked through it and it all seems good to my eyes. I've tried to update it to fix the conflicts after #6857 went in, hopefully I've done it right.
I've also had Claude have a look at it and it came out with these, which I've no idea if they're correct. It does seem to think it won't double count, but might sometimes under-count:
Findings
Instance content-signature dedup spans the whole billing period, diverging from the cloud's 5-minute lineage window → under-billing (UsageMeterService.accrue / MeteredInputSignature)
The instance dedup key is (period_start, signature) where signature is a content-only, category-agnostic hash of the input bytes, and the unique constraint has no time bound within the period. The cloud lineage join it claims parity with (DefaultHashLineageDetector → findOpenJobForSignatures) only matches open jobs within a 5-minute workflow window, and chargeStandalone closes its job immediately. So two independent operations on identical input bytes an hour (or three weeks) apart in the same month are charged twice in the cloud but once on the instance; even a different-category operation on the same bytes is skipped. This contradicts the PR's stated goal ("the same op costs the same whether it runs on the instance or in the cloud") and under-bills. If period-wide content dedup is intended, the divergence from cloud semantics should at least be documented; if not, the dedup needs a comparable short window.
Subscribed team with a period cap overshoots by up to a full sync interval (InstanceEntitlementGate.evaluate / entitled)
evaluate() computes pendingUnsynced and subtracts it from the free balance only for unsubscribed teams (entitlement.map(e -> !e.subscribed())…, else 0). For a subscribed team with a periodCapUnits, entitled() checks periodSpendUnits() < periodCapUnits() against the last synced spend only. So a capped subscribed team keeps being allowed until the next daily sync reconciles, overshooting the cap by up to ~24h of local billable work. This is the exact real-time-depletion problem the PR solves for the grant, left unsolved for the cap, even though InstanceUsageIngestService explicitly relies on "the instance stops accruing at the cap" and calls the residual only a "~entitlement-cache TTL overshoot" (it's actually up to a sync interval). Consider subtracting pendingUnsynced from the cap check too.
Category↔string mapping triplicated (LocalUsageService.currentPeriodUnsynced, UsageSyncService.syncPeriod, UsageSyncService.recordSuccess)
The "API"/"AI"/"AUTOMATION" mapping appears as a switch in LocalUsageService, another switch in syncPeriod, and three if (x > 0) markSynced(...) calls in recordSuccess. BillingCategory already exists. Adding/renaming a category means editing three hardcoded-string sites that can silently drift (a typo'd string just falls into the ignored default). Iterate over BillingCategory values (excluding BYPASSED) instead.
Hot-path metering reads each uploaded file from disk twice (InstanceEntitlementInterceptor.meterRequest)
Every successful billable multipart request materializes each input to a temp file, then reads it fully twice: once via jpdfium for the page count and once via ContentHasher.sha256 for the signature. The SHA-256 can be computed in the single pass that already streams the upload to the temp file (DigestInputStream/DigestOutputStream around the transferTo), eliminating one full re-read per file on a path that runs on every AI/API/automation request. (Minor adjacent: UsageSyncService calls loadState() twice per period — once in reserveNextSeq, once in recordSuccess — a redundant singleton fetch that could be threaded through.)
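The single-pass idea in the last finding can be sketched with the JDK's `DigestOutputStream`, which updates a `MessageDigest` as bytes are written. The class and method names are hypothetical; only the technique (hash while streaming the upload to its temp file) comes from the finding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: compute the SHA-256 during the copy instead of re-reading the temp file.
final class HashWhileWritingSketch {
    /** Streams the upload to target and returns the SHA-256 of exactly the bytes written. */
    static byte[] copyAndHash(InputStream upload, Path target) throws IOException {
        MessageDigest sha256;
        try {
            sha256 = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
        }
        try (OutputStream file = Files.newOutputStream(target);
             DigestOutputStream hashing = new DigestOutputStream(file, sha256)) {
            upload.transferTo(hashing); // each chunk is written to disk and fed to the digest
        }
        return sha256.digest();
    }
}
```

The temp file is then read once (for the page count) rather than twice, at the cost of holding one extra `MessageDigest` per upload.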
/**
 * Ingests a linked self-hosted instance's daily usage sync (combined-billing "Mode A").
 *
 * <p>The instance reports a <b>monotonic cumulative</b> unit total per {@link BillingCategory} for
 * the current billing period. We bill only the <b>delta</b> since the last sync, which makes the
 * model idempotent (a resend reports the same total → delta 0 → no charge) and tamper-evident (a
 * total that goes backwards is refused, not credited; a monotonic {@code syncSeq} dedups replays).
 * The charge itself <b>reuses {@link JobChargeService#chargeStandalone}</b> — the same free-grant
 * split, {@code wallet_ledger} DEBIT, Stripe meter, and idempotency the in-cloud charge path uses —
 * so there is no separate billing logic for this flow.
 *
 * <p><b>Cap enforcement is the request-time gate's job, not this charge path's</b> — exactly as the
 * in-cloud path enforces the cap at {@code EntitlementGuard}, not in {@code JobChargeService}. The
 * instance's own {@code InstanceEntitlementGate} blocks billable work once the team is over its cap
 * (a $0 cap blocks everything metered), so usage stops accruing at the cap and the reported delta
 * does not run past it. We deliberately do NOT re-check the cap here: a customer is never charged
 * past a limit their gate already enforces, and the only residual is the bounded
 * (~entitlement-cache TTL) overshoot inherent to any eventually-consistent meter. If the instance
 * ever meters past the cap that is an instance bug to fix, not something this aggregate path should
 * silently absorb.
 *
 * <p>{@code minChargeUnits} is applied by {@code chargeStandalone} <b>per sync-delta</b> here,
 * which intentionally differs from the per-operation floor in-cloud: the cumulative-delta model
 * carries no per-op identity, so a daily delta of D bills {@code max(D, minChargeUnits)} once, not
 * per underlying op. With the shipped default ({@code minChargeUnits=1}) this is a no-op (the
 * delta>0 guard already covers the only floored case).
 *
 * <p>Gated behind {@code stirling.billing.account-link.enabled}.
 */
Some of these comments are getting very very long
>
{error}
</Banner>
{phase === "finalizing" && (
Might be nice to split these into their own components?
… fixes
- Trim verbose/AI-style comments across account-link metering files
- Split checkout status views into CheckoutFinalizing/CheckoutActivationSlow
- Dedup: 5-min workflow-window parity with cloud (last_metered_at + window prop)
- Gate: deplete spend cap by local unsynced usage for capped subscriptions too
- DRY per-category mapping via BillingCategory EnumMap
- Meter hot path: hash during temp-file write (DigestOutputStream); load sync state once
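For the per-category mapping item, a sketch of what driving everything from the enum could look like is shown below. The class shape and the `billable()` helper are hypothetical; the point is only that wire names come from `BillingCategory.name()` and the loop covers every non-BYPASSED value, so there are no hardcoded strings to drift.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: one enum-driven mapping instead of several hand-written switches.
final class CategoryMapSketch {
    enum BillingCategory {
        AUTOMATION, AI, API, BYPASSED;

        boolean billable() {
            return this != BYPASSED;
        }
    }

    private final Map<BillingCategory, Long> totals = new EnumMap<>(BillingCategory.class);

    void add(BillingCategory category, long units) {
        if (category.billable()) totals.merge(category, units, Long::sum);
    }

    /** Category name to cumulative units, for every billable category (zero if unused). */
    Map<String, Long> toWire() {
        Map<String, Long> wire = new LinkedHashMap<>();
        for (BillingCategory category : BillingCategory.values()) {
            if (category.billable()) wire.put(category.name(), totals.getOrDefault(category, 0L));
        }
        return wire;
    }
}
```

Adding a new category to the enum would then flow into the payload automatically, instead of requiring an edit at each call site.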
🚀 V2 Auto-Deployment Complete!
Your V2 PR with embedded architecture has been deployed!
🔗 Direct Test URL (non-SSL): http://54.175.155.236:6839
🔐 Secure HTTPS URL: https://6839.ssl.stirlingpdf.cloud
This deployment will be automatically cleaned up when the PR is closed.
🔄 Auto-deployed for approved V2 contributors.
Account-link Phase 2: metering + daily usage sync
Phase 1 (already on main) let a self-hosted instance link a SaaS account and blocked billable work when it was over its limit. It blocked, but it never actually charged anything. This PR adds the metering + billing half.
It's off by default. Everything sits behind `stirling.billing.account-link.metering.enabled`, on top of the existing `stirling.billing.account-link.enabled` master flag. Both have to be on for any of it to run, so it can't touch production. The billing model isn't going live yet; this is a dark merge.
How it works
What's worth a reviewer's eyes
POST /api/v1/instance/sync, migrations V25 (payg_instance_usage) and V26 (allow the LINKED_INSTANCE job source), and a small POST /api/v1/payg/wallet/refresh the portal calls after checkout.
Companion PR
Stirling-PDF-SaaS #314 (on v3): the checkout edge function so the embedded Stripe flow finishes in-page instead of reloading, plus a Deno.serve migration so the edge functions actually deploy.
Testing
Java unit tests (proprietary + saas), portal vitest, and the SaaS edge-function tests all pass. Branch is merged up to date with main.
Not done yet (doesn't block this merge — only matters once both flags are on)