fix(node): bound list_ref_certificates with LIMIT and add upsert to prevent unbounded growth (#147)#149
Conversation
Gitlawb#126) The IPFS visibility gate used withheld_blob_oids (a deny-set enumerating only reachable blobs), so a dangling/unreachable blob was absent from the set and served in cleartext to anonymous callers. Flip to an allowed-set (allowed_blob_set_for_caller) that enumerates reachable blobs the caller may read: a dangling blob has no path, is never in the set, and 404s.
Move store::read_object before the allowed_blob_set_for_caller spawn_blocking call so random-CID spray against repos with path-scoped rules cannot trigger full-history git walks on repos that don't carry the object.
…tion ✓ P3 blocker fixed: cargo fmt applied — the format gate will pass. ✓ P3 cleanup resolved: withheld_blob_oids is still used by replication code in repos.rs, so it stays. • P2 follow-up: Tree/commit disclosure tracked in Gitlawb#135 — out of scope here.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds bounded ref-certificate reads in storage and API handlers, enforces per-ref uniqueness in the database, returns a count from ChangesBounded ref certificate reads
Estimated code review effort: 4 (Complex) | ~45 minutes Possibly related PRs
Suggested labels: Suggested reviewers: 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Thanks for the contribution. A couple of things will help us review this faster:
See CONTRIBUTING.md. Update the PR and these notes will clear automatically. |
1f25cb8 to
579101e
Compare
579101e to
15fd462
Compare
There was a problem hiding this comment.
Pull request overview
This PR addresses an availability/security issue where anonymous reads could trigger unbounded ref_certificates DB fetches and large response bodies by adding DB-level limiting and constraining per-ref certificate growth.
Changes:
- Add
LIMITsupport tolist_ref_certificatesand thread limits through/eventsand/certs. - Add a
(repo_id, ref_name)uniqueness constraint and upsert behavior to prevent unbounded per-ref row growth. - Add tests covering the new
?limitbehavior and responsecountfield.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crates/gitlawb-node/src/api/certs.rs | Adds ?limit handling and includes count in the /certs response. |
| crates/gitlawb-node/src/api/events.rs | Passes the request limit into the ref-certificate query to bound reads earlier. |
| crates/gitlawb-node/src/db/mod.rs | Adds unique index migration, upsert on (repo_id, ref_name), and LIMIT to the cert listing query. |
| crates/gitlawb-node/src/test_support.rs | Adds API-level tests verifying /certs limit behavior and count field. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let limit = params | ||
| .get("limit") | ||
| .and_then(|v| v.parse::<i64>().ok()) | ||
| .unwrap_or(50) | ||
| .min(200); |
| // Bound to the same limit as the overall response to avoid unbounded reads. | ||
| let cert_events: Vec<serde_json::Value> = if let Some(ref record) = repo_record { | ||
| state | ||
| .db | ||
| .list_ref_certificates(&record.id) | ||
| .list_ref_certificates(&record.id, limit) |
| stmts: &[ | ||
| "CREATE UNIQUE INDEX IF NOT EXISTS idx_ref_certs_repo_ref ON ref_certificates(repo_id, ref_name)", | ||
| ], |
| "SELECT id, repo_id, ref_name, old_sha, new_sha, pusher_did, node_did, signature, issued_at | ||
| FROM ref_certificates WHERE repo_id = $1 ORDER BY issued_at DESC", | ||
| FROM ref_certificates WHERE repo_id = $1 ORDER BY issued_at DESC LIMIT $2", | ||
| ) | ||
| .bind(repo_id) | ||
| .bind(limit) |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/certs.rs`:
- Around line 17-21: The `list_ref_certificates` limit handling in `certs.rs`
only caps the upper bound, so negative `limit` values can still reach the DB and
fail. Update the parsing in this handler to clamp `params.get("limit")` into a
valid range before calling `list_ref_certificates`, keeping the default at 50,
preserving the 200 maximum, and ensuring any negative or malformed input falls
back to a safe non-negative value.
In `@crates/gitlawb-node/src/db/mod.rs`:
- Around line 826-832: The ref_cert_unique_per_ref migration adds a unique index
on ref_certificates(repo_id, ref_name) but can fail if duplicate rows already
exist, so add a cleanup/backfill step in this Migration before the CREATE UNIQUE
INDEX statement. Use the migration’s existing stmts array to first deduplicate
or otherwise reconcile duplicate ref_certificates entries, then create the
unique index so the migration can succeed on existing databases.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e7053644-7fba-4793-aac2-aa3333db3fbd
📒 Files selected for processing (4)
crates/gitlawb-node/src/api/certs.rscrates/gitlawb-node/src/api/events.rscrates/gitlawb-node/src/db/mod.rscrates/gitlawb-node/src/test_support.rs
15fd462 to
5924b4b
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/test_support.rs`:
- Around line 1958-2121: The `/certs` endpoints are using `get_repo(...)`
directly, so `list_certs` and `get_cert` bypass the shared repo-read
authorization path. Update the handlers in `crate::api::certs` to call
`authorize_repo_read(...)` before fetching certificate data, using the same repo
lookup flow as other read endpoints. Add a test in `test_support.rs` that
creates a private repo and verifies anonymous access to `list_certs` or
`get_cert` is denied.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a76c6f30-7719-484a-933f-c7feb123d8ed
📒 Files selected for processing (4)
crates/gitlawb-node/src/api/certs.rscrates/gitlawb-node/src/api/events.rscrates/gitlawb-node/src/db/mod.rscrates/gitlawb-node/src/test_support.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- crates/gitlawb-node/src/api/events.rs
- crates/gitlawb-node/src/api/certs.rs
- crates/gitlawb-node/src/db/mod.rs
jatmn
left a comment
There was a problem hiding this comment.
I found a couple of issues that need to be addressed before this is ready.
Findings
-
[P2] Complete CodeRabbit's request to gate the cert endpoints
crates/gitlawb-node/src/api/certs.rs:24
CodeRabbit's current-head review item is still valid: bothlist_certsandget_certresolve the repo withget_repo(...)and never call the sharedauthorize_repo_read(...)path, while the routes sit on the anonymous read router. This means an unauthenticated caller who can name a private repo can still read its ref certificate metadata (ref_name, old/new SHAs, pusher DID, node DID, signature), andget_certcan return any certificate ID after only proving that the requested repo path exists. Please thread the optional authenticated caller through these handlers, useauthorize_repo_read(&state, &owner, &name, caller, "/"), and add the private-repo denial test CodeRabbit requested. -
[P2] Complete Copilot's limit-clamp request for the events path
crates/gitlawb-node/src/api/events.rs:54
The PR fixed negativelimithandling in/certs, butlist_repo_eventsstill parses the query withunwrap_or(50).min(200), so?limit=-1remains negative. This PR now passes that value intolist_ref_certificates(&record.id, limit), and the DB helper binds it directly intoLIMIT $2; malformed input therefore trips the DB query and the handler silently falls back to an empty local-cert/event result instead of applying the same bounded limit as normal requests. Please clamp the lower bound in the events path and add the DB-boundary clamp/test requested by the earlier reviewer comment so every caller oflist_ref_certificatesstays bounded.
b000ca9 to
52a830d
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/api/certs.rs (1)
60-68: 🔒 Security & Privacy | 🔴 Critical | 🏗️ Heavy liftBind
get_certto the authorized repo
get_ref_certificate(&id)is a global lookup, so this handler can return a certificate from a different repo than the one in the path. Checkcert.repo_idagainst the authorized repo before responding, and map mismatches toNotFoundto avoid leaking existence.🛡️ Proposed fix
let caller = auth.as_ref().map(|e| e.0 .0.as_str()); - let (_record, _rules) = + let (record, _rules) = crate::api::authorize_repo_read(&state, &owner, &name, caller, "/").await?; let cert = state .db .get_ref_certificate(&id) .await? + .filter(|c| c.repo_id == record.id) .ok_or_else(|| AppError::NotFound(format!("certificate {id}")))?;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/certs.rs` around lines 60 - 68, The get_cert handler currently uses a global get_ref_certificate lookup, so it can return a certificate from a different repository than the authorized path. In crates/gitlawb-node/src/api/certs.rs, after authorize_repo_read and before responding, verify the fetched cert’s repo_id matches the authorized repo for owner/name, and convert any mismatch into AppError::NotFound so the repo binding stays enforced without leaking existence.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/gitlawb-node/src/api/certs.rs`:
- Around line 60-68: The get_cert handler currently uses a global
get_ref_certificate lookup, so it can return a certificate from a different
repository than the authorized path. In crates/gitlawb-node/src/api/certs.rs,
after authorize_repo_read and before responding, verify the fetched cert’s
repo_id matches the authorized repo for owner/name, and convert any mismatch
into AppError::NotFound so the repo binding stays enforced without leaking
existence.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 51e4152d-a939-4710-8973-72ee665d977c
📒 Files selected for processing (5)
crates/gitlawb-node/src/api/certs.rscrates/gitlawb-node/src/api/events.rscrates/gitlawb-node/src/api/mod.rscrates/gitlawb-node/src/db/mod.rscrates/gitlawb-node/src/test_support.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- crates/gitlawb-node/src/api/events.rs
- crates/gitlawb-node/src/db/mod.rs
- crates/gitlawb-node/src/test_support.rs
52a830d to
13c19cf
Compare
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the changed paths and found one item that still needs to be addressed.
Findings
- [P2] Complete Copilot's DB-boundary limit clamp request
crates/gitlawb-node/src/db/mod.rs:1965
The API handlers now clamp their query parameters before callinglist_ref_certificates, but the DB helper itself still accepts anyi64and binds it directly intoLIMIT $2. Copilot's current thread on this helper is still valid, and the previous review also asked for the DB-boundary clamp/test so every caller of this helper remains bounded. Please clamp or reject invalid limits insidelist_ref_certificatesitself, and add a focused test for a negative/raw helper limit so a future caller cannot reintroduce this path.
13c19cf to
88cb60f
Compare
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the changed paths and found one item that still needs to be addressed.
Findings
- [P2] Gate the repo events certificate path as well
crates/gitlawb-node/src/api/events.rs:63
This update correctly moves/certsthroughauthorize_repo_read, but the same local certificate data is still reachable throughGET /api/v1/repos/{owner}/{repo}/events. That route sits on the optional-signature read router,list_repo_eventsresolves the repo withget_repo(...)instead of a caller-aware visibility check, and then the local-cert branch callslist_ref_certificates(&record.id, limit)and returnsref_name,old_sha,new_sha,pusher_did, andnode_did. For a private repo, an unauthenticated caller who can name the repo therefore still gets the ref-certificate metadata this PR just gated on/certs. Please thread the optional authenticated caller intolist_repo_eventsand gate the local repo branch withauthorize_repo_read(..., "/")before fetching or returning local cert/gossip events.
88cb60f to
f3e4d88
Compare
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the changed paths and found one item that still needs to be addressed.
Findings
- [P2] Finish gating the repo-events gossip path after auth denial
crates/gitlawb-node/src/api/events.rs:80
The current update gates the local certificate query, but whenauthorize_repo_read(...)returns an error for this repo the handler falls back toformat!("{owner}/{repo_name}")and still callslist_repo_ref_updates(&repo_id_str, limit). If that URL slug matches rows inreceived_ref_updatesfor a local repo the caller cannot read,GET /api/v1/repos/{owner}/{repo}/eventsstill returns the repo's ref metadata (ref_name,old_sha,new_sha,pusher_did,node_did,cert_id, timestamps) even though the repo-read gate denied access. Please complete the previous repo-events gating request by distinguishing "repo not local" from "local repo exists but caller is unauthorized", and avoid fetching either local certs or local/gossip event rows after an authorization denial for a local repo.
beardthelion
left a comment
There was a problem hiding this comment.
Verified against the current head (f3e4d88). The certificate gating, the DB-boundary limit clamp, and the v10 migration structure all hold up. The blocking item is the one jatmn already flagged, still present on this head; a few smaller items follow.
Findings
-
[P2] Withhold the gossip branch of
list_repo_eventsafter an authorization denial
crates/gitlawb-node/src/api/events.rs:80
Seconding jatmn's open finding, still live on f3e4d88. The local certificate query is gated, but on theauthorize_repo_readerror branch the handler falls back toformat!("{owner}/{repo_name}")and still callslist_repo_ref_updates, a rawWHERE repo = $1read with no visibility filter. Driving the endpoint as an anonymous caller against a private local repo whose slug has areceived_ref_updatesrow returns that row'sref_name,old_sha,new_sha,pusher_did, andnode_did. The row has to be present first (peer-injected via gossip ornotify_sync, since a private repo does not announce its own pushes), which keeps this at P2, but the endpoint should distinguish "repo not local" from "local repo exists but caller denied" and skip both the cert and gossip reads once a local repo denies the caller. -
[P3] Keep the certificate id stable across re-pushes
crates/gitlawb-node/src/db/mod.rs:1932
insert_ref_certificatenow upserts withON CONFLICT (repo_id, ref_name) DO UPDATE SET id = EXCLUDED.id, so every re-push to a ref rotates the row's primary key.get_ref_certificatelooks up by id alone, so a cert id issued earlier (returned byissue_ref_certificate, logged, and carried asreceived_ref_updates.cert_id) stops resolving throughGET /certs/{id}after the next push. Note that just droppingid = EXCLUDED.idis not enough on its own:issue_ref_certificatereturns the freshly minted UUID to its caller, so the returned id would then diverge from the persisted one. Return the persisted row (RETURNING id) or derive the id deterministically from(repo_id, ref_name). -
[P3] Add a deny-path test for
list_repo_eventsand an upgrade-path test for the v10 dedup
crates/gitlawb-node/src/api/events.rs:51,crates/gitlawb-node/src/db/mod.rs:826
The new tests are all happy-path. Nothing drives an unauthorized caller against a private local repo and asserts the gossip rows are withheld, which is exactly where the leak above sits. Separately, the v10 dedupDELETEonly matters on already-deployed nodes that accumulated duplicates, but every new test runs on a fresh#[sqlx::test]database, so that branch never executes; a test that seeds duplicates at the prior schema version and then migrates would cover it. While there, give the dedupORDER BY issued_at DESCanidtiebreaker so the surviving row is deterministic when two share a timestamp.
The CREATE UNIQUE INDEX is non-concurrent and runs inside the migration transaction, so it briefly blocks ref_certificates writes during deploy. That is fine while the table stays small per ref, but worth a line in the deploy notes.
f3e4d88 to
b19583c
Compare
…revent unbounded growth (Gitlawb#147)
b19583c to
2460b1b
Compare
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the previously discussed paths and do not see any remaining actionable issues from my side.
@kevincodex1 LGTM
Superseded by a re-review of the current head (2460b1b); the findings in this review are resolved.
beardthelion
left a comment
There was a problem hiding this comment.
Re-reviewed the current head (2460b1b). Every item from my previous review and jatmn's blocking finding is resolved, and I verified each fix by execution: the repo-events gossip path now returns 404 to a non-owner before any cert or gossip read (I drove the full, bare, and truncated owner forms, so the deny arm's get_repo recheck can't be sidestepped by an alternate owner slug); the negative and oversized ?limit= cases return a clamped 200 rather than a 500; the upsert preserves the cert id across re-pushes; and the v10 dedup keeps a deterministic survivor. The core is sound. Two things before it can merge.
Findings
-
[P1] Rebase onto current main and re-review the resolved ref-update diff before merge
crates/gitlawb-node/src/api/events.rs,crates/gitlawb-node/src/db/mod.rs
This branch's merge-base predates #143, which gated the ref-update feeds (the GraphQLref_updatesquery, the global/api/v1/events/ref-updatesfeed, and the subscription broadcast). #149 also editsevents.rsanddb/mod.rs, so the rebase will conflict in exactly those files, and a wrong resolution can silently drop #143'scollect_visible_ref_updatesgating and reopen the private-repo ref-metadata leak. Please rebase onto current main and re-review the resolved diff acrossevents.rs,graphql/query.rs,repos.rs, and everyreceived_ref_updatesreader before merging. -
[P2] Add the INV-7 upgrade-path test for the v10 dedup
crates/gitlawb-node/src/db/mod.rs
The dedupDELETEruns before theCREATE UNIQUE INDEXin one migration transaction andv10_dedup_removes_old_duplicatespasses, so the migration itself is sound. But that test runsrun_migrations()on a fresh database and then hand-copies the dedup SQL as string literals, so the real migration's dedup branch never runs on a seeded prior-version node and the test can drift fromMIGRATIONS[v10]without failing. Add a test that seedsschema_migrationsat v9 with duplicateref_certificates(including a pair sharingissued_at), runsDb::run_migrations(), and asserts one deterministic survivor per ref plus the index present.
Summary
Bound
list_ref_certificateswithLIMITat the DB layer and add a(repo_id, ref_name)upsert so the table cannot grow without bound per ref.Motivation & context
Closes #147
list_ref_certificatesranSELECT ... WHERE repo_id = $1withfetch_alland noLIMIT, loading every certificate row for a repo into memory. An anonymous caller on a public repo could amplify a single GET into an unbounded DB fetch, allocation, and response body.Kind of change
What changed
gitlawb-node—db/mod.rs: addedLIMIT $2tolist_ref_certificatesquery; changedinsert_ref_certificateto upsert on(repo_id, ref_name)conflict (v10 migration creates the unique index)gitlawb-node—api/certs.rs: added?limitquery param (default 50, max 200) andcountin responsegitlawb-node—api/events.rs: pass the existinglimittolist_ref_certificatesso the cert half is bounded beforefetch_allHow a reviewer can verify
cargo test -p gitlawb-node migration_testsBefore you request review
cargo clippy --workspace --all-targets -- -D warningsis cleanfix(...))Summary by CodeRabbit
limitquery support for certificate listings (default 50, clamped to1–200).countfield.limitof1, and related certificate lookups follow the same bound.limit, ordering,count, and private-repo access.