Add max capacity for node #157
Conversation
Warning: Rate limit exceeded

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered using the `@coderabbitai review` command. We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source, and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.
🧹 Nitpick comments (2)
bin/orbis-node/src/info/service.rs (1)
109-114: ⚡ Quick win: Harden `managed_ring_count` against malformed RingIndex payloads.

On Line 109, a decode failure currently bubbles up and fails `GetNodeInfo`. For a diagnostic endpoint, returning `0` with a warning is usually safer than hard-failing the RPC.

Proposed change:
```diff
+use tracing::warn;
 @@
 fn managed_ring_count(local_storage: &LocalStorageImpl) -> Result<u32, Status> {
 @@
-    let ring_index: Vec<RingIndexEntry> = serde_json::from_slice(&bytes)
-        .map_err(|e| InfoError::InfoError(format!("Error parsing RingIndex: {}", e)))?;
+    let ring_index: Vec<RingIndexEntry> = match serde_json::from_slice(&bytes) {
+        Ok(v) => v,
+        Err(e) => {
+            warn!("Error parsing RingIndex: {}. Reporting managed_ring_count=0", e);
+            return Ok(0);
+        }
+    };
```
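To see the parse-or-default pattern in isolation, here is a minimal, self-contained sketch; `RingIndexEntry` is reduced to a placeholder struct and the storage read is replaced by a byte slice, so only the fallback logic mirrors the real service.rs.

```rust
use serde::Deserialize;
use tracing::warn;

// Placeholder: the real entry fields are irrelevant for counting, and serde
// ignores unknown JSON object fields by default.
#[derive(Deserialize)]
struct RingIndexEntry {}

// Count entries, reporting 0 (with a warning) on malformed payloads instead
// of failing the whole RPC.
fn managed_ring_count_from_bytes(bytes: &[u8]) -> u32 {
    match serde_json::from_slice::<Vec<RingIndexEntry>>(bytes) {
        Ok(entries) => entries.len() as u32,
        Err(e) => {
            warn!("Error parsing RingIndex: {e}. Reporting managed_ring_count=0");
            0
        }
    }
}

fn main() {
    assert_eq!(managed_ring_count_from_bytes(b"[{}, {}]"), 2);
    assert_eq!(managed_ring_count_from_bytes(b"not json"), 0);
}
```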
🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bin/orbis-node/src/info/service.rs` around lines 109 - 114: the current decoding of the RingIndex in the managed_ring_count function can cause the entire GetNodeInfo to fail on JSON parse errors; instead, catch errors from serde_json::from_slice and return 0 with a logged warning, ensuring the diagnostic endpoint remains operational. Modify the error handling in the portion where ring_index is deserialized to safely handle parse failures by returning 0 rather than propagating the error.
bin/orbis-node/src/dkg/coordinator/phases/phase4.rs (1)

121-125: ⚡ Quick win: Consider moving the capacity preflight before the expensive DKG computation for the Reshare+Receiver case.

For `SessionKind::Fresh`, `storage_key` depends on the freshly computed `aggregate_pk`, so the preflight must come after the crypto work — this is unavoidable. However, for `SessionKind::Reshare` with `DkgRole::Receiver`, `storage_key` is already determined from `kind.ring_key()` (available in session state before any computation). A node at capacity that participates in many reshare-receiver ceremonies will perform the full share/polynomial computation before hitting the cap rejection at line 124.

The most efficient fix for the at-capacity reshare-receiver path would be to pull the preflight up before the `with_state` crypto block for that branch. Longer term, performing capacity checks at session initiation (Phase 1 or `SessionInit`) would prevent any wasted crypto work. A sketch of the reordering follows below.
🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bin/orbis-node/src/dkg/coordinator/phases/phase4.rs` around lines 121 - 125: for the Reshare+Receiver path, move the capacity preflight earlier so it runs before the expensive crypto block: detect if matches!(kind, SessionKind::Reshare { .. }) && dkg_role == DkgRole::Receiver, compute storage_key from kind.ring_key() (available in session state) and call ring_storage::preflight_new_ring_capacity(&coord.app_state, &storage_key).await? before entering the with_state/crypto computation; keep the existing post-crypto preflight for the SessionKind::Fresh case (which needs storage_key derived from aggregate_pk) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8a32e6ac-3d1b-42fa-9fed-aeb0726c0f06
📒 Files selected for processing (11)
- bin/cli-tool/src/commands.rs
- bin/orbis-node/src/constants.rs
- bin/orbis-node/src/dkg/coordinator/phases/mod.rs
- bin/orbis-node/src/dkg/coordinator/phases/phase4.rs
- bin/orbis-node/src/dkg/coordinator/ring_storage.rs
- bin/orbis-node/src/dkg/error.rs
- bin/orbis-node/src/info/service.rs
- bin/orbis-node/src/info/tests.rs
- bin/orbis-node/src/tests/node.rs
- crates/proto/README.md
- crates/proto/proto/info_service.proto
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-03-03T15:27:28.357Z
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 87
File: bin/orbis-node/src/tests/mod.rs:3-3
Timestamp: 2026-03-03T15:27:28.357Z
Learning: Ensure integration tests under bin/orbis-node/src/tests (including Docker-based tests in bin/orbis-node/src/tests/integration.rs) run by default with cargo test (no feature flags). Remove or guard against #[cfg(feature = "integration")] or similar gating so tests execute under default features. If needed, adjust CI to run cargo test without --features and document that these tests are part of the default test suite.
Applied to files:
bin/orbis-node/src/tests/node.rs
🔇 Additional comments (12)
bin/orbis-node/src/dkg/coordinator/phases/mod.rs (1)
14-14: Looks good for phase-module dependency alignment. This import addition is consistent with the ring-index capacity work landing across coordinator/info paths.
crates/proto/proto/info_service.proto (1)
27-28: Proto field addition is backward-compatible. Adding `managed_ring_count` as a new tag (`= 5`) is the right shape for non-breaking evolution.

bin/orbis-node/src/constants.rs (1)
50-55: Good addition: explicit cap for local ring growth. The constant and docs clearly encode the operational bound introduced by this PR.
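As a hypothetical illustration only (the real name, value, docs, and enforcement live in constants.rs and ring_storage.rs), a cap of this shape might look like:

```rust
/// Hypothetical stand-in for the real constant in constants.rs: an explicit,
/// documented upper bound on how many rings one node will manage locally.
pub const MAX_LOCAL_RINGS: u32 = 64;

// Example enforcement against a current count (the real check is
// ring_storage's lock-protected capacity function).
fn check_capacity(current: u32) -> Result<(), String> {
    if current >= MAX_LOCAL_RINGS {
        return Err(format!("local ring capacity ({MAX_LOCAL_RINGS}) reached"));
    }
    Ok(())
}

fn main() {
    assert!(check_capacity(0).is_ok());
    assert!(check_capacity(MAX_LOCAL_RINGS).is_err());
}
```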
crates/proto/README.md (1)
25-25: Docs update is aligned with the API change.
`GetNodeInfo` documentation now correctly reflects the new `managed_ring_count` field.

bin/orbis-node/src/dkg/error.rs (1)
55-57: Error modeling and status mapping are well done. Using a dedicated variant plus `ResourceExhausted` gives callers clear, actionable failure semantics when node-local ring capacity is hit.

Also applies to: 109-111
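A hedged sketch of the pattern, assuming `thiserror` and `tonic` as in a typical gRPC service; the variant name and message are illustrative, not the actual error.rs definitions:

```rust
use thiserror::Error;
use tonic::{Code, Status};

#[derive(Debug, Error)]
enum DkgError {
    // Illustrative variant; the real one lives in dkg/error.rs.
    #[error("node-local ring capacity reached ({limit})")]
    RingCapacityReached { limit: u32 },
}

impl From<DkgError> for Status {
    fn from(err: DkgError) -> Self {
        match err {
            // Capacity exhaustion maps to a retryable, resource-scoped code.
            DkgError::RingCapacityReached { .. } => {
                Status::new(Code::ResourceExhausted, err.to_string())
            }
        }
    }
}

fn main() {
    let status: Status = DkgError::RingCapacityReached { limit: 64 }.into();
    assert_eq!(status.code(), Code::ResourceExhausted);
}
```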
bin/orbis-node/src/info/service.rs (1)
84-98: Managed ring count wiring into `GetNodeInfoResponse` looks correct. The field is computed and surfaced cleanly in the node-info response path.
bin/orbis-node/src/info/tests.rs (1)
13-99: Nice coverage for the new node-info field. Both zero-state and populated RingIndex scenarios are now asserted, which gives good confidence in `managed_ring_count` behavior.
904-905: CLI propagation of managed ring count is clean and complete. Field mapping and user-facing output are both updated consistently.
Also applies to: 926-933, 942-943
bin/orbis-node/src/tests/node.rs (1)
187-187: LGTM — `managed_ring_count == 0` assertions are correct for both phases. No DKG has completed in either test scenario, so asserting `0` is the right expectation for both the bootstrap-only (Bootstrapping) and full-server (Ready) states.

Also applies to: 302-302
bin/orbis-node/src/dkg/coordinator/ring_storage.rs (2)
102-112: LGTM — preflight is advisory; `add_ring_index_entry` holds the definitive lock-protected check. The TOCTOU window between `preflight_new_ring_capacity` releasing `ring_index_lock` and `add_ring_index_entry` re-acquiring it is safe: `add_ring_index_entry` independently calls `ensure_local_ring_capacity` under its own lock acquisition, so no ring can be over-admitted by a concurrent session slipping through between these two calls.
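To illustrate the advisory-vs-definitive split, a minimal sketch using a std `Mutex` in place of `ring_index_lock`; names and the capacity are placeholders for the real ring_storage functions:

```rust
use std::sync::Mutex;

const CAP: usize = 4; // hypothetical capacity bound

struct RingIndex {
    entries: Mutex<Vec<String>>, // stand-in for ring_index_lock + index
}

impl RingIndex {
    // Advisory preflight: cheap early rejection; the lock is released on
    // return, so a TOCTOU window opens afterwards.
    fn preflight(&self) -> Result<(), String> {
        if self.entries.lock().unwrap().len() >= CAP {
            return Err("at capacity".into());
        }
        Ok(())
    }

    // Definitive check: re-validates under the same lock that guards the
    // insert, so no concurrent admission can exceed the cap.
    fn add_entry(&self, key: String) -> Result<(), String> {
        let mut entries = self.entries.lock().unwrap();
        if entries.len() >= CAP {
            return Err("at capacity".into());
        }
        entries.push(key);
        Ok(())
    }
}

fn main() {
    let index = RingIndex { entries: Mutex::new(Vec::new()) };
    index.preflight().expect("advisory check");
    index.add_entry("ring-1".into()).expect("definitive check");
}
```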
228-255: LGTM — clean fail-closed design.
`read_ring_index` correctly distinguishes `Ok(None)` (key absent → empty index) from `Err` (storage failure → propagate), and `ensure_local_ring_capacity` correctly short-circuits on an existing entry before applying the cap, making the preflight safe for upsert scenarios.
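A small sketch of that fail-closed distinction, with a stand-in storage backend rather than the node's actual storage API:

```rust
#[derive(Debug)]
struct StorageError(String);

// Stand-in backend: here the key is simply absent.
fn read_bytes(_key: &str) -> Result<Option<Vec<u8>>, StorageError> {
    Ok(None)
}

// Key absent (Ok(None)) means an empty index; a storage failure (Err)
// propagates instead of being masked as "no rings".
fn read_ring_index(key: &str) -> Result<Vec<String>, StorageError> {
    match read_bytes(key)? {
        None => Ok(Vec::new()),
        Some(bytes) => Ok(vec![String::from_utf8_lossy(&bytes).into_owned()]),
    }
}

fn main() {
    assert!(read_ring_index("ring_index").unwrap().is_empty());
}
```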
156-166: LGTM — cleanup logic is correct for all role/kind combinations.
`adds_new_local_ring` is `false` for Dealer+Receiver (existing bundle updated, not created) and `true` for Fresh/Receiver (new bundle just written), so the conditional cleanup in `cleanup_new_ring_bundle_after_index_failure` is appropriately scoped. The original index-write error is correctly returned regardless of whether the cleanup deletion succeeds.

Also applies to: 195-205, 299-315
Closes #101