
Add max capacity for node #157

Merged
JesseAbram merged 3 commits into develop from Add-max-capacity-for-node on May 8, 2026

Conversation

@JesseAbram
Collaborator

Closes #101

@coderabbitai

coderabbitai Bot commented May 8, 2026

Warning

Rate limit exceeded

@JesseAbram has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 22 minutes and 12 seconds before requesting another review.


⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c9c2b7c3-f8bb-4b82-a9b3-1bfc6ac56223

📥 Commits

Reviewing files that changed from the base of the PR and between d085dea and 7a8a090.

📒 Files selected for processing (1)
  • bin/orbis-node/src/dkg/coordinator/phases/phase4.rs


JesseAbram marked this pull request as ready for review on May 8, 2026 at 17:11

coderabbitai Bot left a comment

🧹 Nitpick comments (2)
bin/orbis-node/src/info/service.rs (1)

109-114: ⚡ Quick win

Harden managed_ring_count against malformed RingIndex payloads.

On line 109, a decode failure currently bubbles up and fails GetNodeInfo. For a diagnostic endpoint, returning 0 with a warning is usually safer than hard-failing the RPC.

Proposed change
+use tracing::warn;
@@
 fn managed_ring_count(local_storage: &LocalStorageImpl) -> Result<u32, Status> {
@@
-    let ring_index: Vec<RingIndexEntry> = serde_json::from_slice(&bytes)
-        .map_err(|e| InfoError::InfoError(format!("Error parsing RingIndex: {}", e)))?;
+    let ring_index: Vec<RingIndexEntry> = match serde_json::from_slice(&bytes) {
+        Ok(v) => v,
+        Err(e) => {
+            warn!("Error parsing RingIndex: {}. Reporting managed_ring_count=0", e);
+            return Ok(0);
+        }
+    };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bin/orbis-node/src/info/service.rs` around lines 109 - 114, The current
decoding of the RingIndex in the managed_ring_count function can cause the
entire GetNodeInfo to fail on JSON parse errors; instead, catch errors from
serde_json::from_slice and return 0 with a logged warning, ensuring the
diagnostic endpoint remains operational. Modify the error handling in the
portion where ring_index is deserialized to safely handle parse failures by
returning 0 rather than propagating the error.
bin/orbis-node/src/dkg/coordinator/phases/phase4.rs (1)

121-125: ⚡ Quick win

Consider moving the capacity preflight before the expensive DKG computation for the Reshare+Receiver case.

For SessionKind::Fresh, storage_key depends on the freshly computed aggregate_pk, so the preflight must come after the crypto work — this is unavoidable. However, for SessionKind::Reshare with DkgRole::Receiver, storage_key is already determined from kind.ring_key() (available in session state before any computation). A node at capacity that participates in many reshare-receiver ceremonies will perform the full share/polynomial computation before hitting the cap rejection at line 124.

The most efficient fix for the at-capacity reshare-receiver path is to pull the preflight up before the with_state crypto block for that branch (see the sketch below). Longer term, moving capacity checks to session initiation (Phase 1 or SessionInit) would prevent any wasted crypto work.
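
A minimal sketch of that reordering, reusing only the names quoted in this comment (SessionKind, DkgRole, kind.ring_key(), ring_storage::preflight_new_ring_capacity); it is not compilable in isolation and the surrounding control flow is assumed:

```rust
// Sketch only — branch detection and variable names mirror this comment,
// not the actual phase4.rs control flow.
if matches!(kind, SessionKind::Reshare { .. }) && dkg_role == DkgRole::Receiver {
    // For a reshare receiver the storage key is fixed by session state,
    // so the cheap capacity check can run before any crypto work.
    let storage_key = kind.ring_key();
    ring_storage::preflight_new_ring_capacity(&coord.app_state, &storage_key).await?;
}

// ...expensive with_state / share / polynomial computation runs only after the
// at-capacity case has already been rejected...

// SessionKind::Fresh keeps its existing post-crypto preflight, since its
// storage key is derived from the freshly computed aggregate_pk.
```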

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bin/orbis-node/src/dkg/coordinator/phases/phase4.rs` around lines 121 - 125,
For the Reshare+Receiver path, move the capacity preflight earlier so it runs
before the expensive crypto block: detect if matches!(kind, SessionKind::Reshare
{ .. }) && dkg_role == DkgRole::Receiver, compute storage_key from
kind.ring_key() (available in session state) and call
ring_storage::preflight_new_ring_capacity(&coord.app_state, &storage_key).await?
before entering the with_state/crypto computation; keep the existing post-crypto
preflight for the SessionKind::Fresh case (which needs storage_key derived from
aggregate_pk) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8a32e6ac-3d1b-42fa-9fed-aeb0726c0f06

📥 Commits

Reviewing files that changed from the base of the PR and between f099749 and d085dea.

📒 Files selected for processing (11)
  • bin/cli-tool/src/commands.rs
  • bin/orbis-node/src/constants.rs
  • bin/orbis-node/src/dkg/coordinator/phases/mod.rs
  • bin/orbis-node/src/dkg/coordinator/phases/phase4.rs
  • bin/orbis-node/src/dkg/coordinator/ring_storage.rs
  • bin/orbis-node/src/dkg/error.rs
  • bin/orbis-node/src/info/service.rs
  • bin/orbis-node/src/info/tests.rs
  • bin/orbis-node/src/tests/node.rs
  • crates/proto/README.md
  • crates/proto/proto/info_service.proto
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-03-03T15:27:28.357Z
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 87
File: bin/orbis-node/src/tests/mod.rs:3-3
Timestamp: 2026-03-03T15:27:28.357Z
Learning: Ensure integration tests under bin/orbis-node/src/tests (including Docker-based tests in bin/orbis-node/src/tests/integration.rs) run by default with cargo test (no feature flags). Remove or guard against #[cfg(feature = "integration")] or similar gating so tests execute under default features. If needed, adjust CI to run cargo test without --features and document that these tests are part of the default test suite.

Applied to files:

  • bin/orbis-node/src/tests/node.rs
🔇 Additional comments (12)
bin/orbis-node/src/dkg/coordinator/phases/mod.rs (1)

14-14: Looks good for phase-module dependency alignment.

This import addition is consistent with the ring-index capacity work landing across coordinator/info paths.

crates/proto/proto/info_service.proto (1)

27-28: Proto field addition is backward-compatible.

Adding managed_ring_count as a new tag (= 5) is the right shape for non-breaking evolution.

bin/orbis-node/src/constants.rs (1)

50-55: Good addition: explicit cap for local ring growth.

The constant and docs clearly encode the operational bound introduced by this PR.
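
For illustration only, the cap has roughly this shape; the name and value below are hypothetical, not the actual definition in constants.rs:

```rust
/// Hypothetical name and value — the real constant and its docs live around
/// lines 50-55 of bin/orbis-node/src/constants.rs.
/// Upper bound on how many rings a single node will agree to manage locally.
pub const MAX_LOCAL_RING_COUNT: usize = 64;
```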

crates/proto/README.md (1)

25-25: Docs update is aligned with the API change.

GetNodeInfo documentation now correctly reflects the new managed_ring_count field.

bin/orbis-node/src/dkg/error.rs (1)

55-57: Error modeling and status mapping are well done.

Using a dedicated variant plus ResourceExhausted gives callers clear, actionable failure semantics when node-local ring capacity is hit.

Also applies to: 109-111
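
As a rough sketch of the pattern being praised here — a dedicated error variant mapped to a ResourceExhausted gRPC status — where the variant name and the exact conversion impl are assumptions rather than the code in dkg/error.rs:

```rust
use tonic::Status;

// Hypothetical variant name; the real variant is defined in bin/orbis-node/src/dkg/error.rs.
#[derive(Debug, thiserror::Error)]
pub enum DkgError {
    #[error("node is already managing its maximum number of rings ({0})")]
    LocalRingCapacityReached(u32),
}

impl From<DkgError> for Status {
    fn from(err: DkgError) -> Self {
        match &err {
            // Capacity exhaustion maps to RESOURCE_EXHAUSTED so callers can
            // distinguish "try another node" from an internal failure.
            DkgError::LocalRingCapacityReached(_) => Status::resource_exhausted(err.to_string()),
        }
    }
}
```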

bin/orbis-node/src/info/service.rs (1)

84-98: Managed ring count wiring into GetNodeInfoResponse looks correct.

The field is computed and surfaced cleanly in the node-info response path.

bin/orbis-node/src/info/tests.rs (1)

13-99: Nice coverage for the new node-info field.

Both zero-state and populated RingIndex scenarios are now asserted, which gives good confidence in managed_ring_count behavior.

bin/cli-tool/src/commands.rs (1)

904-905: CLI propagation of managed ring count is clean and complete.

Field mapping and user-facing output are both updated consistently.

Also applies to: 926-933, 942-943

bin/orbis-node/src/tests/node.rs (1)

187-187: LGTM — managed_ring_count == 0 assertions are correct for both phases.

No DKG has completed in either test scenario, so asserting 0 is the right expectation for both the bootstrap-only (Bootstrapping) and full-server (Ready) states.

Also applies to: 302-302

bin/orbis-node/src/dkg/coordinator/ring_storage.rs (2)

102-112: LGTM — preflight is advisory; add_ring_index_entry holds the definitive lock-protected check.

The TOCTOU window between preflight_new_ring_capacity releasing ring_index_lock and add_ring_index_entry re-acquiring it is safe: add_ring_index_entry independently calls ensure_local_ring_capacity under its own lock acquisition, so no ring can be over-admitted by a concurrent session slipping through between these two calls.


228-255: LGTM — clean fail-closed design.

read_ring_index correctly distinguishes Ok(None) (key absent → empty index) from Err (storage failure → propagate), and ensure_local_ring_capacity correctly short-circuits on an existing entry before applying the cap, making the preflight safe for upsert scenarios.
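
A self-contained sketch of the cap-check semantics described above — an existing entry short-circuits so upserts are never rejected, and only a genuinely new ring counts against the limit — with simplified parameter types standing in for the real storage plumbing:

```rust
/// Sketch of ensure_local_ring_capacity's semantics as described in this review;
/// the signature and types are simplified stand-ins, not the actual implementation.
fn ensure_local_ring_capacity(
    indexed_keys: &[Vec<u8>], // storage keys already present in the ring index
    candidate: &[u8],         // storage key the new ring would be written under
    max_local_rings: usize,   // the cap introduced in constants.rs
) -> Result<(), String> {
    // An already-indexed ring is an upsert, not new growth: never reject it.
    if indexed_keys.iter().any(|k| k.as_slice() == candidate) {
        return Ok(());
    }
    // Only a genuinely new ring counts against the node-local cap.
    if indexed_keys.len() >= max_local_rings {
        return Err(format!(
            "node already manages {} of {} allowed rings",
            indexed_keys.len(),
            max_local_rings
        ));
    }
    Ok(())
}
```

Because add_ring_index_entry re-runs this check under its own acquisition of ring_index_lock, the advisory preflight can go stale without any ring being over-admitted, which is exactly the TOCTOU argument above.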

bin/orbis-node/src/dkg/coordinator/phases/phase4.rs (1)

156-166: LGTM — cleanup logic is correct for all role/kind combinations.

adds_new_local_ring is false for DealerReceiver (existing bundle updated, not created) and true for Fresh/Receiver (new bundle just written), so the conditional cleanup in cleanup_new_ring_bundle_after_index_failure is appropriately scoped. The original index-write error is correctly returned regardless of whether the cleanup deletion succeeds.

Also applies to: 195-205, 299-315
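
A sketch of the cleanup shape this describes; the storage handle, its delete method, and the error type are assumptions for illustration — only the function name and the adds_new_local_ring flag come from the review:

```rust
// Illustrative only — RingStorage, delete_ring_bundle, and DkgError are assumed names.
async fn cleanup_new_ring_bundle_after_index_failure(
    storage: &RingStorage,
    storage_key: &[u8],
    adds_new_local_ring: bool,
    index_err: DkgError,
) -> DkgError {
    // Only a bundle created by this session is deleted; a DealerReceiver
    // updated an existing bundle, which must be left in place.
    if adds_new_local_ring {
        if let Err(cleanup_err) = storage.delete_ring_bundle(storage_key).await {
            // Best-effort cleanup: log and fall through to the original error.
            tracing::warn!("failed to remove orphaned ring bundle: {cleanup_err}");
        }
    }
    // The original index-write failure is always returned to the caller,
    // whether or not the cleanup deletion succeeded.
    index_err
}
```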

JesseAbram merged commit c4373cf into develop on May 8, 2026
3 checks passed


Development

Successfully merging this pull request may close these issues.

Consider limit on how many Keys a Node can hold
