refactor: use beacon chain deadlines for QBFT instance cleanup #719
base: unstable
Conversation
Adds a test to verify that QBFT Committee instances can reach late rounds (9+) when configured with max_round=12. The test creates a Committee instance, forces round changes by keeping operators offline, then advances through multiple slots while verifying the instance survives to reach round 10. It currently fails: the instance is cleaned up after 2 slots, so it reaches round 9 but cannot complete it (round 9 needs 120s but gets only 8s), as sketched below.
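A minimal sketch of that arithmetic, using the timing constants discussed in the review below (2s quick rounds, 120s for round 9, 12s slots); the values are illustrative:

```rust
// Why round 9 gets only 8 of the 120 seconds it needs before cleanup.
fn main() {
    let quick_timeout_s = 2u64; // rounds 1-8 each time out after 2s (QUICK_TIMEOUT)
    let rounds_1_to_8_s = 8 * quick_timeout_s; // 16s elapse before round 9 starts
    let cleanup_after_s = 2u64 * 12; // instance removed after 2 slots of 12s = 24s
    let round_9_budget_s = cleanup_after_s - rounds_1_to_8_s;
    assert_eq!(round_9_budget_s, 8); // only 8s of the 120s round 9 needs
}
```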
Claude finished @diegomrsantos's task — View job

Code Review Complete

I've analyzed this PR, which adds a test demonstrating a QBFT instance cleanup issue. The test is well-intentioned but has several areas for improvement.

Summary

This PR adds a single test (`test_committee_can_reach_late_rounds`) showing that Committee instances are removed before they can reach late rounds.

The Core Issue: the slot-based cleanup (`QBFT_RETAIN_SLOTS = 1`) removes the instance after 2 slots (~24s), while reaching round 10 requires roughly 256s (~21.33 slots).
Code Quality Assessment

Positive Aspects ✅
Issues Identified
| Category | Rating | Notes |
|---|---|---|
| Code Quality | ⚠️ | Test logic sound, but execution state unclear |
| Security | ✅ Good | No concerns |
| Performance | ✅ Good | Efficient test design |
| Testing | ⚠️ | Will break CI, needs `#[ignore]` or restructuring |
| Documentation | ⚠️ | Math errors, unclear expectations |
Recommendations
Before Merging (Required)
- Fix the test execution state: add `#[ignore]` or restructure as a negative test
- Correct the timing calculations in comments (see inline reviews)
- Create a GitHub issue tracking the cleanup bug and reference it in comments
- Update the PR description to clarify that only 1 file is actually being changed
Consider (Suggested)
- Add intermediate assertions verifying instance survival and round progression
- Reduce the loop bound from 50 to a realistic value (22-25)
- Add a companion test verifying current cleanup behavior explicitly
- Consider rebasing onto the latest `stable` to reduce diff noise
Follow-up Work (Separate PR)
- Fix the root cause: adjust `QBFT_RETAIN_SLOTS` or implement role-based/event-based cleanup
- Remove `#[ignore]` from this test once the fix is implemented
- Add tests for other duty types (Proposer, SyncCommittee, Aggregator)
Inline Comments
I've left 4 detailed inline comments on specific code sections. Please review each for technical details on:
- Test logic issues (loop bounds)
- Mathematical errors (timing calculations)
- Test design improvements (boundary testing)
- Documentation clarity (failure expectations)
Conclusion: This is valuable work that identifies a real production issue. With the recommended changes (particularly adding #[ignore] and fixing comment math), this test will serve as excellent documentation of the problem and a clear success criterion for the fix.
anchor/qbft_manager/src/tests.rs
Outdated
```rust
// Advance through multiple slots while QBFT progresses
// This triggers cleanup logic which should NOT remove the active instance
for slot in 1..=50 {
```
Test Logic Issue: Unrealistic loop bound
The test loops to slot 50, but breaks at slot 22. This creates confusion about the test's intent. Consider these improvements:
- Use a more reasonable upper bound (e.g., `1..=25`) that better reflects when you expect the test to complete
- Add an assertion after the loop to verify we actually broke at slot 22 and didn't fall through
- Consider making this data-driven: calculate the expected slot based on round timeout constants rather than hard-coding slot 22
```rust
const EXPECTED_ROUND_10_SLOT: u64 = 22; // Document the calculation
for slot in 1..=EXPECTED_ROUND_10_SLOT + 3 { // Small buffer
    clock.set_slot(slot);
    tokio::time::sleep(slot_duration).await;
    if slot == EXPECTED_ROUND_10_SLOT {
        context.set_operators_online(&[2, 3, 4]);
        break;
    }
}
// Verify we didn't fall through
assert!(clock.now().unwrap().as_u64() == EXPECTED_ROUND_10_SLOT);
```

anchor/qbft_manager/src/tests.rs

```rust
for slot in 1..=50 {
    clock.set_slot(slot);
    tokio::time::sleep(slot_duration).await;
```
Mathematical Accuracy: Comment contains calculation error
The comment states "Rounds 1-8: 16s"; the arithmetic is right but misleading as written:
- Rounds 1-8 each take 2 seconds (`QUICK_TIMEOUT`)
- Total for rounds 1-8: `8 * 2s = 16s` ✓
- But the comment should clarify this is cumulative, not per-round
Correct breakdown to round 10:
- Rounds 1-8: `8 × 2s = 16s`
- Round 9: `120s`
- Round 10: `120s`
- Total: `256s ≈ 21.33 slots`
At 12s per slot:
- `256s ÷ 12s/slot ≈ 21.33 slots`
- So at slot 22 (264s), round 10's timeout has already elapsed (at 256s) and round 11 is 8 seconds old, not "around round 10"
Suggest updating comment to:
```rust
// At slot 22 (264 seconds = 22 * 12s):
// - Rounds 1-8: 8 * 2s = 16s
// - Round 9: 120s (completes at 136s / ~slot 11.33)
// - Round 10: 120s (times out at 256s / ~slot 21.33)
// - So by slot 22 (264s), round 10 has elapsed and round 11 is 8s old
if slot == 22 {
```

This makes the test's timing expectations explicit and verifiable.
```rust
// Keep 3 operators offline initially to prevent consensus and force round changes.
// With only 1 operator online out of 4, we cannot reach quorum (need 3).
// This will cause the instance to go through multiple round changes.
context.set_operators_offline(&[2, 3, 4]);
```
Test Design: Consider testing the actual cleanup boundary
The test keeps 3 out of 4 operators offline to force round changes, which is good. However, it doesn't verify the specific failure mode mentioned in the PR description. Consider adding:
- Log inspection or metrics to verify the instance actually progresses through rounds 9-10
- Intermediate assertions to check the instance is still alive at critical points (e.g., after slot 2 when cleanup occurs)
- Test the boundary condition: What happens at exactly slot 2 when cleanup runs?
Example enhancement:
```rust
// Keep 3 operators offline to force round changes
context.set_operators_offline(&[2, 3, 4]);

// Advance to slot 2 where cleanup happens (cutoff = slot 2 - 1 = slot 1)
// At this point, the instance starting at slot 0 should be removed
clock.set_slot(1);
tokio::time::sleep(slot_duration).await;
clock.set_slot(2);
tokio::time::sleep(slot_duration).await;

// TODO: Add assertion here to verify instance is still alive
// This is where the bug manifests - instance gets cleaned up too early
```

This would make the test more explicitly demonstrate the issue described in the PR.
```rust
// The test simulates network conditions where consensus cannot be reached early by keeping
// all but one operator offline, forcing round changes. We advance the slot to trigger
// cleanup and verify the instance survives to reach round 10.
async fn test_committee_can_reach_late_rounds() {
```
Documentation: Test expectations need clarification
The comment says "Currently fails" but doesn't specify:
- What failure mode to expect: does it panic, time out, or return an error?
- Whether this is expected behavior: is this test marked with `#[should_panic]` or `#[ignore]`?
- When it should pass: what changes need to be made for this test to pass?
Recommendations:
- If the test is expected to fail, use `#[ignore]` or `#[should_panic]`:

```rust
#[tokio::test(start_paused = true)]
#[ignore = "Fails due to premature cleanup - see issue #XXX"]
async fn test_committee_can_reach_late_rounds() {
```

- Or, restructure as a negative test that explicitly verifies the current behavior:
```rust
#[tokio::test(start_paused = true)]
async fn test_committee_cleanup_prevents_late_rounds() {
    // Explicitly test that instances ARE cleaned up at slot 2
    // This documents current behavior before fix is implemented
```

- Add a GitHub issue reference so the test can be tracked to a fix
This prevents CI from failing and clearly communicates test intent to future developers.
Force-pushed from 4cd49f3 to 1d995b4
Force-pushed from 1d995b4 to 2cc3d0d
Replace slot-based cleanup with duty-specific beacon chain inclusion deadlines. This allows QBFT instances to progress through all configured rounds without premature removal.

Key changes:
- Separate instance identity from manager metadata using a ManagedInstance wrapper
- Calculate duty-specific deadlines per EIP-7045 (attestations valid until the end of epoch E+1)
- Add a slots_per_epoch configuration parameter
- Implement a dual-trigger cleaner (completion notification + deadline timeout); see the sketch after this message

Fixes instances being cleaned up after 2 slots; they now properly respect beacon chain inclusion windows (32-63 slots for attestations).
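A minimal sketch of what such a dual-trigger cleaner could look like, assuming a tokio runtime; the names (`completion_rx`, `slot_interval`, `InstanceId`) are illustrative, not the PR's actual API:

```rust
use std::collections::HashMap;

use tokio::sync::mpsc;

// Illustrative aliases; the real code uses its own identifier and Slot types.
type InstanceId = u64;
type Slot = u64;

/// Dual-trigger cleaner sketch: removes an instance immediately when it
/// reports completion, and sweeps expired instances once per slot tick.
async fn cleaner(
    mut completion_rx: mpsc::UnboundedReceiver<InstanceId>,
    mut slot_interval: tokio::time::Interval,
    mut instances: HashMap<InstanceId, Slot>, // instance id -> inclusion deadline
    mut current_slot: Slot,
) {
    loop {
        tokio::select! {
            // Trigger 1: completion notification; drop the instance right away.
            Some(id) = completion_rx.recv() => {
                instances.remove(&id);
            }
            // Trigger 2: slot tick; remove instances past their deadline.
            _ = slot_interval.tick() => {
                current_slot += 1;
                instances.retain(|_, deadline| *deadline >= current_slot);
            }
        }
    }
}
```

In the actual PR the deadline comes from `calculate_deadline` and the instance map is a `DashMap` (see the review below); this sketch only illustrates the two cleanup triggers.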
Force-pushed from 2cc3d0d to 4767f9d
Claude finished @diegomrsantos's task — View job

Code Review Complete

I've thoroughly reviewed this PR, which adds both a test demonstrating a QBFT cleanup issue AND a comprehensive fix. This is excellent work that addresses a production-critical bug.

Summary

This PR implements a duty-aware cleanup system for QBFT instances, replacing the overly aggressive slot-based cleanup that was preventing Committee instances from reaching late rounds (9-12). The solution is architecturally sound and follows best practices.

Key Achievement: Instances configured with `max_round = 12` can now progress through all configured rounds instead of being removed after 2 slots.

Changes Overview

Commit 1: Test (82eaa98)
Adds `test_committee_can_reach_late_rounds`, demonstrating the premature cleanup.

Commit 2: Fix (4767f9d)
Replaces slot-based cleanup with duty-specific beacon chain inclusion deadlines (see the commit message above).
Code Quality Assessment

Strengths ✅
Issues Identified
| Category | Rating | Notes |
|---|---|---|
| Code Quality | ✅ Excellent | Clean, maintainable, follows best practices |
| Security | ✅ Good | No concerns, improves liveness |
| Performance | ✅ Good | No regressions, potentially better |
| Testing | ✅ Good | Comprehensive, could add more edge cases |
| Documentation | ⚠️ | Minor math error, otherwise excellent |
| Architecture | ✅ Excellent | Well-designed, scalable solution |
Recommendations
Before Merging (Optional Improvements)
- Fix test comment math at lines 942-943 (see inline comment)
- Reduce the loop bound from 50 to 25 for clarity (see inline comment)
- Add a doc comment to `ManagedInstance` explaining its purpose (see inline comment)
- Enhance the `calculate_deadline` comment with EIP-7045 calculation detail (see inline comment)
Follow-up Work (Separate PRs)
- Add tests for other duty types (Proposer, SyncCommittee)
- Add test for deadline expiry behavior
- Add test verifying immediate cleanup on completion
- Consider adding metrics for instance cleanup (completion vs deadline)
Conclusion
This is excellent work that solves a real production issue with a well-architected solution. The code is clean, follows Rust best practices, and adheres to the project's guidelines. The few minor issues identified are documentation improvements rather than functional problems.
Recommendation: ✅ Approve with minor documentation improvements
The PR is ready to merge as-is. The suggested documentation improvements would make the code even better but are not blockers.
Review Summary: I've left 4 inline comments with specific suggestions for documentation improvements. All issues are minor and don't block merging. The implementation is sound and production-ready.
anchor/qbft_manager/src/tests.rs
Outdated
```rust
tokio::time::sleep(slot_duration).await;

// At slot 22 (256 seconds = 16s + 240s), we should be around round 10
// Rounds 1-8: 16s, Rounds 9-10: 240s = 256s total
```
Math Error in Comment
The comment's calculation is incorrect. Here's the accurate breakdown:
Correct Timing:
- Rounds 1-8: `8 × 2s = 16s` (complete at ~slot 1.33)
- Round 9: `120s` (completes at 136s / ~slot 11.33)
- Round 10: `120s` (runs from 136s and times out at 256s / ~slot 21.33)
- At slot 22 (264s): round 10's timeout has already elapsed; round 11 is 8 seconds old
Suggested fix:
```rust
// At slot 22 (264 seconds = 22 * 12s):
// - Rounds 1-8: 8 * 2s = 16s (complete at ~slot 1.33)
// - Round 9: 120s (completes at 136s / ~slot 11.33)
// - Round 10: 120s (times out at 256s / ~slot 21.33)
// - So by slot 22 we are 8 seconds past round 10's timeout, into round 11
if slot == 22 {
```

This makes the timing expectations explicit and verifiable.
```rust
// Map from an identifier to a sender for the instance
type Map<I, D> = DashMap<I, UnboundedSender<QbftMessage<D>>>;

// Manager's bookkeeping for an instance
pub struct ManagedInstance<D: QbftData> {
```
Consider Adding Documentation
The ManagedInstance struct is a key part of the new architecture but lacks documentation. Consider adding:
```rust
/// Manager's bookkeeping for a QBFT instance.
///
/// Tracks the communication channel and beacon chain inclusion deadline
/// for each active instance. The `deadline` field determines when the
/// instance should be cleaned up if not completed earlier. Instances are
/// removed either when they complete (via completion notification) or when
/// their deadline expires (via the periodic cleaner sweep).
pub struct ManagedInstance<D: QbftData> {
    sender: UnboundedSender<QbftMessage<D>>,
    deadline: types::Slot,
}
```

This helps future maintainers understand the purpose and lifecycle management.
```rust
/// Calculate the beacon chain inclusion deadline for a duty
fn calculate_deadline(role: Role, slot: types::Slot, slots_per_epoch: u64) -> types::Slot {
    match role {
        Role::Committee | Role::Aggregator => {
```
Enhance Comment for Clarity
The calculation `(epoch.as_u64() + 2) * slots_per_epoch - 1` implements EIP-7045 correctly, but the "why" could be clearer:
Suggested enhancement:
```rust
Role::Committee | Role::Aggregator => {
    // Attestations can be included until end of next epoch (epoch E+1)
    // Per EIP-7045: attestation from epoch E valid until end of epoch E+1
    //
    // Calculation explanation:
    // - Epoch E+1 starts at: (E+1) * slots_per_epoch
    // - Epoch E+1 ends at: (E+2) * slots_per_epoch - 1 (last slot of E+1)
    let epoch = slot.epoch(slots_per_epoch);
    types::Slot::new((epoch.as_u64() + 2) * slots_per_epoch - 1)
}
```

This makes it immediately clear why we use E+2 in the calculation.
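As a quick sanity check of that formula (hypothetical numbers, not taken from the PR):

```rust
// A Committee duty at slot 100 with slots_per_epoch = 32 lies in epoch 3
// (100 / 32 = 3), so its attestation is includable through the end of
// epoch 4: deadline = (3 + 2) * 32 - 1 = 159, the last slot of epoch 4.
let (slots_per_epoch, duty_slot) = (32u64, 100u64);
let epoch = duty_slot / slots_per_epoch; // 3
assert_eq!((epoch + 2) * slots_per_epoch - 1, 159);
```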
Improve test readability by applying the Setup/Execute/Assert structure:

- Replace magic numbers with named constants (SINGLE_INSTANCE, TWO_INSTANCES, etc.)
- Add mandatory section comments (// SETUP, // EXECUTE, // ASSERT) to all new tests
- Split the oversized test_role_based_deadline_calculations into 6 focused tests (one per role)
- Add descriptive assertion messages explaining what must be true
- Name all literals in new tests (OLD_CLEANUP_SLOT, BEACON_DEADLINE_SLOT, etc.)

All 23 tests pass (up from 18 due to the role deadline test split).
Add the Setup/Execute/Assert pattern as a mandatory requirement for all tests:

CLAUDE.md:
- Add a "Writing Tests" section requiring tester-subagent usage before writing any test code
- Ensures consistent test structure across the codebase

tester-subagent.md:
- Add section 4 "Test Structure Requirements" with a generic example
- Define 4 mandatory requirements: section comments, named constants, assertion messages, one behavior per test
- List anti-patterns to reject (no section comments, magic numbers, silent assertions, mixed code)
- Update the description to emphasize "MUST BE USED before writing ANY test code"

This ensures all new tests follow readable, maintainable patterns.
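A minimal sketch of the required shape (the constant name `SINGLE_INSTANCE` and the test body are purely illustrative, not from the PR):

```rust
#[tokio::test(start_paused = true)]
async fn test_example_shows_required_structure() {
    // SETUP: name every literal instead of using magic numbers.
    const SINGLE_INSTANCE: usize = 1;
    let mut instances: Vec<u64> = Vec::new();

    // EXECUTE: exercise exactly one behavior.
    instances.push(42);

    // ASSERT: state what must be true and why.
    assert_eq!(
        instances.len(),
        SINGLE_INSTANCE,
        "exactly one instance should be tracked after a single push"
    );
}
```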
Add a detailed explanation for the Committee/Aggregator deadline calculation:

- Document the calculation formula: (E+2) * slots_per_epoch - 1
- Explain that this represents the last slot for on-chain inclusion
- Reference the EIP-7045 specification

Enhance the ManagedInstance documentation:

- Convert to a doc comment for better API documentation
- Clarify that it tracks both the channel and the beacon chain deadline
- Explain its role in the cleanup task
Issue Addressed
Fixes instance cleanup issue where QBFT instances were cleaned up too early based on slot-based timeouts (QBFT_RETAIN_SLOTS = 1), preventing instances from reaching later rounds and completing consensus.
Proposed Changes
Core Changes
- `ManagedInstance` struct tracking both channel and deadline

Test Coverage
- `test_cleanup_removes_only_expired_instances`: verifies instances survive past the old 2-slot timeout
- `test_instance_completion_notification`: tests immediate cleanup after successful completion
- `test_committee_can_reach_late_rounds`: verifies instances can reach round 10+ with max_round=12
- `test_cleanup_across_epoch_boundary`: tests deadline calculation across epoch transitions
- `test_multiple_instances_completing_rapidly`: verifies burst completion handling

Code Quality
- Mandatory section comments (`// SETUP`, `// EXECUTE`, `// ASSERT`) in all new tests

Test Results
All 23 tests pass (up from 18 due to test refactoring that split one oversized test into 6 focused tests).
Additional Info
This aligns instance cleanup with actual beacon chain requirements rather than arbitrary slot-based timeouts, allowing instances to complete consensus within their protocol-defined windows.
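To illustrate the 32-63 slot inclusion window mentioned in the commit message, a small sketch of the arithmetic (assuming mainnet's 32 slots per epoch):

```rust
/// Last slot of epoch E+1, per EIP-7045 (sketch of the Committee/Aggregator case).
fn deadline(duty_slot: u64, slots_per_epoch: u64) -> u64 {
    (duty_slot / slots_per_epoch + 2) * slots_per_epoch - 1
}

fn main() {
    let spe = 32;
    // A duty at the first slot of epoch 0 gets the widest window: 63 slots remain.
    assert_eq!(deadline(0, spe), 63);
    // A duty at the last slot of epoch 0 (slot 31) shares that deadline,
    // leaving the narrowest window: 32 slots.
    assert_eq!(deadline(31, spe), 63);
}
```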