Skip to content
49 changes: 49 additions & 0 deletions anchor/qbft_manager/src/tests.rs
Original file line number Diff line number Diff line change
Expand Up @@ -902,6 +902,55 @@ mod manager_tests {

context.verify_consensus().await;
}

#[tokio::test(start_paused = true)]
// Test that Committee instances can reach late rounds (9+) with max_round=12 configuration.
// This verifies that instances survive long enough to progress through many round changes
// as configured. Committee role has max_round=12, so instances should be able to reach
// round 10 before timing out at round 13.
//
// The test simulates network conditions where consensus cannot be reached early by keeping
// all but one operator offline, forcing round changes. We advance the slot to trigger
// cleanup and verify the instance survives to reach round 10.
async fn test_committee_can_reach_late_rounds() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation: Test expectations need clarification

The comment says "Currently fails" but doesn't specify:

  1. What failure mode to expect - Does it panic? Timeout? Return an error?
  2. Whether this is expected behavior - Is this test marked with #[should_panic] or #[ignore]?
  3. When it should pass - What changes need to be made for this test to pass?

Recommendations:

  1. If the test is expected to fail, use #[ignore] or #[should_panic]:
#[tokio::test(start_paused = true)]
#[ignore = "Fails due to premature cleanup - see issue #XXX"]
async fn test_committee_can_reach_late_rounds() {
  1. Or, restructure as a negative test that explicitly verifies the current behavior:
#[tokio::test(start_paused = true)]
async fn test_committee_cleanup_prevents_late_rounds() {
    // Explicitly test that instances ARE cleaned up at slot 2
    // This documents current behavior before fix is implemented
  1. Add a GitHub issue reference so the test can be tracked to a fix

This prevents CI from failing and clearly communicates test intent to future developers.

let setup = setup_test(1);
let clock = setup.clock.clone();
let mut context = TestContext::<BeaconVote>::new(
setup.clock,
setup.executor,
CommitteeSize::Four,
setup.all_data,
)
.await;

// Keep 3 operators offline initially to prevent consensus and force round changes.
// With only 1 operator online out of 4, we cannot reach quorum (need 3).
// This will cause the instance to go through multiple round changes.
context.set_operators_offline(&[2, 3, 4]);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test Design: Consider testing the actual cleanup boundary

The test keeps 3 out of 4 operators offline to force round changes, which is good. However, it doesn't verify the specific failure mode mentioned in the PR description. Consider adding:

  1. Log inspection or metrics to verify the instance actually progresses through rounds 9-10
  2. Intermediate assertions to check the instance is still alive at critical points (e.g., after slot 2 when cleanup occurs)
  3. Test the boundary condition: What happens at exactly slot 2 when cleanup runs?

Example enhancement:

// Keep 3 operators offline to force round changes
context.set_operators_offline(&[2, 3, 4]);

// Advance to slot 2 where cleanup happens (cutoff = slot 2 - 1 = slot 1)
// At this point, the instance starting at slot 0 should be removed
clock.set_slot(1);
tokio::time::sleep(slot_duration).await;

clock.set_slot(2);
tokio::time::sleep(slot_duration).await;

// TODO: Add assertion here to verify instance is still alive
// This is where the bug manifests - instance gets cleaned up too early

This would make the test more explicitly demonstrate the issue described in the PR.


// Advance time and slots to simulate reaching round 10
// Instance starts at slot 0
let slot_duration = Duration::from_secs(12);

// Advance through multiple slots while QBFT progresses
// This triggers cleanup logic which should NOT remove the active instance
for slot in 1..=50 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test Logic Issue: Unrealistic loop bound

The test loops to slot 50, but breaks at slot 22. This creates confusion about the test's intent. Consider these improvements:

  1. Use a more reasonable upper bound (e.g., 1..=25) that better reflects when you expect the test to complete
  2. Add assertion after the loop to verify we actually broke at slot 22 and didn't fall through
  3. Consider making this data-driven: Calculate the expected slot based on round timeout constants rather than hard-coding slot 22
const EXPECTED_ROUND_10_SLOT: u64 = 22; // Document the calculation
for slot in 1..=EXPECTED_ROUND_10_SLOT + 3 {  // Small buffer
    clock.set_slot(slot);
    tokio::time::sleep(slot_duration).await;
    
    if slot == EXPECTED_ROUND_10_SLOT {
        context.set_operators_online(&[2, 3, 4]);
        break;
    }
}
// Verify we didn't fall through
assert!(clock.now().unwrap().as_u64() == EXPECTED_ROUND_10_SLOT);

clock.set_slot(slot);
tokio::time::sleep(slot_duration).await;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mathematical Accuracy: Comment contains calculation error

The comment states "Rounds 1-8: 16s" but this is incorrect:

  • Rounds 1-8 each take 2 seconds (QUICK_TIMEOUT)
  • Total for rounds 1-8: 8 * 2s = 16s
  • But the comment should clarify this is cumulative, not per-round

Correct breakdown to round 10:

  • Rounds 1-8: 8 × 2s = 16s
  • Round 9: 120s
  • Round 10: 120s
  • Total: 256s = 21.33 slots

At 12s per slot:

  • 256s ÷ 12s/slot = 21.33 slots
  • So slot 22 is actually 8 seconds into round 10, not "around round 10"

Suggest updating comment to:

// At slot 22 (264 seconds = 22 * 12s):
// - Rounds 1-8: 8 * 2s = 16s
// - Round 9: 120s  
// - Total to complete round 9: 136s (11.33 slots)
// - Round 10 starts at 136s (slot 11), so at slot 22 (264s) we're 128s into round 10
if slot == 22 {

This makes the test's timing expectations explicit and verifiable.

// At slot 22 (256 seconds = 16s + 240s), we should be around round 10
// Rounds 1-8: 16s, Rounds 9-10: 240s = 256s total

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Math Error in Comment

The comment's calculation is incorrect. Here's the accurate breakdown:

Correct Timing:

  • Rounds 1-8: 8 × 2s = 16s (completes at ~slot 1.33)
  • Round 9: 120s (completes at 136s / ~slot 11.33)
  • Round 10: 120s (starts at 136s)
  • At slot 22 (264s): We're 128 seconds into round 10 (with 8s remaining)

Suggested fix:

// At slot 22 (264 seconds = 22 * 12s):
// - Rounds 1-8: 8 * 2s = 16s (completes at slot 1.33)
// - Round 9: 120s (completes at 136s / slot 11.33)
// - Round 10: started at 136s, now 128s into it
if slot == 22 {

This makes the timing expectations explicit and verifiable.

if slot == 22 {
// Bring operators back online during round 10 to allow consensus
context.set_operators_online(&[2, 3, 4]);
break;
}
}

// Verify that consensus is reached successfully, proving the instance
// survived cleanup and was able to reach round 10
context.verify_consensus().await;
}
}

// very important: set paused to true for deterministic timer
Expand Down
Loading