Add test for null snapshot recovery in create_sync_snapshot_req#97
Merged
antonio2368 merged 2 commits intomasterfrom Mar 19, 2026
Merged
Add test for null snapshot recovery in create_sync_snapshot_req#97antonio2368 merged 2 commits intomasterfrom
antonio2368 merged 2 commits intomasterfrom
Conversation
The previous commit replaced the fatal system_exit/abort in
create_sync_snapshot_req with graceful recovery (clear sync state,
retry on next heartbeat). That fix has two branches:
1. snp == nullptr: snapshot is unavailable, log warning and retry
2. last_log_idx > snp->get_last_log_idx(): peer advanced past
snapshot, clear sync flag and retry
The existing snapshot_stale_sync_flag_test only covers the caller-
side guard in create_append_entries_req, which prevents entering
create_sync_snapshot_req when the peer has caught up. It does not
exercise the recovery paths inside create_sync_snapshot_req itself
(the old code had LCOV_EXCL markers confirming zero coverage).
This commit adds snapshot_null_snapshot_join_test, which exercises
branch (1) via the handle_join_leave -> sync_log_to_new_srv path:
- Set up a 2-node cluster with compacted logs
- Null out the leader's last_snapshot_ via a new test helper
- Add a new server whose start_idx < log_store_->start_index(),
forcing create_sync_snapshot_req with a null snapshot
- Verify the leader does NOT abort and recovers after the
snapshot is restored by a subsequent snapshot creation cycle
To support this, a clear_last_snapshot helper is added to
raft_server_handler and exposed through FakeNetwork.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds a unit test to cover the “null snapshot” recovery branch inside create_sync_snapshot_req, ensuring the leader does not abort when get_last_snapshot() returns nullptr during a join-driven snapshot sync.
Changes:
- Add
snapshot_null_snapshot_join_testto exercise join →sync_log_to_new_srv→create_sync_snapshot_reqwhen the leader snapshot pointer is null. - Expose a test-only helper to clear
raft_server::last_snapshot_viaraft_server_handlerandFakeNetwork. - Register the new test in
snapshot_test’smain.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/unit/snapshot_test.cxx | Adds a new join-based unit test that simulates last_snapshot_ == nullptr and verifies the leader recovers and the new node catches up. |
| tests/unit/fake_network.hxx | Exposes clearLastSnapshot helper to tests via FakeNetwork. |
| include/libnuraft/raft_server_handler.hxx | Adds a protected test helper to clear the server’s last snapshot pointer. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Comment on lines
+1061
to
+1070
| // Retry the join: trigger heartbeats and drive the protocol | ||
| // until S3 finishes receiving the snapshot and catches up. | ||
| for (int ii = 0; ii < 10; ++ii) { | ||
| s1.fTimer->invoke( timer_task_type::heartbeat_timer ); | ||
| s1.fNet->execReqResp(); | ||
| s1.fNet->execReqResp(); | ||
| CHK_Z( wait_for_sm_exec(pkgs, COMMIT_TIMEOUT_SEC) ); | ||
| if (!s3.raftServer->is_receiving_snapshot()) break; | ||
| } | ||
|
|
|
|
||
| // Verify snapshot was created and logs compacted. | ||
| CHK_OK( s2.getTestSm()->isSame( *s1.getTestSm() ) ); | ||
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The previous commit replaced the fatal system_exit/abort in create_sync_snapshot_req with graceful recovery (clear sync state, retry on next heartbeat). That fix has two branches:
The existing snapshot_stale_sync_flag_test only covers the caller- side guard in create_append_entries_req, which prevents entering create_sync_snapshot_req when the peer has caught up. It does not exercise the recovery paths inside create_sync_snapshot_req itself (the old code had LCOV_EXCL markers confirming zero coverage).
This commit adds snapshot_null_snapshot_join_test, which exercises branch (1) via the handle_join_leave -> sync_log_to_new_srv path:
To support this, a clear_last_snapshot helper is added to raft_server_handler and exposed through FakeNetwork.