IGNITE-27678 Same partitions on different nodes can hold different updates if writeThrough is enabled#12925
IGNITE-27678 Same partitions on different nodes can hold different updates if writeThrough is enabled#12925zstan wants to merge 26 commits into
Conversation
c623095 to
2029771
Compare
0bfd72d to
ff5d209
Compare
…dates if writeThrough is enabled
ff5d209 to
d2bb726
Compare
|
There was a problem hiding this comment.
Pull request overview
This PR addresses transaction recovery for write-through caches so surviving nodes do not retain divergent partition updates after a node leaves during transaction completion.
Changes:
- Adds a DHT transaction salvage message and handling paths for write-through recovery.
- Adjusts transaction manager/future recovery behavior and store-session exception propagation.
- Adds and updates tests covering write-through recovery, idle verify consistency, and leftover salvage transaction cleanup.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
modules/core/src/test/java/org/apache/ignite/internal/processors/cache/query/IndexingSpiQueryTxSelfTest.java |
Updates expected transaction exception behavior and waits for futures. |
modules/core/src/test/java/org/apache/ignite/internal/processors/cache/distributed/IgniteTxCacheWriteSynchronizationModesMultithreadedTest.java |
Adds post-test assertion for empty salvage transaction tracking. |
modules/core/src/test/java/org/apache/ignite/internal/processors/cache/distributed/dht/IgniteCacheTxRecoveryRollbackTest.java |
Extends cleanup checks to include salvage transaction state. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/transactions/IgniteTxRemoteStateImpl.java |
Makes remote tx state maps final. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/transactions/IgniteTxRemoteStateAdapter.java |
Makes active cache ID list final. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/transactions/IgniteTxManager.java |
Adds write-through-specific salvage/recovery coordination. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/transactions/IgniteTxHandler.java |
Registers handling for DHT salvage requests. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/transactions/IgniteTxAdapter.java |
Adjusts nullable annotation on finalization status implementation. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/store/GridCacheStoreManagerAdapter.java |
Logs and propagates runtime store-session listener failures. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/near/GridNearTxFinishFuture.java |
Sends salvage requests to backups during committed checks for write-through txs. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/topology/GridDhtLocalPartition.java |
Improves partition-state assertion diagnostics. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java |
Suppresses unused warning for retained field. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/GridDhtTxSalvageMessage.java |
Adds a new cache message for transaction salvage. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/GridDhtTxRemote.java |
Simplifies master node ID collection creation. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/GridDhtTxLocal.java |
Salvages local write-through txs on heuristic failure. |
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/GridDhtTxFinishFuture.java |
Adds node-id-aware salvage handling when a finish target has left. |
modules/control-utility/src/test/java/org/apache/ignite/util/IdleVerifyCheckWithWriteThroughTest.java |
Adds idle-verify regression coverage for write-through tx recovery. |
modules/control-utility/src/test/java/org/apache/ignite/testsuites/IgniteControlUtilityTestSuite2.java |
Includes the new write-through idle-verify test in the suite. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| // Treat heuristic exception as critical. | ||
| if (X.hasCause(e, IgniteTxHeuristicCheckedException.class)) | ||
| if (X.hasCause(e, IgniteTxHeuristicCheckedException.class)) { |
There was a problem hiding this comment.
CacheStoreWithIgniteTxFailureTest#testSystemExceptionAfterCacheStoreCommit fail without this fix
IndexingSpiQueryTxSelfTest - waitForCondition appended
TCBot Test Analysis
Possible Blockers (0)No blockers found. New Tests (5)
|
| if ((tx.near() && !tx.local() && tx.originatingNodeId().equals(evtNodeId)) | ||
| Map<UUID, Collection<UUID>> txNodes = tx.transactionNodes(); | ||
|
|
||
| if (tx.storeWriteThrough() && txNodes != null |
There was a problem hiding this comment.
for reviewer: class that can catch\handle more questions about flags\implementation approaches is : IgniteTxCacheWriteSynchronizationModesMultithreadedTest
it can be speed up by testing only writeThrough scenarios, check:
IgniteTxCacheWriteSynchronizationModesMultithreadedTest#multithreaded - store If {@code true} sets store in cache configuration
| sendTxSalvage(tx, evtNodeId); | ||
| } | ||
|
|
||
| Supplier<Boolean> fullSyncedOp = () -> tx.writeEntries().stream().map(e -> |
There was a problem hiding this comment.
Possible we can change it in future but for now only full synced is covered, othervize i expect some failures with IgniteTxCacheWriteSynchronizationModesMultithreadedTest
|
| if (tx.masterNodeIds().contains(nodeId)) | ||
| continue; | ||
|
|
||
| ClusterNode involvedNode = cctx.discovery().node(nodeId); |
There was a problem hiding this comment.
Let's rename involvedNode to backupNode for clarity and similarity with other GridDhtTxSalvageMessage sending code.
|
|
||
| U.awaitQuiet(nodeLeftRegisteredOnBackup); | ||
|
|
||
| // let`s wait until all discovery events have been processed on backup node. |
| assertEquals(secondVal, nodeCoord.cache(DEFAULT_CACHE_NAME).get(primaryKey)); | ||
| assertEquals(secondVal, nodeBackup.cache(DEFAULT_CACHE_NAME).get(primaryKey)); | ||
|
|
||
| if (withPersistence) { |
There was a problem hiding this comment.
Let's remove withPersistance condition and check for in-memory mode too. For in-memory mode it cache-store value will be checked.
| } | ||
|
|
||
| Supplier<Boolean> fullSyncedOp = () -> tx.writeEntries().stream().map(e -> | ||
| cctx.cacheContext(e.cacheId())).allMatch(GridCacheContext::syncCommit); |
There was a problem hiding this comment.
According to IgniteTxStateImpl#syncMode mode is FULL_SYNC if any of caches have FULL_SYNC mode. Instead of entries iterator we can iterate over active cache ids (txState().cacheIds()). Maybe it's better ti reuse IgniteTxStateImpl#syncMode somehow, but there is assert false in IgniteTxRemoteStateAdapter
There was a problem hiding this comment.
unexpected behavior in this method )
for (int i = 0; i < activeCacheIds.size(); i++) {
int cacheId = activeCacheIds.get(i);
CacheWriteSynchronizationMode cacheSyncMode =
cctx.cacheContext(cacheId).config().getWriteSynchronizationMode();
switch (cacheSyncMode) {
case FULL_SYNC:
return FULL_SYNC;
There was a problem hiding this comment.
What is unexpected? If any of caches have FULL_SYNC - return FULL_SYNC, else if any of caches have PRIMARY_SYNC return PRIMARY_SYNC, else (all caches have FULL_ASYNC) - return FULL_ASYNC
| */ | ||
| private void multithreadedTests(CacheWriteSynchronizationMode syncMode, boolean restart) throws Exception { | ||
| multithreaded(syncMode, 0, false, false, restart); | ||
| //multithreaded(syncMode, 0, false, false, restart); |
There was a problem hiding this comment.
Please revert these changes
Possible compatibility issues. Please, check rolling upgrade casesThis PR modifies protected classes (with Order annotation). Affected files:
|




No description provided.