IGNITE-27678 Same partitions on different nodes can hold different updates if writeThrough is enabled #12925

Open
zstan wants to merge 26 commits into apache:master from zstan:ignite-27678-nostruct

@zstan (Contributor) commented Mar 23, 2026

No description provided.

@zstan force-pushed the ignite-27678-nostruct branch 2 times, most recently from c623095 to 2029771 (March 23, 2026 11:40)
@zstan force-pushed the ignite-27678-nostruct branch 3 times, most recently from 0bfd72d to ff5d209 (April 6, 2026 13:10)
@zstan force-pushed the ignite-27678-nostruct branch from ff5d209 to d2bb726 (April 9, 2026 12:18)
@sonarqubecloud

Quality Gate failed

Failed conditions
6 New Code Smells (required ≤ 1)


Copilot AI left a comment

Pull request overview

This PR addresses transaction recovery for write-through caches so surviving nodes do not retain divergent partition updates after a node leaves during transaction completion.

Changes:

  • Adds a DHT transaction salvage message and handling paths for write-through recovery.
  • Adjusts transaction manager/future recovery behavior and store-session exception propagation.
  • Adds and updates tests covering write-through recovery, idle verify consistency, and leftover salvage transaction cleanup.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| modules/core/src/test/java/org/apache/ignite/internal/processors/cache/query/IndexingSpiQueryTxSelfTest.java | Updates expected transaction exception behavior and waits for futures. |
| modules/core/src/test/java/org/apache/ignite/internal/processors/cache/distributed/IgniteTxCacheWriteSynchronizationModesMultithreadedTest.java | Adds post-test assertion for empty salvage transaction tracking. |
| modules/core/src/test/java/org/apache/ignite/internal/processors/cache/distributed/dht/IgniteCacheTxRecoveryRollbackTest.java | Extends cleanup checks to include salvage transaction state. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/transactions/IgniteTxRemoteStateImpl.java | Makes remote tx state maps final. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/transactions/IgniteTxRemoteStateAdapter.java | Makes active cache ID list final. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/transactions/IgniteTxManager.java | Adds write-through-specific salvage/recovery coordination. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/transactions/IgniteTxHandler.java | Registers handling for DHT salvage requests. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/transactions/IgniteTxAdapter.java | Adjusts nullable annotation on finalization status implementation. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/store/GridCacheStoreManagerAdapter.java | Logs and propagates runtime store-session listener failures. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/near/GridNearTxFinishFuture.java | Sends salvage requests to backups during committed checks for write-through txs. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/topology/GridDhtLocalPartition.java | Improves partition-state assertion diagnostics. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java | Suppresses unused warning for retained field. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/GridDhtTxSalvageMessage.java | Adds a new cache message for transaction salvage. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/GridDhtTxRemote.java | Simplifies master node ID collection creation. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/GridDhtTxLocal.java | Salvages local write-through txs on heuristic failure. |
| modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/GridDhtTxFinishFuture.java | Adds node-id-aware salvage handling when a finish target has left. |
| modules/control-utility/src/test/java/org/apache/ignite/util/IdleVerifyCheckWithWriteThroughTest.java | Adds idle-verify regression coverage for write-through tx recovery. |
| modules/control-utility/src/test/java/org/apache/ignite/testsuites/IgniteControlUtilityTestSuite2.java | Includes the new write-through idle-verify test in the suite. |


  // Treat heuristic exception as critical.
- if (X.hasCause(e, IgniteTxHeuristicCheckedException.class))
+ if (X.hasCause(e, IgniteTxHeuristicCheckedException.class)) {

@zstan (Contributor Author), May 18, 2026

CacheStoreWithIgniteTxFailureTest#testSystemExceptionAfterCacheStoreCommit fails without this fix.
IndexingSpiQueryTxSelfTest: waitForCondition added.

@ignitetcbot

Contributor

TCBot Test Analysis

Possible Blockers (0)

No blockers found.

New Tests (5)

  • Cache 13: 1 tests
    • IgniteCacheTestSuite13: SystemViewCacheExpiryPolicyTest.testCacheViewExpiryPolicy[factory=javax.cache.configuration.FactoryBuilder$SingletonFactory@7e7740a5, actual=SingletonFactory [expiryPlc=EternalExpiryPolicy [create=ETERNAL]]] - PASSED
  • Control Utility 2: 4 tests
    • IgniteControlUtilityTestSuite2: IdleVerifyCheckWithWriteThroughTest.testTxCoordinatorLeftClusterWithEnabledReadWriteThrough[cmdHnd=cli, withPersistence=false, concMode=PESSIMISTIC] - PASSED
    • IgniteControlUtilityTestSuite2: IdleVerifyCheckWithWriteThroughTest.testTxCoordinatorLeftClusterWithEnabledReadWriteThrough[cmdHnd=cli, withPersistence=true, concMode=PESSIMISTIC] - PASSED
    • IgniteControlUtilityTestSuite2: IdleVerifyCheckWithWriteThroughTest.testTxCoordinatorLeftClusterWithEnabledReadWriteThrough[cmdHnd=cli, withPersistence=false, concMode=OPTIMISTIC] - PASSED
    • IgniteControlUtilityTestSuite2: IdleVerifyCheckWithWriteThroughTest.testTxCoordinatorLeftClusterWithEnabledReadWriteThrough[cmdHnd=cli, withPersistence=true, concMode=OPTIMISTIC] - PASSED

if ((tx.near() && !tx.local() && tx.originatingNodeId().equals(evtNodeId))
Map<UUID, Collection<UUID>> txNodes = tx.transactionNodes();

if (tx.storeWriteThrough() && txNodes != null

Contributor Author

For the reviewer: the test class most likely to raise questions about the flags and implementation approaches is IgniteTxCacheWriteSynchronizationModesMultithreadedTest.
It can be sped up by testing only the writeThrough scenarios; see the `store` parameter of
IgniteTxCacheWriteSynchronizationModesMultithreadedTest#multithreaded ("If {@code true} sets store in cache configuration").

sendTxSalvage(tx, evtNodeId);
}

Supplier<Boolean> fullSyncedOp = () -> tx.writeEntries().stream().map(e ->

Contributor Author

We may change this in the future, but for now only the fully synced case is covered; otherwise I expect some failures in IgniteTxCacheWriteSynchronizationModesMultithreadedTest.

@sonarqubecloud

Quality Gate failed

Failed conditions
5 New Code Smells (required ≤ 1)


if (tx.masterNodeIds().contains(nodeId))
continue;

ClusterNode involvedNode = cctx.discovery().node(nodeId);

Contributor

Let's rename involvedNode to backupNode for clarity and similarity with other GridDhtTxSalvageMessage sending code.


U.awaitQuiet(nodeLeftRegisteredOnBackup);

// let`s wait until all discovery events have been processed on backup node.

Contributor

Upper case

Contributor Author

done

assertEquals(secondVal, nodeCoord.cache(DEFAULT_CACHE_NAME).get(primaryKey));
assertEquals(secondVal, nodeBackup.cache(DEFAULT_CACHE_NAME).get(primaryKey));

if (withPersistence) {

Contributor

Let's remove the withPersistence condition and check in-memory mode too. In in-memory mode, the cache-store value will be checked.

Contributor Author

got it, done

}

Supplier<Boolean> fullSyncedOp = () -> tx.writeEntries().stream().map(e ->
cctx.cacheContext(e.cacheId())).allMatch(GridCacheContext::syncCommit);

Contributor

According to IgniteTxStateImpl#syncMode, the mode is FULL_SYNC if any of the caches has FULL_SYNC mode. Instead of iterating over entries, we can iterate over the active cache IDs (txState().cacheIds()). Maybe it's better to reuse IgniteTxStateImpl#syncMode somehow, but there is an assert false in IgniteTxRemoteStateAdapter.

Contributor Author

unexpected behavior in this method )

        for (int i = 0; i < activeCacheIds.size(); i++) {
            int cacheId = activeCacheIds.get(i);

            CacheWriteSynchronizationMode cacheSyncMode =
                cctx.cacheContext(cacheId).config().getWriteSynchronizationMode();

            switch (cacheSyncMode) {
                case FULL_SYNC:
                    return FULL_SYNC;

Contributor

What is unexpected? If any of the caches has FULL_SYNC, return FULL_SYNC; else if any of the caches has PRIMARY_SYNC, return PRIMARY_SYNC; else (all caches have FULL_ASYNC) return FULL_ASYNC.
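
The precedence described in this comment can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the Ignite implementation: `SyncMode` is a hypothetical stand-in for `CacheWriteSynchronizationMode`, and `SyncModeSketch.resolve` is an invented helper that receives the per-cache modes directly instead of looking them up through `cctx.cacheContext(cacheId)`.

```java
import java.util.List;

/** Sketch of "strongest mode wins" resolution across the caches of one transaction. */
public class SyncModeSketch {
    /** Hypothetical stand-in for Ignite's CacheWriteSynchronizationMode. */
    public enum SyncMode { FULL_SYNC, PRIMARY_SYNC, FULL_ASYNC }

    /**
     * Returns FULL_SYNC if any cache uses it, otherwise PRIMARY_SYNC if any cache uses it,
     * otherwise FULL_ASYNC. An empty list also yields FULL_ASYNC (a choice made for this sketch).
     */
    public static SyncMode resolve(List<SyncMode> cacheModes) {
        SyncMode res = SyncMode.FULL_ASYNC;

        for (SyncMode mode : cacheModes) {
            if (mode == SyncMode.FULL_SYNC)
                return SyncMode.FULL_SYNC; // Strongest mode found, no need to look further.

            if (mode == SyncMode.PRIMARY_SYNC)
                res = SyncMode.PRIMARY_SYNC; // Remember it, but keep looking for FULL_SYNC.
        }

        return res;
    }

    public static void main(String[] args) {
        System.out.println(resolve(List.of(SyncMode.FULL_ASYNC, SyncMode.PRIMARY_SYNC)));
        System.out.println(resolve(List.of(SyncMode.PRIMARY_SYNC, SyncMode.FULL_SYNC)));
        System.out.println(resolve(List.of(SyncMode.FULL_ASYNC)));
    }
}
```

Note the contrast with an `allMatch`-style check such as the `fullSyncedOp` supplier quoted earlier: assuming `syncCommit` is true only for FULL_SYNC caches, `allMatch` requires every cache to be fully synced, whereas the rule above yields FULL_SYNC as soon as one cache is.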

Contributor Author

got it, fixed

*/
private void multithreadedTests(CacheWriteSynchronizationMode syncMode, boolean restart) throws Exception {
- multithreaded(syncMode, 0, false, false, restart);
+ //multithreaded(syncMode, 0, false, false, restart);

Contributor

Please revert these changes

Contributor Author

ups .. done

@github-actions

Possible compatibility issues. Please, check rolling upgrade cases

This PR modifies protected classes (with Order annotation).
Changes to these classes can break rolling upgrade compatibility.

Affected files:

  • modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/GridDhtTxSalvageMessage.java

4 participants