Skip to content

Conversation

@kabo87777
Copy link

Fix Non-Deterministic Behavior in AlignedByDeviceTest

Problem

Nine tests in AlignedByDeviceTest were failing non-deterministically under NonDex (50-100% failure rates) due to order-dependent fragment assertions:

  • testAggregation2Device2Region
  • testAggregation2Device2RegionOrderByTime
  • testAggregation2Device2RegionWithValueFilter
  • testAggregation2Device2RegionWithValueFilterOrderByTime
  • testAggregation2Device3RegionWithValueFilter
  • testAggregation2Device3RegionWithValueFilterOrderByTime
  • testDiffFunction2Device2Region
  • testDiffFunction2Device3Region
  • testDiffFunctionWithOrderByTime2Device3Region

Way to Reproduce

cd iotdb-core/datanode
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex \
  -Dtest=AlignedByDeviceTest#testAggregation2Device2Region \
  -DnondexRuns=10 -Drat.skip=true
# Result: 10/10 failures (100% failure rate before fix)

Root Cause

Tests used fixed-index access to fragments (plan.getInstances().get(0)) assuming deterministic order:

PlanNode f1Root = plan.getInstances().get(0).getFragment().getPlanNodeTree();
PlanNode f2Root = plan.getInstances().get(1).getFragment().getPlanNodeTree();
assertTrue(f1Root.getChildren().get(0) instanceof AggregationMergeSortNode); // Fails when order changes

Fragment order is non-deterministic due to HashMap iteration during distributed query planning. When NonDex shuffled collection order, fragments appeared at different positions, causing assertions to fail even though the query plan was semantically correct.

Solution

Made assertions order-independent by counting node types across all fragments instead of assuming fixed positions.

Added Helper Methods

private int countNodesOfType(PlanNode root, Class<?> nodeType) {
  // Recursively counts nodes of a specific type in plan tree
}

private <T extends PlanNode> T findFirstNodeOfType(PlanNode root, Class<T> nodeType) {
  // Recursively finds first node of a specific type in plan tree
}

Transformed Assertions

Before (Order-Dependent):

PlanNode f1Root = plan.getInstances().get(0).getFragment().getPlanNodeTree();
assertTrue(f1Root.getChildren().get(0) instanceof AggregationMergeSortNode);
assertTrue(f1Root.getChildren().get(0).getChildren().get(0) instanceof DeviceViewNode);

After (Order-Independent):

int aggMergeSortCount = 0;
int deviceViewCount = 0;
for (FragmentInstance instance : plan.getInstances()) {
  PlanNode root = instance.getFragment().getPlanNodeTree();
  aggMergeSortCount += countNodesOfType(root, AggregationMergeSortNode.class);
  deviceViewCount += countNodesOfType(root, DeviceViewNode.class);
}
assertTrue("Expected at least one AggregationMergeSortNode", aggMergeSortCount >= 1);
assertTrue("Expected at least two DeviceViewNodes", deviceViewCount >= 2);

Key Improvements

  • Replaced fixed-index access with iteration through all fragments
  • Changed strict equality to flexible assertions (>= N instead of exact position checks)
  • Added descriptive assertion messages for debugging
  • Recursive tree traversal searches entire plan trees regardless of node depth or position

Key Changed Classes

  • AlignedByDeviceTest: Added 2 helper methods, modified 9 test methods (~400 lines), test-only changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant