Fix the unstable unit test case: ResourceBoundedExecutorSuite by sperlingxx · Pull Request #13910 · NVIDIA/spark-rapids

sperlingxx · 2025-12-01T01:40:19Z

Mirror PR of #13906, which enforces the fix on branch-25.12 as well.

This PR stabilizes a flaky unit test whose output was nondeterministic due to runtime fluctuations by introducing a startup lock to enforce a deterministic execution order.

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

greptile-apps · 2025-12-01T01:42:25Z

Greptile Overview

Greptile Summary

This PR fixes a flaky unit test (ResourceBoundedExecutorSuite) by introducing a ReentrantLock to control task execution timing and ensure deterministic ordering.

Key Changes:

Modified buildDummyFn to accept a Lock parameter that blocks task execution until released
Added lock acquisition before task submission and release after all tasks are queued
Removed reliance on Thread.sleep() timing assumptions for execution order
Updated test structure to submit a dummy unbounded task first, then lock-protected tasks
Adjusted comprehensive test from 6 tasks to 5 tasks (removed duplicate unbounded task)
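
The lock-based changes listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the suite's actual code: it uses a plain single-threaded `java.util.concurrent` executor as a stand-in for the executor under test, and the `buildDummyFn` signature, the `run` helper, and the 5 ms sleep are hypothetical.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}
import java.util.concurrent.locks.{Lock, ReentrantLock}

object StartupLockSketch {
  // Hypothetical stand-in for the suite's buildDummyFn: the task blocks on the
  // startup lock, simulates a little work, and returns its completion timestamp.
  def buildDummyFn(startupLock: Lock, sleepMs: Long = 5L): Callable[Long] =
    new Callable[Long] {
      override def call(): Long = {
        startupLock.lock() // parks here until the test thread releases the lock
        try {
          Thread.sleep(sleepMs)
          System.nanoTime()
        } finally {
          startupLock.unlock()
        }
      }
    }

  // Submits `taskCount` tasks while holding the lock, then releases it and
  // collects the timestamps in submission order.
  def run(taskCount: Int): Seq[Long] = {
    val pool = Executors.newSingleThreadExecutor()
    val lock = new ReentrantLock()
    try {
      lock.lock() // hold every task back while submission is in progress
      val futures =
        try (1 to taskCount).map(_ => pool.submit(buildDummyFn(lock)))
        finally lock.unlock() // all tasks are queued; allow them to start
      futures.map(_.get(10, TimeUnit.SECONDS))
    } finally {
      pool.shutdownNow()
    }
  }
}
```

Because no task can pass `startupLock.lock()` until the submitting thread calls `unlock()`, the order in which tasks complete depends on how the executor orders its queue rather than on how quickly the test thread submits them.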

Issues Found:

Minor: Comment on line 110 describing execution order appears incorrect and doesn't match the actual expected sequence

Confidence Score: 4/5

This PR is safe to merge with minimal risk - it only modifies test code with a solid synchronization approach
The fix properly addresses the flaky test by replacing timing-based assumptions with explicit synchronization using ReentrantLock. The approach is sound and only affects test code, not production logic. One minor comment error was found but doesn't impact functionality.
No files require special attention - the single test file change is straightforward with proper lock usage

Important Files Changed

File Analysis

Filename	Score	Overview
tests/src/test/scala/com/nvidia/spark/rapids/ResourceBoundedExecutorSuite.scala	4/5	Introduces startup lock mechanism to stabilize flaky test; minor comment error found

Sequence Diagram

sequenceDiagram
    participant Test as Test Thread
    participant Lock as ReentrantLock
    participant Executor as ResourceBoundedExecutor
    participant Queue as Task Queue
    participant Worker as Worker Thread

    Note over Test,Worker: Test Setup Phase
    Test->>Lock: lock() - acquire startup lock
    Test->>Executor: submit unbounded dummy task
    Executor->>Queue: enqueue dummy task
    Test->>Test: sleep(1ms) - ensure dummy task queued first
    
    Note over Test,Worker: Task Submission Phase
    loop For each priority task
        Test->>Executor: submit task with priority
        Executor->>Queue: enqueue task (sorted by priority)
    end
    
    Note over Queue: Tasks queued by priority<br/>but blocked by lock
    
    Note over Test,Worker: Execution Phase
    Test->>Lock: unlock() - release startup lock
    
    Note over Worker: Worker can now start tasks
    loop For each task in priority order
        Worker->>Queue: dequeue highest priority task
        Worker->>Lock: task attempts lock()
        Lock->>Worker: grants lock
        Worker->>Worker: sleep(5ms)
        Worker->>Worker: capture System.nanoTime()
        Worker->>Lock: unlock()
        Worker->>Test: return timestamp via future
    end
    
    Note over Test: Verify timestamps are<br/>monotonically ordered<br/>per priority

greptile-apps

1 file reviewed, 1 comment


greptile-apps · 2025-12-01T01:42:24Z

    }

    // Comprehensive test for task priority and memory usage (including unbounded tasks):
    // Execution order: 1, 4, 6, 2, 5, 3


syntax: Comment mentions task 6, but only 5 tasks are submitted (tasks 1-5). Should be "Execution order: 4, 1, 2, 5, 3"

Suggested change

-    // Execution order: 1, 4, 6, 2, 5, 3
+    // Execution order: 4, 1, 2, 5, 3

firestarman · 2025-12-01T01:44:37Z

-    futures += executor.submit(AsyncRunner.newUnboundedTask(buildDummyFn()))
-    results = Array(1, 4, 6, 2, 5, 3).zip(futures).sortBy(_._1).map {
+    lck.unlock() // Allow tasks to start executing.
+    results = Array(2, 3, 5, 1, 4).zip(futures).sortBy(_._1).map {


NIT: missing 6 here compared to the original array.

@firestarman, yes, since I changed the test samples as well. Only 5 samples are included now, and the description was updated accordingly.
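
For readers unfamiliar with the `zip(...).sortBy(_._1)` idiom in the diff above: the array appears to hold, for each task in submission order, its expected position in the execution order, so sorting by that rank lines the results up in the order they should have run. A minimal sketch, where the helper names `inExecutionOrder` and `isStrictlyIncreasing` are hypothetical and not taken from the suite:

```scala
object ExpectedOrderSketch {
  // ranks(i) is the expected 1-based execution position of the i-th submitted task.
  // The result lists the submitted items in their expected execution order.
  def inExecutionOrder[A](ranks: Seq[Int], bySubmission: Seq[A]): Seq[A] =
    ranks.zip(bySubmission).sortBy(_._1).map(_._2)

  // True when every element is strictly greater than the one before it.
  def isStrictlyIncreasing(xs: Seq[Long]): Boolean =
    xs.zip(xs.drop(1)).forall { case (a, b) => a < b }
}
```

Applied to task numbers, `inExecutionOrder(Seq(2, 3, 5, 1, 4), Seq(1, 2, 3, 4, 5))` yields `Seq(4, 1, 2, 5, 3)`, which matches the corrected comment "Execution order: 4, 1, 2, 5, 3". Applied to timestamps instead, the reordered sequence should be strictly increasing if the tasks ran in the expected order.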

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

sperlingxx · 2025-12-01T01:51:19Z

build

res-life

LGTM

res-life · 2025-12-01T03:04:53Z

LGTM, but we may improve this PR in a follow-up PR.

I have the following concerns:

Tasks with higher priority should be executed first even if they are submitted later.

But buildDummyFn returns the completion time, not the start time.

Maybe we can:
Create the single-threaded priority thread pool, then submit a long-running task (maybe 1s).
The long-running task ensures that the subsequently submitted tasks wait in the thread pool queue.
Then submit some tasks; the priority thread pool will reorder them in the task queue.
We only need to record each task's start time to verify that the priority thread pool works correctly.
This way, we do not need to introduce a lock.
In short: do not verify the first task, only verify the following tasks.
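
A rough sketch of this proposal, under several assumptions: it builds a generic single-threaded `ThreadPoolExecutor` over a `PriorityBlockingQueue` rather than using the executor under test, the `PrioritizedTask` type and `startOrder` helper are hypothetical, and a `CountDownLatch` is swapped in for the suggested ~1s long-running task so that the example does not depend on timing.

```scala
import java.util.Comparator
import java.util.concurrent.{CountDownLatch, PriorityBlockingQueue, ThreadPoolExecutor, TimeUnit}
import scala.collection.mutable.ArrayBuffer

object PriorityStartOrderSketch {
  // Hypothetical task type: a larger `priority` should start earlier.
  final class PrioritizedTask(val id: Int, val priority: Int, body: () => Unit)
      extends Runnable {
    override def run(): Unit = body()
  }

  // Orders queued tasks by descending priority.
  private val byPriorityDesc: Comparator[Runnable] = new Comparator[Runnable] {
    override def compare(a: Runnable, b: Runnable): Int =
      Integer.compare(
        b.asInstanceOf[PrioritizedTask].priority,
        a.asInstanceOf[PrioritizedTask].priority)
  }

  // Submits (id, priority) pairs behind a blocker; returns the ids in start order.
  def startOrder(tasks: Seq[(Int, Int)]): List[Int] = {
    val queue = new PriorityBlockingQueue[Runnable](11, byPriorityDesc)
    val pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, queue)
    val started = ArrayBuffer.empty[Int]
    val release = new CountDownLatch(1)
    try {
      // The blocker occupies the only worker, so later tasks pile up in the queue.
      pool.execute(new PrioritizedTask(0, Int.MaxValue, () => release.await()))
      tasks.foreach { case (id, priority) =>
        // execute(), not submit(): submit() would wrap the task in a FutureTask,
        // which this comparator cannot cast back to PrioritizedTask.
        pool.execute(new PrioritizedTask(id, priority, () =>
          started.synchronized { started += id; () }))
      }
      release.countDown() // every task is queued; let the worker drain the queue
      pool.shutdown()
      pool.awaitTermination(10, TimeUnit.SECONDS)
      started.synchronized { started.toList }
    } finally {
      pool.shutdownNow()
    }
  }
}
```

Only the tasks queued behind the blocker are checked, which mirrors the "do not verify the first task" point: `startOrder(Seq((1, 1), (2, 3), (3, 2)))` should report the ids in descending priority order regardless of submission order.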

sperlingxx · 2025-12-01T04:31:35Z


Good idea! Thanks!

sperlingxx added 2 commits December 1, 2025 10:30

fix the unstable case: ResourceBoundedExecutorSuite

2d416a0

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

nit fix

7b566f9

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

sperlingxx requested review from firestarman, pxLi, res-life and thirtiseven December 1, 2025 01:40

pxLi mentioned this pull request Dec 1, 2025

Fix the unstable unit test case: ResourceBoundedExecutorSuite #13906

Closed

greptile-apps Bot reviewed Dec 1, 2025


firestarman previously approved these changes Dec 1, 2025


pxLi previously approved these changes Dec 1, 2025


fix comments

8ba09df

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

sperlingxx dismissed stale reviews from pxLi and firestarman via 8ba09df December 1, 2025 01:49

pxLi approved these changes Dec 1, 2025


res-life approved these changes Dec 1, 2025


sameerz added the bug (Something isn't working) and test (Only impacts tests) labels Dec 1, 2025

sperlingxx merged commit 3944c6a into NVIDIA:release/25.12 Dec 1, 2025
62 checks passed

sperlingxx deleted the quick_fix_unstable_case_2512 branch December 1, 2025 04:32

sperlingxx merged 3 commits into NVIDIA:release/25.12 from sperlingxx:quick_fix_unstable_case_2512
