Skip to content

[SPARK-56330][CORE][FOLLOWUP] Use dedicated loop for TaskInterruptListeners to avoid blocking completion listeners#55292

Closed
cloud-fan wants to merge 1 commit intoapache:masterfrom
cloud-fan:SPARK-56330-followup
Closed

[SPARK-56330][CORE][FOLLOWUP] Use dedicated loop for TaskInterruptListeners to avoid blocking completion listeners#55292
cloud-fan wants to merge 1 commit intoapache:masterfrom
cloud-fan:SPARK-56330-followup

Conversation

@cloud-fan
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Followup to #55151.

Changed invokeTaskInterruptListeners in TaskContextImpl to use a dedicated loop instead of routing through the shared invokeListeners() method. Also removed the now-unnecessary markTaskFailedOnListenerError parameter from invokeListeners since it is always true (only completion and failure listeners use this method now).

Why are the changes needed?

The current implementation routes TaskInterruptListeners through invokeListeners(), which uses listenerInvocationThread to serialize all listener invocations (completion, failure, and interrupt). This creates a problem when markInterrupted() is called on the kill thread while completion listeners need to run on the task thread:

  1. The kill thread enters invokeListeners() for interrupt callbacks and acquires listenerInvocationThread.
  2. While the interrupt listener runs, the task thread completes and calls markTaskCompleted()invokeTaskCompletionListeners()invokeListeners() for completion callbacks.
  3. invokeListeners() sees listenerInvocationThread is held by the kill thread and returns immediately — silently skipping all completion listeners.

This causes resource leaks because cleanup logic registered via addTaskCompletionListener (e.g., closing file handles, releasing caches, freeing native memory) never executes.

The fix uses a dedicated loop for interrupt listeners with independent serialization (synchronized access to the callback stack, but no listenerInvocationThread gate), so interrupt listeners and completion/failure listeners can run on different threads without blocking each other.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a regression test "SPARK-56330: task completion during interrupt listener execution" that:

  1. Registers a completion listener and a blocking interrupt listener.
  2. Interrupts the task from a separate thread and waits for the interrupt listener to start.
  3. Marks the task completed on the main thread while the interrupt listener is still running.
  4. Verifies the completion listener was called (it would be silently skipped with the shared invokeListeners() approach).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

…teners to avoid blocking completion listeners

Followup to apache#55151.

The original implementation routes TaskInterruptListeners through the shared
invokeListeners() method, which uses listenerInvocationThread to serialize all
listener invocations. This creates a problem when markInterrupted() is called
on the kill thread while completion listeners need to run on the task thread:
the task thread sees listenerInvocationThread held by the kill thread and
returns immediately, silently skipping all completion listeners.

This can cause resource leaks because cleanup logic registered via
addTaskCompletionListener (e.g., closing file handles, releasing caches,
freeing native memory) never executes.

Fix: use a dedicated loop for interrupt listeners that does not share
listenerInvocationThread with completion/failure listeners. Exceptions are
still collected and thrown as TaskCompletionListenerException so markInterrupted
can catch and record the aggregate failure.
@cloud-fan
Copy link
Copy Markdown
Contributor Author

cc @sadikovi @HyukjinKwon

Copy link
Copy Markdown

@sadikovi sadikovi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix!

@HyukjinKwon
Copy link
Copy Markdown
Member

Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants