Skip to content

Full support for multithreaded applications #641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 64 commits into from
Oct 6, 2020

Conversation

AFFogarty
Copy link
Contributor

@AFFogarty AFFogarty commented Aug 25, 2020

Design

This PR enables multi-threaded user applications. The PR accomplishes this by creating a 1-to-1 mapping between .NET Thread objects and JVM single-thread ExecutorService objects. ExecutorService is a Java interface that wraps a worker thread with its own task queue.

When a .NET thread calls a Spark function, the JvmBridge includes the thread's ManagedThreadId in the payload body. When the request is received on the JVM side, DotnetBackendHandler uses the ManagedThreadId to map the request onto its corresponding ExecutorService. The ExecutorService then executes the function on its worker thread.

This pattern guarantees that Spark functions called by a particular .NET thread will always be executed on the same JVM thread. In addition, two different .NET threads will always execute Spark functions on two different JVM threads.

An object called ThreadPool manages the lifecycle for ExecutorService objects on the JVM side. A corresponding class called JvmThreadPool manages the threads on the .NET side, and does periodic garbage collection.

Additional features

This PR also enables GetActiveSession, SetActiveSession, and ClearActiveSession APIs on SparkSession. These APIs were previously unavailable because thread-local sessions were unsupported.

Testing

This PR adds a new E2E test class JvmThreadPoolTests that tests the basic functionality of the thread pool.

Since these changes are on the code path for all Spark API calls, these changes are exercised by all existing E2E tests.

Limitations

The only limitation is that in Spark, ActiveSession gives a guarantee that a child thread will inherit its parent's active session. My design does not support that. However, I think that this is not a problem since .NET's thread model doesn't support parent/child relationships between threads.

This PR will address bug #333

@imback82 imback82 added the enhancement New feature or request label Sep 12, 2020
@imback82 imback82 added this to the 1.0.0 milestone Sep 12, 2020
@suhsteve suhsteve mentioned this pull request Sep 15, 2020
7 tasks
@AFFogarty AFFogarty marked this pull request as ready for review September 22, 2020 22:44
@AFFogarty AFFogarty changed the title [WIP] Full support for multithreaded applications Full support for multithreaded applications Sep 22, 2020
suhsteve
suhsteve previously approved these changes Oct 2, 2020
/// (first created) context.
/// </summary>
/// <param name="session">SparkSession object</param>
public static void SetActiveSession(SparkSession session) =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the version attribute [Since(Versions.V3_0_0)] for this API?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind this comment, got confused by #633 and Pyspark but I see this has been exposed by Spark since 2.0.0 so please ignore this comment.

@Niharikadutta
Copy link
Collaborator

LGTM. Thanks @AFFogarty !

@AFFogarty AFFogarty requested review from imback82 and suhsteve October 3, 2020 19:07
Copy link
Contributor

@imback82 imback82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nit/minor comments, but looking good to me!

def run(threadId: Int, task: () => Unit): Unit = {
val executor = getOrCreateExecutor(threadId)
val future = executor.submit(new Runnable {
override def run(): Unit = task()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for verifying this!

@AFFogarty AFFogarty requested a review from imback82 October 5, 2020 20:14
imback82
imback82 previously approved these changes Oct 5, 2020
Copy link
Contributor

@imback82 imback82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit comment, but LGTM, thanks @AFFogarty!

}
}

_loggerService.LogDebug("JVM thread GC complete.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this log and line 130? (We don't have a good logger service, so this will always be printed). I think line 111 should be good enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Also fixed a bug where that function was always returning false.

@imback82 imback82 merged commit a67ad59 into dotnet:master Oct 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: APIs utilizing SparkSession.getActiveSession() are broken
4 participants