Task::next() Deadlocks with Pipeline Dependencies in Single-threaded Execution #11442
Comments
Probably not a problem with
In our scenario, we employ Gluten + Velox with custom operators preceding the join build; the caller of Task::next() begins to wait upon encountering a block in the build pipeline for the first time, yet it waits for both the build and probe pipelines to become unblocked before resuming. The custom operator looks roughly like this:

```cpp
class CustomOperator : public Operator {
 protected:
  // State for asynchronous input processing.
  bool noMoreInput_{false};
  mutable std::mutex mutex_;
  folly::Future<folly::Unit> inputProcessingTask_; // Valid while async work is outstanding.
  folly::Executor* executor_{nullptr};

 public:
  // Called when the operator needs to process input asynchronously.
  void startAsyncInputProcessing(RowVectorPtr input) {
    // Create a promise/future pair to track the async processing.
    folly::Promise<folly::Unit> promise;
    inputProcessingTask_ = promise.getFuture(); // Future becomes valid here.

    // Schedule the async task; the lambda is mutable so the captured promise
    // can be fulfilled.
    auto task = [this, input = std::move(input),
                 promise = std::move(promise)]() mutable {
      // Process input asynchronously...
      // When done, fulfill the promise.
      promise.setValue();
    };
    executor_->add(std::move(task));
  }

  // Check whether the operator is blocked.
  BlockingReason isBlockedDefault(ContinueFuture* future) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (inputProcessingTask_.valid()) {
      // We're blocked waiting for the async processing to finish.
      if (future) {
        *future = std::move(inputProcessingTask_);
      }
      return BlockingReason::kWaitForConnector;
    }
    return BlockingReason::kNotBlocked;
  }

  // Add new input.
  void addInputDefault(RowVectorPtr input) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (inputProcessingTask_.valid()) {
      VELOX_FAIL("Cannot add input while processing previous input");
    }
    startAsyncInputProcessing(std::move(input));
  }
};
```
Probe is waiting on build, and build is waiting on the custom operator's async input-processing future.
Yes, I think the issue is that when Task::next() collects the blocking futures, the probe side's wait-for-join-build future is included even though it can only be fulfilled by running the build driver of the same task.
I see, we should only wait on futures for external blocking; the probe is an internal block and should not be part of the list. @xiaoxmeng Any idea how we can differentiate external blocking from internal blocking futures in this case?
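One conceivable way to draw that line is sketched below: classify the reported BlockingReason, treating waits that can only be resolved by other drivers of the same task as internal. This is only an illustration of the idea; `isExternalBlocking` is a hypothetical helper, not an existing Velox API, and the exact classification would need review.

```cpp
// Hypothetical helper (not part of Velox): classify a BlockingReason as
// external (resolved by something outside the task, e.g. a connector or an
// async operator) or internal (resolved only by other drivers of this task).
bool isExternalBlocking(BlockingReason reason) {
  switch (reason) {
    case BlockingReason::kWaitForJoinBuild:
    case BlockingReason::kWaitForJoinProbe:
      // Fulfilled by peer drivers in the same task; waiting on these in the
      // serial Task::next() loop can deadlock it.
      return false;
    default:
      // e.g. kWaitForConnector, kWaitForMemory: fulfilled externally.
      return true;
  }
}
```

Task::next() could then collect and wait only on the external futures, re-polling the internally blocked drivers once any of those complete.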
Bug description
Current Behavior
In Velox's single-threaded execution mode, Task::next() uses folly::collectAll to wait for all blocked drivers, even when some drivers are waiting for others in the same task.
Specific example with hash join:
1. The build pipeline's driver blocks on an external future (here, the custom operator's async input processing).
2. The probe pipeline's driver blocks with kWaitForJoinBuild, waiting for the hash build to finish.
3. Task::next() collects both futures with folly::collectAll and waits. The probe future can only be fulfilled after the build driver runs again, which cannot happen until the wait returns, so the task deadlocks.
Expected Behavior
The implementation should:
- Not deadlock when one pipeline's blocking future can only be fulfilled by running another pipeline of the same task.
- Resume the driver loop as soon as any blocked driver becomes unblocked, rather than waiting for all of them.
Technical Details
The bug occurs in Task::next():
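A simplified paraphrase of the serial driver loop is sketched below to show where the wait happens. This is not the actual Velox source; names and details are approximate.

```cpp
// Paraphrased sketch of the single-threaded path in Task::next().
RowVectorPtr Task::next() {
  while (true) {
    int runnableDrivers = 0;
    std::vector<ContinueFuture> blockedFutures;

    for (auto& driver : drivers_) {
      if (driver->isFinished()) {
        continue;
      }
      ContinueFuture driverFuture = ContinueFuture::makeEmpty();
      auto result = driver->next(&driverFuture);
      if (result) {
        return result; // A driver produced output.
      }
      if (driverFuture.valid()) {
        blockedFutures.push_back(std::move(driverFuture)); // Driver is blocked.
      } else {
        ++runnableDrivers;
      }
    }

    if (runnableDrivers == 0) {
      if (blockedFutures.empty()) {
        return nullptr; // All drivers finished.
      }
      // Problem: this waits for *every* blocked driver to unblock. If the
      // probe driver's future can only be fulfilled by running the build
      // driver again, this wait never returns.
      folly::collectAll(std::move(blockedFutures)).wait();
    }
  }
}
```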
Proposed Solution
Replace collectAll with collectAny to allow drivers to make progress independently
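Applied to the paraphrased loop above, the change could look roughly like the following fragment (again a sketch, not a tested patch):

```cpp
if (runnableDrivers == 0 && !blockedFutures.empty()) {
  // Resume as soon as *any* blocked driver can make progress, then re-run
  // the driver loop instead of waiting for all of them.
  folly::collectAny(std::move(blockedFutures)).wait();
  continue;
}
```

With collectAny, the external future from the build pipeline unblocks the loop on its own, the build driver runs to completion, and the probe pipeline's join-build future is then fulfilled as usual.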
System information
I'm using an old version of Velox, but I checked the code for `Task::next()`, and it is unchanged.
Velox System Info v0.0.2
Commit: 5d315fb
CMake Version: 3.28.3
System: Linux-6.8.0-1017-gcp
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.10/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt
Relevant logs
No response