HIVE-28549: Limit the maximum number of operators merged by SharedWorkOptimizer #5492
base: master
final List<TableScanOperator> scans = tableNameToOps.get(tableName);
if (batchSize == -1) {
  batches.add(scans);
  continue;
Using the continue keyword is bad practice.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
ArrayListMultimap<String, TableScanOperator> tableNameToOps, int batchSize) {
  if (batchSize == -1) {
    return Collections.singletonList(sortedTables.stream().map(Entry::getKey)
        .flatMap(tableName -> tableNameToOps.get(tableName).stream()).collect(Collectors.toList()));
This puts all TS ops into a single List, regardless of their source table. Maybe you intended the following code?
if (batchSize == -1) {
  return sortedTables.stream()
      .map(entry -> tableNameToOps.get(entry.getKey()))
      .collect(Collectors.toList());
}
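To make the difference between the two snippets concrete, here is a minimal sketch that uses plain JDK collections in place of Hive's types. The class name BatchShapeSketch, the String stand-ins for TableScanOperator, and the sample table names are all hypothetical; only the flatMap-versus-map distinction is taken from the review comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BatchShapeSketch {
  // Flattens the scans of every table into one shared batch.
  // This mirrors the flatMap-based version that the review flagged.
  public static List<List<String>> singleBatch(Map<String, List<String>> scansByTable) {
    List<String> all = scansByTable.values().stream()
        .flatMap(List::stream)
        .collect(Collectors.toList());
    return Collections.singletonList(all);
  }

  // Keeps one batch per table, as in the reviewer's suggestion.
  public static List<List<String>> batchPerTable(Map<String, List<String>> scansByTable) {
    return new ArrayList<>(scansByTable.values());
  }

  // Hypothetical sample data: two scans of table "a", one scan of table "b".
  public static Map<String, List<String>> sample() {
    Map<String, List<String>> scansByTable = new LinkedHashMap<>();
    scansByTable.put("a", Arrays.asList("ts1", "ts2"));
    scansByTable.put("b", Arrays.asList("ts3"));
    return scansByTable;
  }

  public static void main(String[] args) {
    System.out.println(singleBatch(sample()));   // [[ts1, ts2, ts3]]
    System.out.println(batchPerTable(sample())); // [[ts1, ts2], [ts3]]
  }
}
```

With the flattened form, scans of different tables end up in the same batch; with the per-table form, each batch only ever holds scans of one table.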
You're right. I was just confused when I wrote that line. I modified it and reran the qtests that had failed.
e574161
What changes were proposed in this pull request?
This PR limits the maximum number of table scan operators that SWO tries to merge.
https://issues.apache.org/jira/browse/HIVE-28549
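As a rough illustration of the idea, the sketch below splits the scans of one table into batches of at most batchSize elements, so that a merge pass would only ever see one batch at a time. It is not the code in this PR: the class ScanBatchSketch and the method toBatches are hypothetical names, and the only detail taken from the diff is that a batchSize of -1 means "no limit".

```java
import java.util.ArrayList;
import java.util.List;

public class ScanBatchSketch {
  // Splits scans into consecutive batches of at most batchSize elements.
  // A batchSize of -1 disables the limit and yields a single batch.
  public static <T> List<List<T>> toBatches(List<T> scans, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    if (scans.isEmpty()) {
      return batches;
    }
    if (batchSize == -1) {
      batches.add(new ArrayList<>(scans));
      return batches;
    }
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be -1 or positive: " + batchSize);
    }
    for (int start = 0; start < scans.size(); start += batchSize) {
      int end = Math.min(start + batchSize, scans.size());
      // Copy the sub-list so each batch is independent of the input list.
      batches.add(new ArrayList<>(scans.subList(start, end)));
    }
    return batches;
  }
}
```

For example, five scans with a batchSize of 2 would give three batches of sizes 2, 2 and 1, so no more than two operators would be candidates for merging together.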
Why are the changes needed?
We observed that SWO has a negative impact when it merges too many operators, e.g. 50. If the operators are memory-intensive, they might throw an OOM error or slow down.
I believe we can resolve the OOM with the following patch, but we still want an upper limit so that we can reasonably tune concurrency or RAM per operator.
#5478
Does this PR introduce any user-facing change?
No.
Is the change a dependency upgrade?
No.
How was this patch tested?
I added a qtest.