What is the problem the feature request solves?
This is a follow-on issue based on discussions in #1424.
When choosing the smaller side of a join to use as the build side, we currently use only the total table size, taken from the sizeInBytes that was computed in a completed query stage.
We can make some improvements to this approach:
- Calculate the resulting hash table size based on the join keys and the columns from the table that will be used in the join. We can compute the size as rowCount * sum(estimated size of each column).
- In cases where the input is not a completed query stage, we can look at the HadoopFsRelation contained in the LogicalRelation. From this, we can get sizeInBytes and infer a row count from it and the estimated schema size.
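To make the two ideas above concrete, here is a minimal, self-contained sketch in Python. It is not Spark or Comet code: `Relation`, `ESTIMATED_WIDTH`, and every function name are hypothetical, and the per-type widths are assumed values chosen only for illustration. It shows how an estimate of rowCount * sum(estimated size of each used column) can pick a different build side than total table size alone.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed per-type width estimates in bytes (illustrative values only).
ESTIMATED_WIDTH = {"int": 4, "long": 8, "double": 8, "string": 20}


@dataclass
class Relation:
    """Hypothetical stand-in for a join input and its statistics."""
    name: str
    schema: Dict[str, str]            # column name -> type name
    size_in_bytes: int                # total size reported for the input
    row_count: Optional[int] = None   # set only when a row count is known


def row_width(schema: Dict[str, str], columns: Optional[List[str]] = None) -> int:
    """Sum the estimated widths of the given columns (all columns if None)."""
    cols = list(schema) if columns is None else columns
    return sum(ESTIMATED_WIDTH[schema[c]] for c in cols)


def inferred_row_count(rel: Relation) -> int:
    """Use the known row count, else infer it from size / estimated row width."""
    if rel.row_count is not None:
        return rel.row_count
    return rel.size_in_bytes // max(row_width(rel.schema), 1)


def hash_table_size(rel: Relation, used_columns: List[str]) -> int:
    """rowCount * sum(estimated size of each column used in the join)."""
    return inferred_row_count(rel) * row_width(rel.schema, used_columns)


def choose_build_side(left: Relation, left_cols: List[str],
                      right: Relation, right_cols: List[str]) -> str:
    """Pick the side whose estimated hash table is smaller."""
    left_size = hash_table_size(left, left_cols)
    right_size = hash_table_size(right, right_cols)
    return "left" if left_size <= right_size else "right"


# A wide table where only the join key is needed, and a narrow table
# where every column is needed.
wide = Relation(
    "wide",
    {"id": "long", "payload": "string", "a": "string", "b": "string"},
    size_in_bytes=6_800_000,
)
narrow = Relation("narrow", {"id": "long", "v": "long"}, size_in_bytes=3_200_000)

side = choose_build_side(wide, ["id"], narrow, ["id", "v"])
```

In this example, `narrow` has the smaller total size, so a comparison on sizeInBytes alone would build on it. Because only `id` is needed from `wide`, its estimated hash table is smaller, and the sketch picks `wide` as the build side instead.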
Describe the potential solution
No response
Additional context
No response