[SPARK-53996][SQL] Improve InferFiltersFromConstraints to infer filters from complex join expressions #52699
+74
−2
What changes were proposed in this pull request?
This PR improves the Spark SQL optimizer’s `InferFiltersFromConstraints` rule to infer filter conditions from join constraints that involve complex expressions, not just simple attribute equalities.
Currently, the optimizer can only infer additional constraints when the join condition is a simple equality (e.g., `a = b`). For more complex expressions, such as arithmetic operations, it does not infer the corresponding filter.
Example (currently works as expected):
In this case, the optimizer correctly infers the additional constraint `t1.a = 1`.
Example (now handled by this PR):
Here, it is clear that `t1.a = 3` (since `t2.b = 1` and `t1.a = t2.b + 2`), but previously the optimizer did not infer this constraint. With this change, the optimizer can now deduce and push down `t1.a = 3`.
How was this patch tested?
You can reproduce and verify the improvement with the following:
Before this change, the physical plan does not include the inferred filter:
With this PR, the optimizer should infer and push down `t2.b = 3` as an additional filter.
Why are the changes needed?
Without this enhancement, the optimizer cannot push down filters or optimize query execution plans for queries with complex join conditions, which can lead to suboptimal join performance.
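For illustration only, the inference described in this PR can be sketched outside Spark as a toy model. This is not the code in `InferFiltersFromConstraints`; the `Attr`, `Lit`, and `Add` classes and the `substitute` and `infer_filters` helpers are hypothetical names invented for this sketch. It assumes constraints of the form `attribute = expression` plus known constant bindings, and supports only integer addition.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Attr:
    """A column reference such as t1.a (hypothetical toy type)."""
    name: str


@dataclass(frozen=True)
class Lit:
    """An integer literal (hypothetical toy type)."""
    value: int


@dataclass(frozen=True)
class Add:
    """Integer addition of two sub-expressions (hypothetical toy type)."""
    left: "Expr"
    right: "Expr"


Expr = Union[Attr, Lit, Add]


def substitute(expr, bindings):
    """Replace attributes that have a known constant, then fold literal sums."""
    if isinstance(expr, Attr):
        return Lit(bindings[expr.name]) if expr.name in bindings else expr
    if isinstance(expr, Add):
        left = substitute(expr.left, bindings)
        right = substitute(expr.right, bindings)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return expr  # a literal is already constant


def infer_filters(equalities, bindings):
    """Return new `attribute = constant` filters implied by the inputs.

    equalities: list of (Attr, Expr) pairs, each meaning attr = expr.
    bindings:   dict mapping attribute name to a known constant.
    """
    inferred = {}
    for attr, expr in equalities:
        folded = substitute(expr, bindings)
        # Only a fully constant right-hand side yields a new filter.
        if isinstance(folded, Lit) and attr.name not in bindings:
            inferred[attr.name] = folded.value
    return inferred


# Join condition t1.a = t2.b + 2 combined with the filter t2.b = 1.
join_condition = [(Attr("t1.a"), Add(Attr("t2.b"), Lit(2)))]
filters = {"t2.b": 1}
print(infer_filters(join_condition, filters))  # {'t1.a': 3}
```

A plain attribute equality such as `t1.a = t2.b` goes through the same path and yields `t1.a = 1`, which corresponds to the case the optimizer already handled. When the right-hand side still contains an unbound attribute, nothing is inferred.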