-
Couldn't load subscription status.
- Fork 5.5k
feat: Add InExpression for Prestissimo Index Join #26407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR extends index join optimization to support SQL IN predicates by updating both the Java planner and the native execution path to recognize IN expressions as lookup conditions, and adds unit tests across Java and C++ to validate the new behavior. Sequence diagram for index join optimization with IN predicatesequenceDiagram
participant Planner as "IndexJoinOptimizer"
participant Expr as "CallExpression (IN)"
participant Context as "LookupContext"
Planner->>Expr: Check getDisplayName() == "IN"
Expr-->>Planner: Return arguments
Planner->>Expr: Check isVariable(arguments[0])
Expr-->>Planner: Return true/false
alt arguments[0] is variable
Planner->>Context: Add arguments[0] to lookupVariables
end
Class diagram for updated IndexJoinOptimizer logicclassDiagram
class IndexJoinOptimizer {
+optimizeJoin()
+isVariable(Expression): boolean
+getLookupVariables()
}
class CallExpression {
+getDisplayName(): String
+getArguments(): List<Expression>
}
class VariableReferenceExpression {
}
IndexJoinOptimizer --> CallExpression
IndexJoinOptimizer --> VariableReferenceExpression
CallExpression --> Expression
%% Highlight new IN support
IndexJoinOptimizer : +Handles CONTAINS expressions
IndexJoinOptimizer : +Handles IN expressions (new)
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `presto-tests/src/test/java/com/facebook/presto/tests/TestNativeIndexJoinLogicalPlanner.java:147-145` </location>
<code_context>
filter(tableScan("lineitem")),
indexSource("orders"))));
+ assertPlan("" +
+ "SELECT *\n" +
+ "FROM (\n" +
+ " SELECT *\n" +
+ " FROM lineitem\n" +
+ " WHERE partkey % 8 = 0) l\n" +
+ joinType + " JOIN orders o\n" +
+ " ON l.orderkey = o.orderkey\n" +
+ " AND o.custkey IN (1, l.partkey, 3)\n",
+ anyTree(indexJoin(
+ filter(tableScan("lineitem")),
+ indexSource("orders"))));
+
assertPlan("" +
</code_context>
<issue_to_address>
**suggestion (testing):** Missing negative test cases for IN expressions in index joins.
Please add tests to cover cases where IN expressions should not result in index joins, such as with non-constant arrays, unsupported types, or empty arrays.
Suggested implementation:
```java
assertPlan("" +
"SELECT *\n" +
"FROM (\n" +
" SELECT *\n" +
" FROM lineitem\n" +
" WHERE partkey % 8 = 0) l\n" +
joinType + " JOIN orders o\n" +
" ON l.orderkey = o.orderkey\n" +
" AND o.custkey IN (1, l.partkey, 3)\n",
anyTree(indexJoin(
filter(tableScan("lineitem")),
indexSource("orders"))));
// Negative test: IN list is a column reference (non-constant array)
assertPlan("" +
"SELECT *\n" +
"FROM lineitem l\n" +
joinType + " JOIN orders o\n" +
" ON l.orderkey = o.orderkey\n" +
" AND o.custkey IN (l.partkey)\n",
anyTree(
join(
tableScan("lineitem"),
tableScan("orders"))));
// Negative test: IN list contains unsupported type (e.g., VARCHAR)
assertPlan("" +
"SELECT *\n" +
"FROM lineitem l\n" +
joinType + " JOIN orders o\n" +
" ON l.orderkey = o.orderkey\n" +
" AND o.custkey IN ('a', 'b', 'c')\n",
anyTree(
join(
tableScan("lineitem"),
tableScan("orders"))));
// Negative test: IN list is empty
assertPlan("" +
"SELECT *\n" +
"FROM lineitem l\n" +
joinType + " JOIN orders o\n" +
" ON l.orderkey = o.orderkey\n" +
" AND o.custkey IN ()\n",
anyTree(
join(
tableScan("lineitem"),
tableScan("orders"))));
```
- If your test framework requires a specific matcher to assert the absence of an index join, you may need to adjust `anyTree(join(...))` to match your conventions.
- If the `joinType` variable is not defined for these negative cases, you may need to specify the join type explicitly (e.g., "INNER", "LEFT", etc.).
- If the empty IN list syntax (`IN ()`) is not supported by your SQL parser, you may need to use a workaround such as `IN (NULL)` or another method to simulate an empty list.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
currently added logic for both rewrite CONTAINS to IN (will be a part of SuperExpression and matched before normal non-equal expressions like CONTAINS), and IN (as a regular expression, so query is written as key IN xx). Happy to remove the regular IN case if that's not needed. |
|
Thanks for the release note! Nit of formatting: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG. Thanks for adding it!
Description
Added support for SQL IN operator in IndexJoinOptimizer to enable index join optimization for IN predicates in join conditions. This change covers both the Java planner (IndexJoinOptimizer.java) and the native execution path (PrestoToVeloxQueryPlan.cpp).
Changes include:
IndexJoinOptimizer.javato recognizeSpecialFormExpression.Form.INand extract lookup variables from IN expressionsisIn()helper function and logic inPrestoToVeloxQueryPlan.cppto convert IN expressions toInIndexLookupConditionTestNativeIndexJoinLogicalPlanner.java,TestSequenceStorageLogicalPlanner.java, andPrestoToVeloxQueryPlanTest.cppMotivation and Context
Presto now supports
RewriteConstantArrayContainsToInExpressionthat convertsCONTAINS(ARRAY[...], column)into SQL IN expressions when the array contains only constants. However, IndexJoinOptimizer previously only recognized CONTAINS expressions, not IN expressions.This meant that:
o.custkey IN (1, 2, 3)) were not eligible for index join optimizationThis change ensures that both direct SQL IN and rewritten CONTAINS→IN expressions can benefit from index join optimization.
Impact
Performance: Queries using IN predicates in join conditions can now leverage index joins, potentially improving performance significantly for queries with selective IN predicates on indexed columns.
User-facing: Users can now write cleaner SQL using the IN operator directly instead of the less intuitive
CONTAINS(ARRAY[...], column)syntax, while still getting index join optimization.API: No public API changes. This is an internal optimization enhancement.
Test Plan
Added unit tests in
TestNativeIndexJoinLogicalPlanner.javacovering:Added unit tests in
TestSequenceStorageLogicalPlanner.javacovering:rewrite_constant_array_contains_to_in_expressionenabled to verify CONTAINS→IN conversion works correctlyAdded C++ unit test in
PrestoToVeloxQueryPlanTest.cppto verify parsing of IndexJoinNode with IN operatorManually tested on production-like queries to verify:
All tests pass successfully.
Contributor checklist
20251022_050924_00006_mqhze (with rewrite = false) -> index join
Release Notes
== NO RELEASE NOTE ==