cc @liujiayi771 as I think this is similar to your work in #4663 and others.
I agree. @ted-jenks, do you plan to implement this feature?
In our usage of Gluten we have noticed two key patterns where we feel we miss out on performance.

The first is Spark jobs that use a built-in function that is not supported, where the plan could instead be expressed with functions that can be offloaded. For instance, `to_timestamp` is not supported, but `to_unix_timestamp` is, and with a cast the two are equivalent.

Second, I have also seen query plans that are nearly convertible to Velox, but not quite, leading to nothing getting offloaded to native.
Building on the `to_timestamp` example, we still probably would not see offloading, because the CollapseProject rule will ensure that the `cast` and `to_unix_timestamp` expressions end up in a single operator. If it were just the `to_unix_timestamp` in this operator, it would have been offloaded. While this will become less of an issue over time as more Spark expressions are supported, it will remain true for all custom expressions and UDFs.

These observations gave me the idea of writing Gluten-aware optimizer rules that adapt the plan to improve its offloadability. Crucially, we could optimise for separating offloadable and non-offloadable expressions on an operator-by-operator basis. Obviously this would have to be done in a way that does not introduce a lot of serialization overhead.
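To make the splitting idea concrete, here is a toy sketch in plain Python. It is not Gluten or Spark code: `Expr`, `OFFLOADABLE`, `is_offloadable`, `split_projection` and `my_udf` are all hypothetical names invented for illustration, and the allow-list is made up. It only shows the shape of the transformation: pull the largest offloadable subtrees out of a projection into their own stage, and leave the rest referring to their outputs.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: a made-up allow-list standing in for whatever the native backend supports.
OFFLOADABLE = {"to_unix_timestamp"}


@dataclass(frozen=True)
class Expr:
    """Toy expression node; a node without children is a plain column reference."""
    name: str
    children: Tuple["Expr", ...] = ()


def is_offloadable(e: Expr) -> bool:
    # Column references are always fine; a function call is offloadable only
    # if the function itself and all of its arguments are.
    if not e.children:
        return True
    return e.name in OFFLOADABLE and all(is_offloadable(c) for c in e.children)


def split_projection(exprs: List[Expr]) -> Tuple[List[Expr], List[Expr]]:
    """Split one projection into (native stage, remaining stage).

    Maximal offloadable subtrees move to the native stage; the remaining
    expressions refer to them through placeholder columns `_native<i>`.
    """
    native: List[Expr] = []

    def lift(e: Expr) -> Expr:
        if e.children and is_offloadable(e):
            if e not in native:
                native.append(e)
            return Expr(f"_native{native.index(e)}")
        return Expr(e.name, tuple(lift(c) for c in e.children))

    remaining = [lift(e) for e in exprs]
    return native, remaining


# A collapsed projection: my_udf(to_unix_timestamp(ts)), where my_udf is a
# hypothetical UDF that cannot be offloaded.
ts = Expr("ts")
collapsed = [Expr("my_udf", (Expr("to_unix_timestamp", (ts,)),))]
native_stage, jvm_stage = split_projection(collapsed)
```

In this toy, `native_stage` holds the `to_unix_timestamp(ts)` call and `jvm_stage` holds `my_udf` applied to the placeholder column. Every such split implies an extra boundary between native and JVM execution, so a real rule would presumably need some cost check before splitting.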
Is this something anyone has thought about before? Do you think potentially introducing additional serialization overheads could hurt overall perf?