Gluten Plan Optimization #4625

ted-jenks · 2024-02-02T11:19:24Z

ted-jenks
Feb 2, 2024

In our usage of Gluten we have noticed two key patterns where we feel we miss out on performance.

The first is Spark jobs that use an unsupported built-in function, where the plan could instead be expressed with functions that can be offloaded. For instance, to_timestamp is not supported, but to_unix_timestamp is, and combined with a cast the two are equivalent.

The second is a query plan that is nearly convertible to Velox, but not quite, so nothing gets offloaded to native.

Project [cast(to_unix_timestamp(col1#86, yyyy-MM-dd, Some(UTC), false) as timestamp) AS col1#89]   
   +- RelationV2[col1#86] csv ...
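A minimal sketch of the kind of rewrite that yields a plan like the one above, written as a toy Python model rather than against Spark or Gluten APIs. The `Expr` class and `rewrite_to_timestamp` function are hypothetical names introduced for illustration, and the equivalence of the two forms is taken from the post, not verified here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical): a function name plus its arguments.

    Arguments are either nested Expr nodes or plain strings standing in for
    column references and literals.
    """
    name: str
    args: Tuple = ()


def rewrite_to_timestamp(expr):
    """Replace to_timestamp(x, fmt) with cast(to_unix_timestamp(x, fmt), 'timestamp')."""
    if not isinstance(expr, Expr):
        return expr  # column references and literals are left untouched
    # Rewrite children first so nested occurrences are handled too.
    args = tuple(rewrite_to_timestamp(a) for a in expr.args)
    if expr.name == "to_timestamp":
        return Expr("cast", (Expr("to_unix_timestamp", args), "timestamp"))
    return Expr(expr.name, args)


before = Expr("to_timestamp", ("col1", "yyyy-MM-dd"))
after = rewrite_to_timestamp(before)
print(after)
```

A real rule would operate on Catalyst expressions and would need to preserve the time zone and failure-mode arguments that appear in the plan above; the toy model ignores both.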

Building on the to_timestamp example, we still probably would not see offloading because the CollapseProject rule will ensure that the cast and to_unix_timestamp expressions are in a single operator. If it were just the to_unix_timestamp in this operator, it would have been offloaded. While this will become less of an issue with time as more of the Spark expressions are supported, it will remain true for all custom expressions and UDFs.

These observations gave me the idea of writing Gluten-aware optimizer rules that adapt the plan to improve its offloadability. Crucially, we could optimise for the separation of offloadable and non-offloadable expressions on an operator-by-operator basis. Obviously, this would have to be done without introducing a lot of serialization overhead.

Is this something anyone has thought about before? Do you think potentially introducing additional serialization overheads could hurt overall perf?
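The separation of offloadable and non-offloadable expressions proposed above could look roughly like the following toy Python model. It is only a sketch: `Expr`, `is_offloadable`, `split_project` and the `_pre` alias prefix are hypothetical names, and treating `cast` as the unsupported part is an assumption made for illustration. It does not use Spark or Gluten APIs.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical): function name plus arguments.

    Plain strings stand in for column references and literals.
    """
    name: str
    args: Tuple = ()


def is_offloadable(expr, supported: Set[str]) -> bool:
    """True if every function in the tree is in the supported set."""
    if not isinstance(expr, Expr):
        return True  # column references and literals
    return expr.name in supported and all(
        is_offloadable(a, supported) for a in expr.args
    )


def split_project(expr, supported: Set[str], inner: Dict[str, Expr]):
    """Split one expression between an inner and an outer Project.

    Maximal offloadable subtrees are moved into `inner` under a generated
    alias; the returned expression is what remains in the outer (fallback)
    Project, referring to those aliases.
    """
    if not isinstance(expr, Expr):
        return expr
    if is_offloadable(expr, supported):
        alias = f"_pre{len(inner)}"
        inner[alias] = expr
        return alias
    return Expr(expr.name, tuple(split_project(a, supported, inner) for a in expr.args))


supported = {"to_unix_timestamp"}  # assumption: cast is the unsupported part
collapsed = Expr("cast", (Expr("to_unix_timestamp", ("col1", "yyyy-MM-dd")), "timestamp"))

inner: Dict[str, Expr] = {}
outer = split_project(collapsed, supported, inner)
print("inner (offloadable):", inner)
print("outer (fallback):   ", outer)
```

The sketch ignores the cost side of the question: every such split adds a boundary between native and vanilla Spark execution, so a real rule would need some way to decide when the extra transition is worth it.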

ted-jenks · 2024-02-06T09:00:10Z

ted-jenks
Feb 6, 2024
Author

cc @ulysses-you @zhouyuan @PHILO-HE

3 replies

ulysses-you Feb 6, 2024
Collaborator

Thank you @ted-jenks. Do you mean separating a project into two, one running natively and the other on vanilla Spark? If so, the idea looks good to me. It is similar to what we did recently, pulling out a project from aggregate/sort/topk etc. We can add one more rule to pull out a project from a project.

ulysses-you Feb 6, 2024
Collaborator

By the way, it is currently the Chinese holiday period, so we may be slow to reply.

ted-jenks Feb 6, 2024
Author

No worries! Hope you are having happy holidays! I will take a look into doing this over the coming days.

ted-jenks · 2024-02-06T14:32:39Z

ted-jenks
Feb 6, 2024
Author

cc @liujiayi771 as I think this is similar to your work in #4663 and others.

2 replies

liujiayi771 Feb 7, 2024
Collaborator

Your idea sounds good to me. We should try to minimize the number of expressions that fall back to evaluation in vanilla Spark. But how do you plan to break down the expressions within a Project? Do we need to traverse each expression for validation individually, or is there a way to validate them in batch?

ted-jenks Feb 7, 2024
Author

Currently I do not know of a way to avoid traversing each expression, which would look like mapping over the expressions in the Project and over the children of those expressions.
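As a rough illustration of the per-expression traversal described above, here is a toy Python model that maps over a Project's expressions and walks each tree to report which functions block offloading. `Expr`, `unsupported_functions`, `validate_project` and the contents of the `supported` set are all hypothetical; real validation in Gluten is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical): function name plus arguments.

    Plain strings stand in for column references and literals.
    """
    name: str
    args: Tuple = ()


def unsupported_functions(expr, supported: Set[str]) -> List[str]:
    """Walk one expression tree and collect function names that cannot be offloaded."""
    if not isinstance(expr, Expr):
        return []  # column references and literals never block offloading
    found = [] if expr.name in supported else [expr.name]
    for arg in expr.args:
        found.extend(unsupported_functions(arg, supported))
    return found


def validate_project(project: Dict[str, Expr], supported: Set[str]) -> Dict[str, List[str]]:
    """Map over every output expression of a Project and validate each one."""
    return {alias: unsupported_functions(e, supported) for alias, e in project.items()}


supported = {"upper", "to_unix_timestamp"}  # assumed support list, for illustration only
project = {
    "a": Expr("upper", ("name",)),
    "b": Expr("cast", (Expr("to_unix_timestamp", ("col1", "yyyy-MM-dd")), "timestamp")),
}
report = validate_project(project, supported)
print(report)
```

An empty list means the whole expression could stay in a native Project; a non-empty list identifies where a split would be needed.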

PHILO-HE · 2024-02-23T07:38:05Z

PHILO-HE
Feb 23, 2024
Collaborator

While this will become less of an issue with time as more of the Spark expressions are supported, it will remain true for all custom expressions and UDFs.

I agree. @ted-jenks, do you plan to implement this feature?

1 reply

ted-jenks Mar 12, 2024
Author

I wanted to, but I have been unable to find the time to commit to it. I cannot promise it any time soon.
