Add support to enable user defined custom plan optimizer #11767

ariforu · 2022-04-04T02:47:00Z

Description

Unlike Spark, Trino does not currently support user-defined query plan optimizers. This feature adds pluggable optimizer support, so organizations can write their own custom optimizers and loosely couple them to the engine.

Is this change a fix, improvement, new feature, refactoring, or other?

New Feature

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Core query engine

How would you describe this change to a non-technical end user or system administrator?

This feature is not targeted at end users or a non-technical SQL audience. It can be thought of as a framework that adds support for pluggable custom SQL optimizers. It eliminates the need to modify Trino's source code when an organization wants to add a SQL optimizer rule beyond the built-in ones.

Related issues, pull requests, and links

#11765

Documentation

( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Added support for user-defined custom SQL optimizers (https://github.com/trinodb/trino/issues/11765)

Adding support of custom optimizer and unit test

Custopt: Set custom optimizer from config file

Adding custom optimizer feature

Custopt

cla-bot · 2022-04-04T02:47:03Z

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla.

hashhar · 2022-04-05T07:56:46Z

I don't think this is a generic enough solution. Note that the order of optimizer rules matters a lot, and each rule can affect other rules. As such, any rule configuration that isn't already tested has the potential to introduce correctness issues.

We're also generally trying to move away from the current greedy optimizer to a more exploratory one, and such changes would prevent us from moving in that direction without breaking these integrations.

I'd instead propose that you fork and maintain your own copy of the PlanOptimizers class with your rule and the required tests.

If there are specific rules that you think would be useful to both you and the general community, then maybe we can add those rules directly instead of an extension mechanism that brings potential correctness issues.

I'll defer to other maintainers as well for their opinion, particularly @martint and @kasiafi.

yitaoyao-nike · 2022-04-05T20:45:32Z

Hi @hashhar, the proposed plugin capability is strictly for enabling a customized optimizer to run before any built-in ones, so that the organizational owners of the platform can enforce data policies and other requirements by intercepting user queries and dynamically rewriting or rearranging them. There is no impact on the system's built-in optimizers or their execution order. One can view the approach this way: it is equivalent to forcing every client to use the same customized client driver from an organization, one that dynamically alters the user query based on certain conditions before sending it to the query engine. However, the proposed approach is much more manageable (no need to adopt customized client drivers for different languages) and more scalable (since it runs on the query engine side, it can use metadata in the DB to decide dynamically whether and how to rewrite the user query). It is a critical feature for organizations that need to dynamically enforce a right-of-use privacy policy (with auditability) based on the usage intent of the queried data as well as users' consents. Managing a forked code base should not be the recommended approach, IMO.

findepi · 2022-04-06T09:58:34Z

The current PlanNodes cannot be exposed to plugins, because they are not in a form that allows us to evolve them backwards-compatibly. While we could still expose them with a caveat that this is not a supported feature, a fork is a more suitable place for features we currently cannot support. We don't want to create an impression that a feature is generally useful if it isn't. And we don't want to have features that are not generally useful.

That being said, in the long term we want to have this capability. See "allow connectors to provide optimizer rules" in #18. The project consensus is that this requires a new IR (intermediate representation) for the plan, one that we could expose in the SPI.
In the shorter term, the progress within #18 is on improving connector pushdown capabilities, and we're making great progress there.

@ariforu let me close this PR, as it won't be merged in its current shape.
Let's also have a discussion about what problem you're trying to solve. Maybe someone can suggest a solution that's easier to maintain and doesn't depend on #18 being fully completed.

yitaoyao-nike · 2022-04-06T16:11:35Z

Hi @findepi, the currently proposed SPI change is meant to enable dynamic, conditional rewriting of the user query before the system's query-plan optimizations, in order to centrally enforce an organization's policies. It is not related to connector-specific or data-source-specific optimization. Hence, "allow connectors to provide optimizer rules" in #18 doesn't seem to address our needs. Please consult with your team about reconsidering this decision. Thanks!

BTW, a similar capability already exists in the Spark query engine, and we have been leveraging it.

martint · 2022-04-06T16:28:23Z

What kind of policies are you thinking about?

In general, the place to enforce policies (access control, filters, masks) is during analysis and initial planning. The job of the optimizer is to produce a more efficient query plan, but it should not change the semantics of the query.

yitaoyao-nike · 2022-04-06T17:17:28Z

Hi @martint, yes, we do want to inject the query rewrite at the analysis stage (Spark allows custom handlers to be injected in the analysis as well as the optimization stages, so we injected our handler in front of the system's analysis pipeline). Our query rewrite (based on query context - use-intent, and metadata of query source targets) will dynamically expand some from-tables to be joined with intent-permitted (based on users' consents), as well as, field-masking/rejecting based on allowed/disallowed intent-usage. In short, we are looking for an SPI that enables dynamic custom query rewrites inside the query engine while fully leveraging the engine's native analysis and optimization capabilities. Thanks

findepi · 2022-04-06T20:38:23Z

> query rewrite (based on query context - use-intent, and metadata of query source targets) will dynamically expand some from-tables to be joined with intent-permitted (based on users' consents)

This seems related to io.trino.spi.security.SystemAccessControl#getRowFilters interface method.

> as well as, field-masking/rejecting based on allowed/disallowed intent-usage

This seems related to io.trino.spi.security.SystemAccessControl#getColumnMask
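
As a rough illustration of what such a row filter and column mask could look like, here is a minimal sketch that only builds the SQL expression text. It does not use the Trino SPI: `PolicySketch`, the `consents` table, the `granted` column, and the `orders`/`email` names are all hypothetical, and a real implementation would wrap these strings in `io.trino.spi.security.ViewExpression`.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in, NOT the Trino SPI. It only shows the shape of the
// SQL expressions an access-control plugin might hand back to the engine.
final class PolicySketch {
    // Assumed policy: table name -> column linking each row to a consent record.
    private static final Map<String, String> CONSENT_KEY = Map.of("orders", "user_id");

    // Row filter: a boolean SQL expression over the columns of the table.
    static Optional<String> rowFilter(String tableName) {
        String key = CONSENT_KEY.get(tableName);
        if (key == null) {
            return Optional.empty(); // table is not policy-controlled
        }
        return Optional.of(key + " IN (SELECT user_id FROM consents WHERE granted)");
    }

    // Column mask: a scalar SQL expression that replaces the column value.
    static Optional<String> columnMask(String tableName, String columnName) {
        if (tableName.equals("orders") && columnName.equals("email")) {
            return Optional.of("CAST(NULL AS varchar)");
        }
        return Optional.empty(); // column is returned unmasked
    }

    public static void main(String[] args) {
        System.out.println(rowFilter("orders").orElse("<no filter>"));
        System.out.println(columnMask("orders", "email").orElse("<no mask>"));
    }
}
```

The sketch keys the policy on the table name alone, so it does not model a per-query "use intent".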

yitaoyao-nike · 2022-04-06T21:52:35Z

Thanks @findepi. Yes, getRowFilters could achieve similar behavior functionally. However, it might not be efficient, depending on the query type and data size (both source and returned data). For example, the supported filter (ViewExpression) must be a scalar SQL expression of boolean type over the columns in the table, so the conditional filter could look something like:
t1.xyz in (select xyz from t2) or
exists (select xyz from t2 where xyz = t1.xyz)

However, for both large table size and large return data set from a complex query, it becomes difficult for the query engine to come up with the best query plan, since the filter is executed at row level.

In the proposed approach, one can simply expand the "policy-controlled" table(s) into a 'view' through a simple join statement that acts as logical filtering. The query engine's optimizer should then be able to produce different optimized query plans based on the sizes of the tables involved, the query's where-clause filters, etc. It might perform the table join(s) before the original query logic or apply them afterwards, whichever is most efficient.

martint · 2022-04-06T22:01:27Z

> However, for both large table size and large return data set from a complex query, it becomes difficult for the query engine to come up with the best query plan, since the filter is executed at row level.

That's actually not the case. IN and EXISTS are typically implemented as a combination of semi joins, joins and aggregations (depending on whether they are correlated subqueries or not). I would recommend taking a look at the query plan via EXPLAIN to see if it resembles what you'd be expecting it to produce. If it doesn't, there might be further optimizations we can implement.
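
To make the semi-join point concrete, here is a minimal in-memory sketch of how a predicate like `t1.xyz IN (SELECT xyz FROM t2)` can be evaluated as a hash semi join rather than by re-running the subquery for every row. This is an illustration of the general technique only, not Trino's implementation; `SemiJoinSketch` is a hypothetical name, and SQL's NULL handling for `IN` is deliberately ignored.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a hash semi join over two integer columns.
final class SemiJoinSketch {
    // Evaluates "SELECT xyz FROM t1 WHERE xyz IN (SELECT xyz FROM t2)":
    // build a hash set from t2 once, then probe it once per t1 row.
    static List<Integer> semiJoin(List<Integer> t1, List<Integer> t2) {
        Set<Integer> build = new HashSet<>(t2); // build side, duplicates collapse
        List<Integer> out = new ArrayList<>();
        for (Integer xyz : t1) {                // probe side
            if (build.contains(xyz)) {
                out.add(xyz);                   // each t1 row is emitted at most once
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(semiJoin(List.of(1, 2, 2, 3), List.of(2, 3, 3, 4)));
    }
}
```

Each probe-side row is emitted at most once no matter how many matches exist on the build side. A plain inner join would instead repeat a `t1` row for every matching `t2` row, so a hand-written join rewrite needs a deduplicated right side to keep the semantics of `IN`.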

ariforu · 2022-04-06T23:23:34Z

@martint I have sent you a DM over Slack to explain our usecase. I think it'll be easier that way.

yitaoyao-nike · 2022-04-07T01:05:01Z

Thanks @martint. I will look into it to see what the resulting query plan looks like. However, how can we obtain a session variable (declared as the use intent) inside getRowFilters, since it is only passed a SystemSecurityContext and the table name?

martint · 2022-04-08T14:34:40Z

There's currently no way to get session properties in those methods, but that's something we should explore. Can you open a separate issue to track it?

yitaoyao-nike · 2022-04-08T17:07:53Z

Thank you, Martin, for the update.

eric-ranstrom-wd · 2023-12-03T16:20:13Z

@ariforu, did your organization find another solution for query re-write in the analysis stage? My team has similar needs (re-write for materialized view matching, custom auth filtering), and I am not seeing a currently supported path other than “do it before Trino sees it”. Thanks!

ariforu and others added 27 commits February 20, 2022 22:11

Adding support of custom optimizer and unit test

41bf762

Adding more documentation and fixing the unit test

d26a93d

Merge pull request #1 from ariforu/custopt

b386729

Adding support of custom optimizer and unit test

Trying to set the customer optimizer is allowed and class name from test

33703cd

Trying to set the customer optimizer is allowed and class name from test

ca5d03d

Removed the changes not required

60f68b8

Removed the changes not required

ee4fbca

Removed the unessary changes from FeaturesConfig, this is not required.

c833e50

Removed the unessary changes from FeaturesConfig, this is not required.

05ed781

Removed the unessary changes from FeaturesConfig, this is not required.

c4723e5

Merge pull request #2 from ariforu/custopt

f675e8d

Custopt: Set custom optimizer from config file

reverting default configs

f26cab6

fixed airlift config read

59178b4

unit tests working

df859be

Removed unwanted properties

63d4e64

Added more unit test

8312760

Merge branch 'master' into custopt

2994d86

Merge pull request #3 from ariforu/custopt

b6f69f3

Adding custom optimizer feature

Merge branch 'trinodb:master' into master

2591201

adding license information

7f10bb8

fixing checkstyle issues

a77189b

fixing checkstyle issues

9f5b09e

Merge branch 'trinodb:master' into custopt

a450d0f

Merge branch 'trinodb:master' into master

8e352a1

Merge remote-tracking branch 'origin/custopt' into custopt

2b5090b

Log message and javadoc added

5a8a2f4

Merge pull request #5 from ariforu/custopt

797a13e

Custopt

Added the docs to register custom optmizer class

8e7acd7

Added the docs to register custom optmizer class

18854b5

github-actions bot added the docs label Apr 4, 2022

Fix the build

81ea9e2

ariforu added enhancement New feature or request cla-signed labels Apr 4, 2022

ariforu marked this pull request as ready for review April 4, 2022 19:54

rranjaInfy requested a review from hashhar April 5, 2022 07:29

findepi closed this Apr 6, 2022

findepi mentioned this pull request Apr 6, 2022

Add support to enable user defined custom plan optimizer #11765

Closed
