-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Add support to enable user defined custom plan optimizer #11767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Adding support of custom optimizer and unit test
Custopt: Set custom optimizer from config file
Adding custom optimizer feature
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla. |
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla. |
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla. |
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla. |
I don't think this is a generic enough solution. Note that the order of optimizer rules matters a lot and each rule has the possibility to impact other rules. As such any change in the rules that aren't already a tested configuration has the possibility to introduce correctness issues. And we're generally trying to move away from the current greedy optimizer to a more exploratory one and such changes would prevent us from moving in that direction without breaking such integrations. I'd instead propose to fork and maintain your copy of PlanOptimizers class with your rule and required tests. In case there are some specific rules you think that would be useful to both you and the general community then maybe we can add those specific rules directly instead of an extension mechanism that brings potential correctness issues. I'll defer to other maintainers as well for their opinion, particularly @martint and @kasiafi. |
Hi @hashhar, the proposed plugin capability is strictly for enabling customized optimizer to run before any built-in ones, so the organizational owners of the platform can enforce certain data policies and other needs by intercepting the user queries and dynamically rewriting/rearranging the user queries. There is no impact on system's built-in optimizers and their execution order. One can view this approach as such: it is equivalent as to force every clients to use same customized client driver from an organization which can dynamically alter the user query based on certain conditions before sending to the query engine. However, the proposed approach is much manageable (no requirement for adopting specific customized client drivers of different languages), and scalable (since it is invoked at query engine side, it can effectively leverage metadata in the DB to make dynamic decision on whether/how to rewrite the user query). It is a critical feature for organization to dynamically enforce right-of-use privacy policy (with audit-ability) based on the usage intent of queried data, as well as, user's consents. Managing a forked code-base should not be a recommended approach IMO. |
The current PlanNodes cannot be exposed to plugins, because they are not in a form that allows us to evolve them backwards-compatibly. While we could still expose them with a caveat that this is not a supportedable feature, a fork is a more suitable place for features we currently cannot support. We don't want to create an impression that a feature is generally useful, if it isn't. And we don't want to have features that are not generally useful. That been said, in the long term we want to have this capability. See "allow connectors to provide optimizer rules" in #18. The project consensus is that this requires a new IR (intermediate representation) for the plan, one that we could expose in the SPI. @ariforu let me close this PR as it won't be merged in current shape. |
Hi @findepi, the current proposed SPI change is to enable dynamically and conditionally rewriting of the user query before system's query-plan optimizations to centrally enforce organization's policies. It is not related to connectors/data-sources specific optimization. Hence, "allow connectors to provide optimizer rules" in #18, doesn't seem to address the needs. Please consult with your team for reconsidering this decision. Thanks! BTW, the similar capability does exist in Spark Query Engine, which we have already been leveraging. |
What kind of policies are you thinking about? In general, the place to enforce policies (access control, filters, masks) is during analysis and initial planning. The job of the optimizer is to produce a more efficient query plan, but it should not change the semantics of the query. |
Hi @martint, yes, we do want to inject query-rewrite at analysis stage (Spark allows to have custom handers injected in analysis, as well as, optimization stages, hence we injected our handler in front of system's analysis pipeline). Our query rewrite (based on query context - use-intent, and metadata of query source targets) will dynamically expand some from-tables to be joined with intent-permitted (based on users' consents), as well as, field-masking/rejecting based on allowed/disallowed intent-usage. In short, we are looking for SPI to enable dynamic custom query-rewrite inside the query engine, while fully leveraging query-engine's native analysis and optimization capabilities. Thanks |
This seems related to
This seems related to |
Thanks @findepi. Yes, the getRowFilters could achieve similar behavior functionally. However, it might not be efficient depending on query type and data size (both source and returned data). For example, the supported filter (ViewExpression) must be a scalar SQL expression of boolean type over the columns in the table, therefore, the conditional filter could look something like: However, for both large table size and large return data set from a complex query, it becomes difficult for query engine to come up a best query plan, since the filter is executed at row level. In the proposed approach, one can simply expend the "policy-controlled" table(s) to be a 'view' through simple join statement as logical filtering. The query engine optimization should be able to produce different optimized query plan based on size of tables involved and query's where-clause filters, etc. It might do the table join(s) first before the original query logic, or apply join(s) afterwards, whatever most efficient. |
That's actually not the case. IN and EXISTS are typically implemented as a combination of semi joins, joins and aggregations (depending on whether they are correlated subqueries or not). I would recommend taking a look at the query plan via EXPLAIN to see if it resembles what you'd be expecting it to produce. If it doesn't, there might be further optimizations we can implement. |
@martint I have sent you a DM over Slack to explain our usecase. I think it'll be easier that way. |
Thanks @martint. Will look into it to see what the resulting query plan looks like. However, how can we obtain a session variable (declared as use intent) inside getRowFilters, since it only passed in SystemSecurityContext and table-name? |
There's currently no way to get session properties in those methods, but that's something we should explore. Can you open a separate issue to track it? |
Thank you, Martin, for the update. |
@ariforu, did your organization find another solution for query re-write in the analysis stage? My team had similar needs (re-write for materialized view matching, custom auth filtering) and I am not seeing a currently supported path other than “do it before trino sees it”. Thanks! |
Description
Unlike Spark, Trino does not currently support user defined query plan optimizers. This feature will add a pluggable optimizer support. Organizations can then write their custom optimizers and loosely couple them using this capability.
New Feature
Core query engine
This feature is not targeted to the end users or non-technical SQL audience. This feature can be thought as a framework that adds support for pluggable custom SQL optimizers. This will eliminate any need to modify Trino's source code when organizations want to add a new SQL optimizer rule beyond the built in ones.
Related issues, pull requests, and links
#11765
Documentation
( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: