Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make PlanBuilder a public API #11383

Open
pedroerp opened this issue Oct 30, 2024 · 3 comments
Open

Make PlanBuilder a public API #11383

pedroerp opened this issue Oct 30, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request external-apis

Comments

@pedroerp
Copy link
Contributor

pedroerp commented Oct 30, 2024

Description

PlanBuilder is a utility created to make it more convenient to create query plans in Velox when writing unit tests. Given its convenience, with time client codebases started using PlanBuilder, even though it should not technically be part of the library's external API.

Since we would like to provide a programmatic API for users to generate plans (instead of having to assemble the plan nodes together), we should move PlanBuilder out of the test directory into velox/exec and have first-class support for its APIs.

The complication is that PlanBuilder today provides support for parsing SQL expressions using DuckDB's parser when building plans, which is only meant to be used for testing code. To decouple this code, I suggest the following steps:

  1. Move the basic functionality of PlanBuilder (everything minus SQL parsing) out of tests, into a PlanBuilderBase base class, refactoring the current PlanBuilder in tests to leverage it. PlanBuilder (test) should extend PlanBuilderBase by adding SQL parsing capabilities to it.
  2. Rename the new exec/PlanBuilderBase to exec/PlanBuilder, and the old exec/tests/utils/PlanBuilder (in test) to exec/tests/utils/PlanBuilderTest.
  3. Create a programmatic ways to let users create expressions trees without relying on a full blown SQL parser (i.e., an ExpressionBuilder class).

Cc: @mbasmanova @xiaoxmeng @majetideepak @kgpai @Yuhta @bikramSingh91

@pedroerp pedroerp added enhancement New feature or request external-apis labels Oct 30, 2024
@pedroerp pedroerp changed the title Make PlanBuilder public Make PlanBuilder a public API Oct 30, 2024
@pedroerp pedroerp self-assigned this Oct 30, 2024
pedroerp added a commit to pedroerp/velox-1 that referenced this issue Nov 4, 2024
Summary:
Adding an API to allow developers to fluently create (untyped)
expression trees without having to rely on a SQL parser. Details and
extensively examples provided in code comments and unit tests.

Part of facebookincubator#11383

Differential Revision: D65371064
@mbasmanova
Copy link
Contributor

@pedroerp PlanBuilder is a convenient utility and has many applications in non-production use cases. It is great for testing, prototyping, ad-hoc debugging. However, I do not need how it is applicable to production code. What are some of the production use cases you have in mind?

PlanBuilder's main advantage is support for SQL snippets that can be used to specify filter/project expressions, aggregate function calls with sorting and masking, window function expressions. Without SQL support there is not much value in using PlanBuilder vs. PlanNode API. The remaining convenience comes from using conventions to simplify the API. If these conventions are generally applicable to production code, then, perhaps, we could look into simplifying PlanNode.h API to incorporate these.

@pedroerp
Copy link
Contributor Author

pedroerp commented Nov 6, 2024

@mbasmanova we have been getting feedback from the community that Velox is usually very nice and efficient, but the entry bar for developers building new applications may be a bit too high, to a point where new users may end up using some other projects like DataFusion which are easier to reason about and more convenient to use.

One of the areas discussed is that creating the plan nodes manually, although being the main API, is complicated and requires a reasonable amount of boilerplate, to the extent that users end up taking a dependency on PlanBuilder (even though it's test code). We have many examples of internal projects doing that,

So the intent here is making PlanBuilder as part of the external API of the library, so that users can actually rely on that more safely. This would also potentially help us decrease the surface area of our external APIs, and hide some of the current APIs underneath PlanBuilder (in the long term).

PlanBuilder's main advantage is support for SQL snippets

That is definitely a nice feature, but the entire builder API, table scan builder, plan id generators, and such make generating plans (and understanding how to generate them) a lot easier and more accessible.

@majetideepak
Copy link
Collaborator

With this feature, users can write applications without depending on DuckDB for the SQL parser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request external-apis
Projects
None yet
Development

No branches or pull requests

3 participants