Add Derived Dataset Column Definitions #51

Open
bcodell opened this issue Dec 24, 2023 · 0 comments
Enable developers to define dataset columns that are transformations of one or more other dataset columns.

The actual aql might look like the following:

{% set aql %}
using customer_stream
select all activity_1 (
customer_id as customer_id,
activity_at as activity_1_at
)
append first after activity_2 (
activity_at as activity_2_at
)
derive (
datediff('d', ${activity_1_at}, ${activity_2_at}) as time_to_activity_2_days
)
{% endset %}

The resulting dataset schema should be:

  • customer_id (str)
  • activity_1_at (ts)
  • activity_2_at (ts)
  • time_to_activity_2_days (float)
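
A derived expression refers to upstream columns through `${...}` placeholders, so one plausible first step is to extract those references and check them against the columns already in the dataset. The following is a minimal Python sketch that assumes only the `${column_alias}` syntax shown in the example; the names `referenced_columns` and `missing_columns` are hypothetical and not part of the project.

```python
import re

# Matches ${column_alias} placeholders as written in the derive clause above.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def referenced_columns(expression: str) -> list[str]:
    """Return the aliases referenced in an expression, in order, without duplicates."""
    seen: list[str] = []
    for name in PLACEHOLDER.findall(expression):
        if name not in seen:
            seen.append(name)
    return seen


def missing_columns(expression: str, known: set[str]) -> list[str]:
    """Return referenced aliases that are not defined earlier in the dataset."""
    return [name for name in referenced_columns(expression) if name not in known]


if __name__ == "__main__":
    expr = "datediff('d', ${activity_1_at}, ${activity_2_at})"
    known = {"customer_id", "activity_1_at", "activity_2_at"}
    print(referenced_columns(expr))
    print(missing_columns(expr, known))
```

A check like this could run before compilation so that a typo in a placeholder fails with a clear message rather than as a warehouse SQL error.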

Open questions:

  • How should the data type of a derived column be identified? The types of first-level dataset columns can be inferred because both the attribute's data type and any aggregation function applied to it are known, but derived columns can (and should) be defined with arbitrary SQL, whose result type cannot be inferred the same way.
  • How should multiple derived columns be delimited? Columns are currently parsed on the assumption that a comma appears only at the end of a column alias, but arbitrary SQL can contain commas (as in the datediff arguments in the example above), which breaks that parsing logic.
  • How to apply aggregations to derived columns?
    • Not supported for now; the workflow semantics of aggregating the base dataset need to be figured out first.
  • How necessary are these features in aql if the goal is to interface with a BI layer?
    • Very. A code-centric interface is needed to automate maintenance and upkeep of dataset columns as they are canonized.
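
For the question about delimiting multiple derived columns, one possible approach is to split only on commas that sit outside parentheses and string literals. The sketch below illustrates that idea in Python; it is not the project's parser, the function name `split_top_level` is hypothetical, and it assumes the body of the `derive ( ... )` clause has already been extracted as a string.

```python
def split_top_level(body: str) -> list[str]:
    """Split a derive-clause body on commas outside parentheses and quotes."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0      # current parenthesis nesting level
    quote = None   # the open quote character, if inside a string literal

    for ch in body:
        if quote is not None:
            # Inside a literal: copy characters until the matching quote.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            # A top-level comma ends the current column definition.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)

    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


if __name__ == "__main__":
    body = (
        "datediff('d', ${activity_1_at}, ${activity_2_at}) as time_to_activity_2_days, "
        "coalesce(${customer_id}, 'n,a') as customer_label"
    )
    for column in split_top_level(body):
        print(column)
```

The second column in the usage example (`customer_label`) is invented purely to exercise a comma inside a string literal. This sketch does not handle SQL comments or dialect-specific quoting, so a real implementation would likely need a proper tokenizer.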