The CQL Transpiler is a tool for translating Clinical Quality Language (CQL) to its SQL equivalence. The transpiler itself doesn't actually produce SQL but instead translates each ELM operator into DBT/Jinja macro operator equivalents. These macro operators are translated to a given SQL dialect through a separate DBT transpiler target package imported into the implementor's solution. Spark SQL is the first planned transpiler target but the development and use of other targets is possible.
This project is still very much in its infancy but is hoped to be used for VA's 2024 Joint Commission submission.
The transpiler workflow is as follows
- CQL text files are loaded into memory
- CQL text files are converted into CQL Abstract Syntax Trees (ASTs) in memory
- CQL ASTs are converted into a "generic" AST that can be used to output code in any query language.
- The generic ASTs are rendered in text parsable by jinja
- The generic ASTs are rendered as DBT models.
- Replace the contents of the resources/cql folder with the CQL libraries to be transpiled.
- Run the main function of gov.va.transpiler.jinja.Transpiler
- The java portion of the transpiler will create intermediate ASTs as jinja files for the original CQL file and its dependencies in the "resources/jinja/generated_intermediate/" folder.
- The java portion of the transpiler will create model files as jinja files for each expression in the original CQL file in the "resources/jinja/generated_models/" folder.
- Run the "render_models.py" script.
- The python/jinja portion of the transpiler will create model files as dbt files for each expression in the original CQL file in the "resources/transpiler_dbt_models/" folder.
- Copy the dbt files into your dbt repository.
- Implement any implementation/database-specific macro files within your dbt repository.
- Execute the normal workflow for your dbt repository.
The "transpiler" module and specifically the main function "Transpiler" class are the entry point into the java portion of the transpiler.
The "render_models.py" script is the entry point into the jinja portion of the transpiler.
Jinja macros files should implement transpilation logic using an object-oriented programming paradigm. Jinja does not support object-oriented programming natively, but it can be hacked to allow it through the clever use of namespaces. To create an an object's "constructor", create a macro function that sets certain variables on a namespace. To create an object that "inherits" from another object, in its constructor, run the parent constructor on the namespace. To create a function that returns one or more values, create a macro that accepts a "carrier" namespace and sets its return values inside that namespace without printing anything.
The jinja AST rendering process does not assume any particular behavior from the State class. The State class exists for the macro file's use.
The Operator class must support the following members:
+ operator - namespace (extends from Operator)
+ children - list of operatorValues
Classes extending from the operator class should support other data passed on from the CQL version of the operators.
The OperatorHandler class must support the following members:
print(state, operatorValues) - void macro + state - namespace (nullable) + operatorValues - dictionary
The operatorValues dictionary should be treated as essentially a class, that always contains at least the following members: + operator - namespace (extends from Operator) + children - list of operatorValues
The macros file that targets SparkSQL implements its behavior using the following classes and standards.
State - namespace + context - String (scoped) + functionArguments - Dictionary<String, Dictionary<String, ?>> (scoped) + aliasContext - String
note: "scoped" variables should be stored and reset before descending into any branch of the AST where they're modified. To do so, use the following pattern:
{%- set previousVariableName = state.variableName %} {%- set state.variableName = newValue %} {{ function(state) }} {%- set state.variableName = previousVariableName %}
DataType is an enum.
Values: SIMPLE = "simple" ENCAPSULATED = "encapsulated" TABLE = "table" INHERITED = "inherited" STATEMENT = "statement" UNKNOWN = "unknown"
Operator - namespace + getDataType(carrier, arguments) - macro (returns DataType) + carrier - namespace + return - string + arguments - dictionary extending OperatorValues + print(state, arguments) + state - State + arguments - dictionary extending OperatorValues
Potential data types:
print(state, operatorValues) - macro + state - namespace (nullable) + operatorValues - dictionary