Commit 7e6a19f
PPL to Spark translation (#33)
* adding support for containerized flint with spark / Livy docker-compose.yml
* update ppl ast builder
* add ppl ast components: ppl statement logical plan elements, ppl parser components, ppl expressions components
* populate ppl test suite covering different types of PPL queries
* update additional tests
* separate ppl-spark code into a dedicated module
* add ppl translation of simple filter and data-type literal expression
* remove unused ppl ast builder
* add log-plan test results validation
* add support for multiple table selection using union
* update sbt with new IT test suite for PPL module
* update ppl IT suite test and dependencies
* add IT tests for ppl covering (see the sketch after this list):
  - source = $testTable
  - source = $testTable | fields name, age
  - source = $testTable age=25 | fields name, age
* update literal transformations according to catalyst's convention
* separate unit tests into a dedicated file per test category
* add IT tests for additional filters
* mark unsatisfied tests as ignored until supporting code is ready
* add README.md design and implementation details; add AggregateFunction translation & tests; remove unused DSL builder
* remove docker related files
* fix text-related unwrapping bug; add actual ppl-based table content fetch and verification
* add AggregatorTranslator support
* resolve group-by issues
* add generic ppl extension chain which registers a chain of parsers
* update some tests
* add filter test with stats
* add support for AND / OR, with additional unit tests
* add Max, Min, Count, Sum aggregation functions support
* add basic span support for aggregate based queries
* update supported PPL commands and roadmap for future ppl command support
* update readme doc
* add `head` support; add README.md details for supported commands and planned future support
* add support for sort command; add missing license header; update supported commands in readme
* update according to PR comments & review
* update span & alias group-by tests and composition
* update scalastyle
* adding window function support for time-based spans, with tests; update the PPL to Spark README.md
* update sbt build and README.md
* update ppl CatalystPlan visitor to produce the logical plan as part of the visitor instead of String
* update ppl tests & IT tests
* minor refactoring & package movement
* additional refactoring: update the limit / sort visitor functions
* update scala style formatting

Signed-off-by: YANGDB <[email protected]>
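As a rough illustration of the translation target for the IT queries listed above, here is a minimal hand-built sketch of the unresolved Catalyst plan that a query like `source = testTable age=25 | fields name, age` corresponds to. This uses Spark's Catalyst API directly rather than this module's own visitor code; the table and column names are placeholders taken from the test examples.

```scala
import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, UnresolvedRelation}
import org.apache.spark.sql.catalyst.expressions.{EqualTo, Literal}
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}

// Hand-built equivalent of `source = testTable age=25 | fields name, age`:
// scan the table, filter on age, then project the requested fields.
val table: LogicalPlan = UnresolvedRelation(Seq("testTable"))
val filtered = Filter(EqualTo(UnresolvedAttribute("age"), Literal(25)), table)
val plan = Project(Seq(UnresolvedAttribute("name"), UnresolvedAttribute("age")), filtered)
```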
1 parent d32879b commit 7e6a19f

File tree

93 files changed: +10021 -7 lines changed


DEVELOPER_GUIDE.md

+1-1
````diff
@@ -22,7 +22,7 @@ sbt scalafmtAll
 ```
 The code style is automatically checked, but users can also manually check it.
 ```
-sbt sbt scalastyle
+sbt scalastyle
 ```
 For IntelliJ user, read more in [scalafmt IntelliJ](https://scalameta.org/scalafmt/docs/installation.html#intellij) to integrate
 scalafmt with IntelliJ
````

README.md

+23-1
````diff
@@ -4,10 +4,12 @@ OpenSearch Flint is ... It consists of two modules:
 
 - `flint-core`: a module that contains Flint specification and client.
 - `flint-spark-integration`: a module that provides Spark integration for Flint and derived dataset based on it.
+- `ppl-spark-integration`: a module that provides PPL query execution on top of Spark. See the [PPL repository](https://github.com/opensearch-project/piped-processing-language).
 
 ## Documentation
 
 Please refer to the [Flint Index Reference Manual](./docs/index.md) for more information.
+For the PPL language, see the [PPL Reference Manual](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/index.rst) for more information.
 
 ## Prerequisites
 
@@ -17,14 +19,22 @@ Version compatibility:
 |---------------|-------------|---------------|---------------|------------|
 | 0.1.0         | 11+         | 3.3.1         | 2.12.14       | 2.6+       |
 
-## Usage
+## Flint Extension Usage
 
 To use this application, you can run Spark with Flint extension:
 
 ```
 spark-sql --conf "spark.sql.extensions=org.opensearch.flint.FlintSparkExtensions"
 ```
 
+## PPL Extension Usage
+
+To use PPL to Spark translation, you can run Spark with the PPL extension:
+
+```
+spark-sql --conf "spark.sql.extensions=org.opensearch.flint.FlintPPLSparkExtensions"
+```
+
 ## Build
 
 To build and run this application with Spark, you can run:
@@ -37,6 +47,18 @@ then add org.opensearch:opensearch-spark_2.12 when run spark application, for example,
 bin/spark-shell --packages "org.opensearch:opensearch-spark_2.12:0.1.0-SNAPSHOT"
 ```
 
+### PPL Build & Run
+
+To build and run the PPL extension with Spark, you can run:
+
+```
+sbt clean sparkPPLCosmetic/publishM2
+```
+then add org.opensearch:opensearch-spark-ppl_2.12 when running your Spark application, for example,
+```
+bin/spark-shell --packages "org.opensearch:opensearch-spark-ppl_2.12:0.1.0-SNAPSHOT"
+```
+
 ## Code of Conduct
 
 This project has adopted an [Open Source Code of Conduct](./CODE_OF_CONDUCT.md).
````
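For reference, the extension flag added to the README can equally be set when constructing a session programmatically. A minimal sketch, assuming (as the commit's parser-chain work implies) that the registered PPL parser lets `spark.sql` accept PPL text; `testTable` is a placeholder:

```scala
import org.apache.spark.sql.SparkSession

// Build a session with the PPL extension registered, mirroring the
// `--conf "spark.sql.extensions=..."` flag from the README.
val spark = SparkSession
  .builder()
  .appName("ppl-example")
  .master("local[*]")
  .config("spark.sql.extensions", "org.opensearch.flint.FlintPPLSparkExtensions")
  .getOrCreate()

// With the extension's parser in place, a PPL query can be submitted
// through the regular SQL entry point.
spark.sql("source = testTable | fields name, age").show()
```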

build.sbt

+47-3
```diff
@@ -43,7 +43,7 @@ lazy val commonSettings = Seq(
   Test / test := ((Test / test) dependsOn testScalastyle).value)
 
 lazy val root = (project in file("."))
-  .aggregate(flintCore, flintSparkIntegration, sparkSqlApplication)
+  .aggregate(flintCore, flintSparkIntegration, pplSparkIntegration, sparkSqlApplication)
   .disablePlugins(AssemblyPlugin)
   .settings(name := "flint", publish / skip := true)
 
@@ -61,6 +61,42 @@ lazy val flintCore = (project in file("flint-core"))
         exclude ("com.fasterxml.jackson.core", "jackson-databind")),
     publish / skip := true)
 
+lazy val pplSparkIntegration = (project in file("ppl-spark-integration"))
+  .enablePlugins(AssemblyPlugin, Antlr4Plugin)
+  .settings(
+    commonSettings,
+    name := "ppl-spark-integration",
+    scalaVersion := scala212,
+    libraryDependencies ++= Seq(
+      "org.scalactic" %% "scalactic" % "3.2.15" % "test",
+      "org.scalatest" %% "scalatest" % "3.2.15" % "test",
+      "org.scalatest" %% "scalatest-flatspec" % "3.2.15" % "test",
+      "org.scalatestplus" %% "mockito-4-6" % "3.2.15.0" % "test",
+      "com.stephenn" %% "scalatest-json-jsonassert" % "0.2.5" % "test",
+      "com.github.sbt" % "junit-interface" % "0.13.3" % "test"),
+    libraryDependencies ++= deps(sparkVersion),
+    // ANTLR settings
+    Antlr4 / antlr4Version := "4.8",
+    Antlr4 / antlr4PackageName := Some("org.opensearch.flint.spark.ppl"),
+    Antlr4 / antlr4GenListener := true,
+    Antlr4 / antlr4GenVisitor := true,
+    // Assembly settings
+    assemblyPackageScala / assembleArtifact := false,
+    assembly / assemblyOption ~= {
+      _.withIncludeScala(false)
+    },
+    assembly / assemblyMergeStrategy := {
+      case PathList(ps @ _*) if ps.last endsWith ("module-info.class") =>
+        MergeStrategy.discard
+      case PathList("module-info.class") => MergeStrategy.discard
+      case PathList("META-INF", "versions", xs @ _, "module-info.class") =>
+        MergeStrategy.discard
+      case x =>
+        val oldStrategy = (assembly / assemblyMergeStrategy).value
+        oldStrategy(x)
+    },
+    assembly / test := (Test / test).value)
+
 lazy val flintSparkIntegration = (project in file("flint-spark-integration"))
   .dependsOn(flintCore)
   .enablePlugins(AssemblyPlugin, Antlr4Plugin)
 
@@ -102,7 +138,7 @@ lazy val flintSparkIntegration = (project in file("flint-spark-integration"))
 
 // Test assembly package with integration test.
 lazy val integtest = (project in file("integ-test"))
-  .dependsOn(flintSparkIntegration % "test->test")
+  .dependsOn(flintSparkIntegration % "test->test", pplSparkIntegration % "test->test")
   .settings(
     commonSettings,
     name := "integ-test",
 
@@ -118,7 +154,7 @@ lazy val integtest = (project in file("integ-test"))
       "org.opensearch.client" % "opensearch-java" % "2.6.0" % "test"
         exclude ("com.fasterxml.jackson.core", "jackson-databind")),
     libraryDependencies ++= deps(sparkVersion),
-    Test / fullClasspath += (flintSparkIntegration / assembly).value)
+    Test / fullClasspath ++= Seq((flintSparkIntegration / assembly).value, (pplSparkIntegration / assembly).value))
 
 lazy val standaloneCosmetic = project
   .settings(
 
@@ -144,6 +180,14 @@ lazy val sparkSqlApplicationCosmetic = project
     exportJars := true,
     Compile / packageBin := (sparkSqlApplication / assembly).value)
 
+lazy val sparkPPLCosmetic = project
+  .settings(
+    name := "opensearch-spark-ppl",
+    commonSettings,
+    releaseSettings,
+    exportJars := true,
+    Compile / packageBin := (pplSparkIntegration / assembly).value)
+
 lazy val releaseSettings = Seq(
   publishMavenStyle := true,
   publishArtifact := true,
```
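The new `sparkPPLCosmetic` project publishes the assembly under the `opensearch-spark-ppl` artifact name. A minimal sketch of how a downstream sbt build might consume the locally published snapshot (coordinates taken from the README changes above; the resolver line assumes artifacts were published to the local Maven repository via `publishM2`):

```scala
// build.sbt of a consuming project (sketch)
resolvers += Resolver.mavenLocal // pick up artifacts published via publishM2

libraryDependencies += "org.opensearch" %% "opensearch-spark-ppl" % "0.1.0-SNAPSHOT"
```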
LogicalPlanTestUtils.scala

+56-0

```diff
@@ -0,0 +1,56 @@
+/*
+ * Copyright OpenSearch Contributors
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package org.opensearch.flint.spark
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, ExprId}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Project}
+
+/**
+ * general utility functions for ppl to spark transformation test
+ */
+trait LogicalPlanTestUtils {
+
+  /**
+   * utility function to compare two logical plans while ignoring the auto-generated expressionId
+   * associated with the alias which is used for projection or aggregation
+   * @param plan
+   * @return
+   */
+  def compareByString(plan: LogicalPlan): String = {
+    // Create a rule to replace Alias's ExprId with a dummy id
+    val rule: PartialFunction[LogicalPlan, LogicalPlan] = {
+      case p: Project =>
+        val newProjections = p.projectList.map {
+          case alias: Alias =>
+            Alias(alias.child, alias.name)(exprId = ExprId(0), qualifier = alias.qualifier)
+          case other => other
+        }
+        p.copy(projectList = newProjections)
+
+      case agg: Aggregate =>
+        val newGrouping = agg.groupingExpressions.map {
+          case alias: Alias =>
+            Alias(alias.child, alias.name)(exprId = ExprId(0), qualifier = alias.qualifier)
+          case other => other
+        }
+        val newAggregations = agg.aggregateExpressions.map {
+          case alias: Alias =>
+            Alias(alias.child, alias.name)(exprId = ExprId(0), qualifier = alias.qualifier)
+          case other => other
+        }
+        agg.copy(groupingExpressions = newGrouping, aggregateExpressions = newAggregations)
+
+      case other => other
+    }
+
+    // Apply the rule using transform
+    val transformedPlan = plan.transform(rule)
+
+    // Return the string representation of the transformed plan
+    transformedPlan.toString
+  }
+
+}
```
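A hedged usage sketch of the trait above: the suite name and assertion are illustrative (only `compareByString` and the trait come from this commit). It shows how normalizing `ExprId`s lets two independently built, structurally identical plans compare equal as strings, even though each fresh `Alias` gets its own auto-generated id.

```scala
import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, UnresolvedRelation}
import org.apache.spark.sql.catalyst.expressions.Alias
import org.apache.spark.sql.catalyst.plans.logical.Project
import org.opensearch.flint.spark.LogicalPlanTestUtils
import org.scalatest.flatspec.AnyFlatSpec

class PlanComparisonExample extends AnyFlatSpec with LogicalPlanTestUtils {

  "plans that differ only in auto-generated ExprIds" should "compare equal by string" in {
    // Each call builds a structurally identical projection, but every new
    // Alias receives a fresh ExprId, so the raw plans are not equal.
    def plan() = Project(
      Seq(Alias(UnresolvedAttribute("age"), "age")()),
      UnresolvedRelation(Seq("testTable")))

    // compareByString rewrites every Alias to ExprId(0) before printing,
    // so the two rendered plans match.
    assert(compareByString(plan()) === compareByString(plan()))
  }
}
```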
