Update scalardb-analytics-spark-sample to support ScalarDB Analytics 3.16 #81
Merged · +361 −166 · 5 commits:
1c6f597 feat(scalardb-analytics-spark-sample): update sample to support Scala… (choplin)
279f97f feat(scalardb-analytics-spark-sample): enable automatic sample data l… (choplin)
5f8790c refactor(scalardb-analytics-spark-sample): standardize configuration … (choplin)
d161000 docs(scalardb-analytics-spark-sample): update README for automatic da… (choplin)
c7b101d refactor(scalardb-analytics-sample): rename directory from scalardb-a… (choplin)
Sample README (new file, 60 additions):

# ScalarDB Analytics Spark Sample

## Setup

### 1. Start services

```bash
docker compose up -d
```

### 2. Load sample data

```bash
docker compose run --rm sample-data-loader
```

### 3. Create catalog

```bash
docker compose run --rm scalardb-analytics-cli catalog create --catalog sample_catalog
```

### 4. Register data sources

```bash
# Register ScalarDB data source
docker compose run --rm scalardb-analytics-cli data-source register --data-source-json /config/data-sources/scalardb.json

# Register PostgreSQL data source
docker compose run --rm scalardb-analytics-cli data-source register --data-source-json /config/data-sources/postgres.json
```

### 5. Run Spark SQL

```bash
docker compose run --rm spark-sql
```

## Query examples

```sql
-- List catalogs
SHOW CATALOGS;

-- Use ScalarDB catalog
USE sample_catalog;

-- Query ScalarDB tables
SELECT * FROM scalardb.mysqlns.orders LIMIT 10;
SELECT * FROM scalardb.cassandrans.lineitem LIMIT 10;

-- Query PostgreSQL tables
SELECT * FROM postgres.public.customer LIMIT 10;
```

## Stop services

```bash
docker compose down
```
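The README's query examples each read from a single data source. The natural next step with this setup is a federated join across the two registered sources. The sketch below is only illustrative: it assumes the spark-sql compose service forwards extra arguments to the Spark SQL CLI, and the join columns (o_custkey, c_custkey) and selected columns are assumed TPC-H-style names that are not shown in this PR.

```bash
# Hypothetical federated join across the PostgreSQL and ScalarDB data sources.
# Table names come from the README above; the column names are assumptions
# (typical TPC-H names) and may differ in the actual sample schema.
docker compose run --rm spark-sql -e "
  SELECT c.c_name, o.o_orderdate, o.o_totalprice
  FROM postgres.public.customer AS c
  JOIN scalardb.mysqlns.orders AS o
    ON o.o_custkey = c.c_custkey
  LIMIT 10;
"
```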
scalardb-analytics-spark-sample/config/analytics-cli-config.properties (new file, 4 additions):

# ScalarDB Analytics CLI configuration
scalar.db.analytics.client.server.host=scalardb-analytics-server
scalar.db.analytics.client.server.catalog.port=11051
scalar.db.analytics.client.server.metering.port=11052
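This file is presumably mounted into the scalardb-analytics-cli container and is how the CLI locates the Analytics server, so the catalog and data-source commands from the README resolve scalardb-analytics-server:11051 through it. For example, the catalog-creation step from the README:

```bash
# The CLI reads analytics-cli-config.properties to reach the server's catalog
# port (11051); this is the same command as step 3 of the README above.
docker compose run --rm scalardb-analytics-cli catalog create --catalog sample_catalog
```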
scalardb-analytics-spark-sample/config/analytics-server.properties (new file, 32 additions):

# ScalarDB Analytics Server configuration

# Server ports
scalar.db.analytics.server.catalog.port=11051
scalar.db.analytics.server.metering.port=11052

# Server database configuration (for catalog metadata)
scalar.db.analytics.server.db.url=jdbc:postgresql://analytics-catalog-postgres:5432/catalogdb
scalar.db.analytics.server.db.username=analytics
scalar.db.analytics.server.db.password=analytics

# Server database connection pool configuration
scalar.db.analytics.server.db.pool.size=10
scalar.db.analytics.server.db.pool.max-lifetime=1800000
scalar.db.analytics.server.db.pool.connection-timeout=30000
scalar.db.analytics.server.db.pool.minimum-idle=5
scalar.db.analytics.server.db.pool.idle-timeout=600000

# Metering storage configuration (filesystem for development)
scalar.db.analytics.server.metering.storage.provider=filesystem
scalar.db.analytics.server.metering.storage.path=/tmp/metering

# License configuration (required for production)
# scalar.db.analytics.server.licensing.license-key=<YOUR_LICENSE_KEY>
# scalar.db.analytics.server.licensing.license-check-cert-pem=<YOUR_LICENSE_CERT_PEM>

# Logging configuration
logging.level.root=INFO
logging.level.com.scalar.db.analytics=INFO

# Graceful shutdown configuration
scalar.db.analytics.server.graceful_shutdown_delay_millis=100
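A quick way to see these settings take effect is to watch the server start up. The sketch assumes the Analytics server runs as a compose service named after the scalardb-analytics-server host used in the client and Spark configs elsewhere in this PR.

```bash
# Follow the Analytics server logs to confirm it started, bound the catalog
# (11051) and metering (11052) ports, and reached the catalog metadata database.
# The service name "scalardb-analytics-server" is an assumption.
docker compose logs -f scalardb-analytics-server
```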
scalardb-analytics-spark-sample/config/data-sources/postgres.json (new file, 12 additions):

{
  "catalog": "sample_catalog",
  "name": "postgres",
  "type": "postgres",
  "provider": {
    "host": "postgres",
    "port": 5432,
    "username": "postgres",
    "password": "postgres",
    "database": "sampledb"
  }
}
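Before registering this data source, it can help to confirm that the PostgreSQL instance is reachable with the credentials above. The sketch assumes the compose service is named postgres, matching the "host" field, and that the stack is already up.

```bash
# List the tables in sampledb using the credentials from postgres.json.
# The service name "postgres" is an assumption based on the "host" value above.
docker compose exec postgres psql -U postgres -d sampledb -c '\dt'
```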
scalardb-analytics-spark-sample/config/data-sources/scalardb.json (new file, 8 additions):

{
  "catalog": "sample_catalog",
  "name": "scalardb",
  "type": "scalardb",
  "provider": {
    "configPath": "/etc/scalardb.properties"
  }
}
File renamed without changes.
scalardb-analytics-spark-sample/config/spark-defaults.conf (new file, 10 additions):

spark.jars.packages com.scalar-labs:scalardb-analytics-spark-all-3.5_2.12:3.16.2
spark.extraListeners com.scalar.db.analytics.spark.metering.ScalarDbAnalyticsListener

# Use the ScalarDB Analytics catalog as `sample_catalog`
spark.sql.catalog.sample_catalog com.scalar.db.analytics.spark.ScalarDbAnalyticsCatalog
spark.sql.catalog.sample_catalog.server.host scalardb-analytics-server
spark.sql.catalog.sample_catalog.server.catalog.port 11051
spark.sql.catalog.sample_catalog.server.metering.port 11052

spark.sql.defaultCatalog sample_catalog
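Because spark.sql.defaultCatalog is set to sample_catalog, the README's queries can omit the catalog prefix; the fully qualified four-part form works as well. A small check, assuming the spark-sql compose service passes its arguments through to the Spark SQL CLI:

```bash
# current_catalog() should report sample_catalog, and the four-part identifier
# (catalog.datasource.namespace.table) resolves the same table as the README's
# three-part form. Argument pass-through by the compose service is an assumption.
docker compose run --rm spark-sql -e "SELECT current_catalog();"
docker compose run --rm spark-sql -e "SELECT * FROM sample_catalog.postgres.public.customer LIMIT 5;"
```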
Dockerfile (modified), @@ -1,20 +1,28 @@:

 FROM eclipse-temurin:17-jre-jammy

+ENV SPARK_VERSION=3.5.6 \
+    HADOOP_VERSION=3 \
+    SPARK_HOME=/opt/spark \
+    PATH="/opt/spark/bin:/opt/spark/sbin:${PATH}" \
+    SPARK_NO_DAEMONIZE=true

 WORKDIR /work

-ENV SPARK_VERSION 3.5.3
-WORKDIR /tmp

 # Install dependencies
 RUN apt-get update && \
     apt-get install -y --no-install-recommends \
     procps \
-    curl && \
+    curl \
+    ca-certificates && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*

-SHELL ["/bin/bash", "-o", "pipefail", "-c"]
-RUN curl -SL "https://dlcdn.apache.org/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop3.tgz" | tar -xzC /opt
-
-RUN mv "/opt/spark-$SPARK_VERSION-bin-hadoop3" /opt/spark
+# Download and verify Spark
+RUN curl -fsSL -o spark.tgz "https://dlcdn.apache.org/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" && \
+    curl -fsSL -o spark.tgz.sha512 "https://dlcdn.apache.org/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz.sha512" && \
+    sha512sum -c spark.tgz.sha512 && \
+    tar -xzf spark.tgz -C /opt && \
+    mv "/opt/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}" "${SPARK_HOME}" && \
+    rm -rf spark.tgz spark.tgz.sha512

 WORKDIR /opt/spark
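To confirm the pinned Spark version (3.5.6) and the checksum-verified download end up in the image, rebuilding and printing the version is enough. The sketch assumes this Dockerfile backs the spark-sql service used in the README and that the service forwards arguments to the Spark SQL CLI.

```bash
# Rebuild the Spark image and print the bundled Spark version (expected 3.5.6).
# "spark-sql" as the service built from this Dockerfile is an assumption.
docker compose build spark-sql
docker compose run --rm spark-sql --version
```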