Update scalardb-analytics-spark-sample to support ScalarDB Analytics 3.16 #81

choplin · 2025-07-27T14:10:02Z

Description

This PR updates the scalardb-analytics-spark-sample to support ScalarDB Analytics 3.16. The sample now includes a complete setup with ScalarDB Analytics Server and CLI, enabling users to run federated queries across ScalarDB and PostgreSQL data sources.

Related issues and/or PRs

N/A

Changes made

- Added ScalarDB Analytics Server (3.16.2) and CLI services to docker-compose.yml
- Created a dedicated PostgreSQL database for Analytics Server catalog metadata
- Added configuration files for Analytics Server and CLI
- Created data source definitions for ScalarDB and PostgreSQL
- Moved all configuration files into a config directory
- Updated Dockerfiles:
  - Spark Dockerfile: Added SHA512 verification for security, removed unnecessary Coursier
  - Sample Data Loader: Migrated from shadowJar to Gradle application plugin
- Added a platform specification (linux/amd64) so that the Analytics Server runs on ARM64 hosts such as M1/M2 Macs
- Added volume caching for Spark JAR dependencies to improve startup time
- Created README with step-by-step setup instructions
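
For readers unfamiliar with the SHA512 verification mentioned for the Spark Dockerfile, the idea can be sketched in Python. This is not the Dockerfile's actual code, which is not shown in this conversation. It assumes the check compares a downloaded file against a published hex digest, and the function names `sha512_of` and `verify_sha512` are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the expected one, ignoring case and whitespace."""
    return hmac.compare_digest(sha512_of(path), expected_hex.strip().lower())
```

A build step would abort when `verify_sha512` returns `False`, so a corrupted or tampered download never ends up in the image.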

Checklist

I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes.
Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
Tests (unit, integration, etc.) have been added for the changes.
My changes generate no new warnings.
Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

None

gemini-code-assist

Summary of Changes

Hello @choplin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the scalardb-analytics-spark-sample to fully support ScalarDB Analytics 3.16. The core intent is to provide a comprehensive, self-contained environment for users to explore federated queries, combining data from ScalarDB-managed stores (Cassandra, MySQL) and direct PostgreSQL tables through a unified Spark SQL interface. The changes streamline the setup, improve security, and enhance the overall user experience.

Highlights

Full ScalarDB Analytics 3.16 Integration: The sample now includes the ScalarDB Analytics Server and CLI as Docker services, enabling a complete federated query setup with version 3.16.2.
Enhanced Federated Query Experience: Users can now seamlessly query data from both ScalarDB-managed databases (Cassandra, MySQL) and direct PostgreSQL tables through a unified Spark SQL interface, demonstrated with new data source configurations.
Improved Sample Structure and Setup: Configuration files have been reorganized into a dedicated config directory, and a comprehensive README.md has been added with step-by-step instructions for quick setup and query execution.
Docker Build and Dependency Management Improvements: The Spark Dockerfile now includes SHA512 verification for security, and Spark JAR dependencies are cached using Docker volumes to speed up startup times. The sample data loader's build process was also modernized using the Gradle application plugin.
Platform Compatibility: Explicit platform specification (linux/amd64) has been added for the Analytics Server to ensure compatibility, particularly for users on ARM64 architectures like M1/M2 Macs.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request updates the scalardb-analytics-spark-sample to support ScalarDB Analytics 3.16, introducing a more complete setup with federated queries. The changes are well-structured, including the reorganization of configuration files and improvements to the Docker setup like SHA512 verification for Spark.

My review focuses on improving security and stability. Key suggestions include:

Using stable versions for database images (postgres:16, mysql:8) instead of pre-release versions.
Addressing hardcoded credentials in configuration files.
Tuning healthcheck parameters to be less aggressive.
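
The review does not say which healthcheck values it considers too aggressive, so the following is only an illustration of what such parameters control. It is a Python sketch loosely modeled on the `interval`, `retries`, and `start_period` settings of a Compose healthcheck. The function `wait_until_healthy` is hypothetical and simplifies the real behavior.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    interval: float,
    retries: int,
    start_period: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it succeeds or `retries` attempts have failed."""
    # Give the service time to boot before the first probe.
    sleep(start_period)
    failures = 0
    while failures < retries:
        if check():
            return True
        failures += 1
        # A longer interval means fewer probes against a service that is still starting.
        sleep(interval)
    return False
```

Lengthening `interval` or raising `retries` gives a slow-starting database more time before it is reported unhealthy, at the cost of slower failure detection.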

scalardb-analytics-spark-sample/config/analytics-server.properties

gemini-code-assist · 2025-07-27T14:12:34Z

scalardb-analytics-spark-sample/config/data-sources/postgres.json

+    "username": "postgres",
+    "password": "postgres",


Hardcoding credentials in configuration files is a security risk. It's better to use a mechanism to inject these at runtime, for example, through environment variables. This is a good practice to follow even in sample applications to avoid accidentally leaking credentials.
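
One way to follow this suggestion is to keep placeholders in the JSON file and fill them from environment variables before the file is used. The sketch below assumes such a preprocessing step. Whether the Analytics CLI can read environment variables itself is not stated in this conversation, and the variable names `POSTGRES_USER` and `POSTGRES_PASSWORD` and the function `resolve_placeholders` are hypothetical.

```python
import json
import os
from string import Template
from typing import Any, Mapping, Optional

# Hypothetical template; only the "username" and "password" keys come from the diff above.
TEMPLATE = '{"username": "${POSTGRES_USER}", "password": "${POSTGRES_PASSWORD}"}'


def resolve_placeholders(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${VAR} placeholders in string values of parsed JSON."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # substitute() raises KeyError when a referenced variable is not set.
        return Template(value).substitute(env)
    if isinstance(value, dict):
        return {key: resolve_placeholders(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_placeholders(item, env) for item in value]
    return value
```

Substituting after `json.loads` rather than before means a password containing quotes or backslashes cannot break the JSON syntax. A call such as `resolve_placeholders(json.loads(TEMPLATE))` fails fast when a variable is missing instead of silently writing an empty credential.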

scalardb-analytics-spark-sample/docker-compose.yml

scalardb-analytics-spark-sample/config/analytics-cli-config.properties

scalardb-analytics-spark-sample/docker-compose.yml

…rDB Analytics 3.16
- Add ScalarDB Analytics Server and CLI services to docker-compose
- Configure Analytics Server with catalog database
- Create data source definitions for ScalarDB and PostgreSQL
- Reorganize configuration files into config directory
- Add platform specification for ARM64 compatibility
- Update Dockerfiles to use Gradle application plugin instead of shadowJar
- Add volume caching for Spark JAR dependencies
- Include setup instructions in README

This update enables the sample to work with ScalarDB Analytics 3.16, providing a complete example of federated queries across ScalarDB and PostgreSQL data sources.

komamitsu

LGTM! 👍

…oading
- Remove profile requirement for sample-data-loader service
- Add volume-based flag to track data loading completion
- Implement idempotent data loading with skip on subsequent runs
- Add healthcheck dependencies for database readiness
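
The flag-based idempotent loading described in this commit can be sketched as follows. This is an illustration of the general technique, not the sample's actual script. It assumes the flag is an empty file on a persistent volume, and the function name `load_once` is hypothetical.

```python
from pathlib import Path
from typing import Callable


def load_once(flag: Path, load: Callable[[], None]) -> bool:
    """Run load() unless the flag file exists; return True if data was loaded."""
    if flag.exists():
        # A previous run already completed, so skip loading.
        return False
    load()
    # Create the flag only after a successful load so that a failed run is retried.
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()
    return True
```

Because the flag lives on a volume rather than in the container's filesystem, it survives container restarts, and removing the volume triggers a fresh load.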

…file naming
- Rename analytics-server.properties to scalardb-analytics-server.properties
- Rename analytics-cli-config.properties to scalardb-analytics-cli.properties
- Replace hyphens with underscores in property names for consistency
- Update docker-compose.yml volume mappings

…ta loading
- Remove manual sample-data-loader step
- Add --wait flag to docker compose up command
- Update step numbering
- Add note about automatic data loading on first run

brfrn169

LGTM! Thank you!

feeblefakie

The contents look good.
I left suggestions on the naming. PTAL!

scalardb-analytics-spark-sample/README.md

…nalytics-spark-sample
- Rename directory from 'scalardb-analytics-spark-sample' to 'scalardb-analytics-sample'
- Update README.md title to match new directory name
- This change reflects that the sample is not limited to Spark but covers ScalarDB Analytics in general

feeblefakie

LGTM! Thank you!

gemini-code-assist bot reviewed Jul 27, 2025

choplin force-pushed the update-analytics-sample-for-3.16 branch from 48659ed to b5ffbf1 on July 27, 2025 14:22

choplin force-pushed the update-analytics-sample-for-3.16 branch from b5ffbf1 to 1c6f597 on July 27, 2025 14:24

choplin requested review from brfrn169, feeblefakie and komamitsu July 27, 2025 14:25

komamitsu approved these changes Jul 28, 2025

choplin added 3 commits July 28, 2025 23:35

docs(scalardb-analytics-spark-sample): update README for automatic da…

d161000


brfrn169 approved these changes Jul 28, 2025

feeblefakie assigned choplin Jul 29, 2025

feeblefakie reviewed Jul 29, 2025

scalardb-analytics-spark-sample/README.md (outdated, resolved)

scalardb-analytics-spark-sample/README.md (resolved)

choplin requested a review from feeblefakie July 30, 2025 15:36

feeblefakie approved these changes Jul 31, 2025

feeblefakie merged commit 7c29f41 into main Jul 31, 2025
32 checks passed

feeblefakie deleted the update-analytics-sample-for-3.16 branch July 31, 2025 01:39
