Update scalardb-analytics-spark-sample to support ScalarDB Analytics 3.16 #81
Conversation
Summary of Changes
Hello @choplin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly upgrades the scalardb-analytics-spark-sample to fully support ScalarDB Analytics 3.16. The core intent is to provide a comprehensive, self-contained environment for users to explore federated queries, combining data from ScalarDB-managed stores (Cassandra, MySQL) and direct PostgreSQL tables through a unified Spark SQL interface. The changes streamline the setup, improve security, and enhance the overall user experience.
Highlights
- Full ScalarDB Analytics 3.16 Integration: The sample now includes the ScalarDB Analytics Server and CLI as Docker services, enabling a complete federated query setup with version 3.16.2.
- Enhanced Federated Query Experience: Users can now seamlessly query data from both ScalarDB-managed databases (Cassandra, MySQL) and direct PostgreSQL tables through a unified Spark SQL interface, demonstrated with new data source configurations.
- Improved Sample Structure and Setup: Configuration files have been reorganized into a dedicated `config` directory, and a comprehensive `README.md` has been added with step-by-step instructions for quick setup and query execution.
- Docker Build and Dependency Management Improvements: The Spark Dockerfile now includes SHA512 verification for security, and Spark JAR dependencies are cached using Docker volumes to speed up startup times. The sample data loader's build process was also modernized using the Gradle application plugin.
- Platform Compatibility: Explicit platform specification (`linux/amd64`) has been added for the Analytics Server to ensure compatibility, particularly for users on ARM64 architectures such as M1/M2 Macs.
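Taken together, the highlights above could be realized with a compose file roughly of the following shape. This is an illustrative sketch only: the image name, tag, service names, and paths are assumptions, not values taken from the PR.

```yaml
# docker-compose.yml sketch (all names/paths are placeholders)
services:
  analytics-server:
    image: example/scalardb-analytics-server:3.16.2  # placeholder image reference
    platform: linux/amd64        # explicit platform for ARM64 hosts (M1/M2 Macs)
    depends_on:
      postgres:
        condition: service_healthy

  spark-sql:
    build: ./docker
    volumes:
      - spark-jar-cache:/root/.ivy2   # cache resolved Spark JARs between runs

volumes:
  spark-jar-cache:
```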
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.
Code Review
This pull request updates the scalardb-analytics-spark-sample to support ScalarDB Analytics 3.16, introducing a more complete setup with federated queries. The changes are well structured, including the reorganization of configuration files and improvements to the Docker setup, such as SHA512 verification for the Spark download.
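The SHA512 verification mentioned above typically looks something like the following Dockerfile fragment. The Spark version, URL, and checksum here are placeholders, not the pins actually used in the PR.

```dockerfile
# Sketch of a checksum-verified Spark download (values are placeholders)
ARG SPARK_VERSION=3.5.1
ARG SPARK_SHA512=<expected-sha512-checksum>

RUN curl -fSL -o spark.tgz \
      "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz" \
 && echo "${SPARK_SHA512}  spark.tgz" | sha512sum -c - \
 && tar -xzf spark.tgz -C /opt \
 && rm spark.tgz
```

The build fails if the downloaded archive does not match the pinned checksum, which protects against tampered or corrupted downloads.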
My review focuses on improving security and stability. Key suggestions include:
- Using stable versions for database images (`postgres:16`, `mysql:8`) instead of pre-release versions.
- Addressing hardcoded credentials in configuration files.
- Tuning healthcheck parameters to be less aggressive.
| "username": "postgres", | ||
| "password": "postgres", |
Review comment on scalardb-analytics-spark-sample/config/analytics-cli-config.properties (outdated, resolved).
Force-pushed from 48659ed to b5ffbf1.
…rDB Analytics 3.16

- Add ScalarDB Analytics Server and CLI services to docker-compose
- Configure Analytics Server with catalog database
- Create data source definitions for ScalarDB and PostgreSQL
- Reorganize configuration files into config directory
- Add platform specification for ARM64 compatibility
- Update Dockerfiles to use Gradle application plugin instead of shadowJar
- Add volume caching for Spark JAR dependencies
- Include setup instructions in README

This update enables the sample to work with ScalarDB Analytics 3.16, providing a complete example of federated queries across ScalarDB and PostgreSQL data sources.
Force-pushed from b5ffbf1 to 1c6f597.
LGTM! 👍
…oading

- Remove profile requirement for sample-data-loader service
- Add volume-based flag to track data loading completion
- Implement idempotent data loading with skip on subsequent runs
- Add healthcheck dependencies for database readiness
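The volume-based flag described in this commit can be sketched as a small entrypoint script. The names below (DATA_FLAG_DIR, load_sample_data) are illustrative assumptions, not taken from the sample's actual loader.

```shell
#!/bin/sh
# Idempotent data loading via a flag file on a persistent volume.
# DATA_FLAG_DIR would be a docker volume mount in the real setup.
FLAG_FILE="${DATA_FLAG_DIR:-/tmp/data-loader-state}/.loaded"

load_sample_data() {
  # Stand-in for the real loading step (e.g. running the Gradle-built loader).
  echo "loading sample data"
}

run_loader() {
  if [ -f "$FLAG_FILE" ]; then
    # Flag exists: a previous run completed, so skip reloading.
    echo "already loaded; skipping"
  else
    load_sample_data
    mkdir -p "$(dirname "$FLAG_FILE")"
    touch "$FLAG_FILE"   # record completion for subsequent container starts
  fi
}

run_loader
```

Because the flag lives on a volume rather than in the container filesystem, it survives container restarts, so the loader runs only on the first `docker compose up`.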
…file naming

- Rename analytics-server.properties to scalardb-analytics-server.properties
- Rename analytics-cli-config.properties to scalardb-analytics-cli.properties
- Replace hyphens with underscores in property names for consistency
- Update docker-compose.yml volume mappings
…ta loading

- Remove manual sample-data-loader step
- Add --wait flag to docker compose up command
- Update step numbering
- Add note about automatic data loading on first run
LGTM! Thank you!
The contents look good.
I left suggestions on the naming. PTAL!
…nalytics-spark-sample

- Rename directory from 'scalardb-analytics-spark-sample' to 'scalardb-analytics-sample'
- Update README.md title to match new directory name
- This change reflects that the sample is not limited to Spark but covers ScalarDB Analytics in general
LGTM! Thank you!
Description
This PR updates the scalardb-analytics-spark-sample to support ScalarDB Analytics 3.16. The sample now includes a complete setup with ScalarDB Analytics Server and CLI, enabling users to run federated queries across ScalarDB and PostgreSQL data sources.
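As a rough illustration of what such a federated query can look like in Spark SQL: the catalog, namespace, and table names below are invented for this sketch, not taken from the sample.

```sql
-- Join a ScalarDB-managed table with a native PostgreSQL table
-- through one Spark SQL interface. All identifiers are hypothetical.
SELECT o.order_id, c.customer_name, o.total
FROM analytics.scalardb.orders o
JOIN analytics.postgres.public.customers c
  ON o.customer_id = c.customer_id;
```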
Related issues and/or PRs
N/A
Changes made
- Reorganized configuration files into a `config` directory for better organization

Checklist
Additional notes (optional)
None