Move Spark docs + add Azure Synapse documentation #3675
Merged
Changes from all commits (12 commits):
- 5b5b648 move spark from data sources to data ingestion (BentsiLeviav)
- 00c921a Add spark azure synapse docs (BentsiLeviav)
- 0860a81 Update index.md (Blargian)
- 1787dee Add explicit header anchors (Blargian)
- ac84d50 Update aspell-dict-file.txt (Blargian)
- 9b52e4e Add azure synapse to integration list (BentsiLeviav)
- 02232ab Merge remote-tracking branch 'origin/move-spark-and-add-syanpse' into… (BentsiLeviav)
- 06afd05 Merge branch 'main' into move-spark-and-add-syanpse (BentsiLeviav)
- a15de7d Add spark settings for cloud usage (BentsiLeviav)
- 62a0617 Merge remote-tracking branch 'origin/move-spark-and-add-syanpse' into… (BentsiLeviav)
- 365dfbb Add cloud usage info (BentsiLeviav)
- 7ed113a CR changes (BentsiLeviav)
@@ -0,0 +1,89 @@
---
sidebar_label: 'Azure Synapse'
slug: /integrations/azure-synapse
description: 'Introduction to Azure Synapse with ClickHouse'
keywords: ['clickhouse', 'azure synapse', 'azure', 'synapse', 'microsoft', 'azure spark', 'data']
title: 'Integrating Azure Synapse with ClickHouse'
---

import TOCInline from '@theme/TOCInline';
import Image from '@theme/IdealImage';
import sparkConfigViaNotebook from '@site/static/images/integrations/data-ingestion/azure-synapse/spark_notebook_conf.png';
import sparkUICHSettings from '@site/static/images/integrations/data-ingestion/azure-synapse/spark_ui_ch_settings.png';

# Integrating Azure Synapse with ClickHouse

[Azure Synapse](https://azure.microsoft.com/en-us/products/synapse-analytics) is an integrated analytics service that combines big data, data science, and data warehousing to enable fast, large-scale data analysis.
Within Synapse, Spark pools provide on-demand, scalable [Apache Spark](https://spark.apache.org) clusters that let users run complex data transformations, machine learning, and integrations with external systems.

This article shows you how to integrate the [ClickHouse Spark connector](/integrations/apache-spark/spark-native-connector) when working with Apache Spark in Azure Synapse.

<TOCInline toc={toc}></TOCInline>

## Add the connector's dependencies {#add-connector-dependencies}

Azure Synapse supports three levels of [package maintenance](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries):
1. Default packages
2. Spark pool level packages
3. Session level packages

<br/>

Follow the [Manage libraries for Apache Spark pools guide](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-pool-packages) and add the following required dependencies to your Spark application:
- `clickhouse-spark-runtime-{spark_version}_{scala_version}-{connector_version}.jar` - [official maven](https://mvnrepository.com/artifact/com.clickhouse.spark)
- `clickhouse-jdbc-{java_client_version}-all.jar` - [official maven](https://mvnrepository.com/artifact/com.clickhouse/clickhouse-jdbc)

Please visit our [Spark Connector Compatibility Matrix](/integrations/apache-spark/spark-native-connector#compatibility-matrix) docs to understand which versions suit your needs.

## Add ClickHouse as a catalog {#add-clickhouse-as-catalog}

There are several ways to add Spark configurations to your session:
* A custom configuration file loaded with your session
* Configurations added via the Azure Synapse UI
* Configurations added in your Synapse notebook

Follow the [Manage Apache Spark configuration](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-create-spark-configuration) guide
and add the [connector's required Spark configurations](/integrations/apache-spark/spark-native-connector#register-the-catalog-required).

For instance, you can configure your Spark session in your notebook with these settings:

```python
%%configure -f
{
    "conf": {
        "spark.sql.catalog.clickhouse": "com.clickhouse.spark.ClickHouseCatalog",
        "spark.sql.catalog.clickhouse.host": "<clickhouse host>",
        "spark.sql.catalog.clickhouse.protocol": "https",
        "spark.sql.catalog.clickhouse.http_port": "<port>",
        "spark.sql.catalog.clickhouse.user": "<username>",
        "spark.sql.catalog.clickhouse.password": "<password>",
        "spark.sql.catalog.clickhouse.database": "default"
    }
}
```

Make sure this configuration is set in the first cell of your notebook, as shown below:

<Image img={sparkConfigViaNotebook} size="xl" alt="Setting Spark configurations via notebook" border/> | ||
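
As an alternative to `%%configure` (which applies the settings when the session starts), the same catalog properties can also be set at runtime on the Spark session Synapse has already created for the notebook. The following is a minimal sketch rather than an officially documented path: it relies on Spark resolving catalogs lazily on first use, and the `<...>` placeholders must be replaced with your own values:

```python
# A minimal sketch (not an officially documented path): register the ClickHouse
# catalog on the Spark session that Synapse already created for this notebook.
# Spark resolves catalogs lazily, on first use of the `clickhouse` catalog, so
# setting the properties here generally takes effect without a session restart.
# Replace the <...> placeholders with your own values.
spark.conf.set("spark.sql.catalog.clickhouse", "com.clickhouse.spark.ClickHouseCatalog")
spark.conf.set("spark.sql.catalog.clickhouse.host", "<clickhouse host>")
spark.conf.set("spark.sql.catalog.clickhouse.protocol", "https")
spark.conf.set("spark.sql.catalog.clickhouse.http_port", "<port>")
spark.conf.set("spark.sql.catalog.clickhouse.user", "<username>")
spark.conf.set("spark.sql.catalog.clickhouse.password", "<password>")
spark.conf.set("spark.sql.catalog.clickhouse.database", "default")
```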

Please visit the [ClickHouse Spark configurations page](/integrations/apache-spark/spark-native-connector#configurations) for additional settings.

:::info
When working with ClickHouse Cloud, please make sure to set the [required Spark settings](/integrations/apache-spark/spark-native-connector#clickhouse-cloud-settings).
:::
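
Once the catalog is registered, you can address ClickHouse tables through Spark SQL and the DataFrame API. The snippet below is an illustrative sketch: the `clickhouse` catalog name matches the configuration above, while `default.my_table` and `default.my_table_copy` are hypothetical placeholders, and the target table of the write is assumed to already exist:

```python
# List the tables that the `clickhouse` catalog exposes in the `default` database.
spark.sql("SHOW TABLES IN clickhouse.default").show()

# Read a (hypothetical) ClickHouse table into a DataFrame and inspect its schema.
df = spark.table("clickhouse.default.my_table")
df.printSchema()

# Append the rows to another (hypothetical, pre-existing) ClickHouse table
# via the DataFrameWriterV2 API.
df.writeTo("clickhouse.default.my_table_copy").append()
```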

## Setup Verification {#setup-verification}

To verify that the dependencies and configurations were set successfully, visit your session's Spark UI and go to the `Environment` tab.
There, look for your ClickHouse-related settings:

<Image img={sparkUICHSettings} size="xl" alt="Verifying ClickHouse settings using Spark UI" border/>
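
For a complementary programmatic check, you can also read the settings back from the running session in a notebook cell. This is a small sketch using standard PySpark APIs:

```python
# The catalog implementation class, or None if the configuration was not applied.
print(spark.conf.get("spark.sql.catalog.clickhouse", None))
# Expected output: com.clickhouse.spark.ClickHouseCatalog

# Settings passed to the Spark context at session creation; note that the
# password is included, so be careful when sharing notebook output.
for key, value in spark.sparkContext.getConf().getAll():
    if "clickhouse" in key.lower():
        print(key, "=", value)
```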

## Additional Resources {#additional-resources}

- [ClickHouse Spark Connector Docs](/integrations/apache-spark)
- [Azure Synapse Spark Pools Overview](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-overview)
- [Optimize performance for Apache Spark workloads](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-performance)
- [Manage libraries for Apache Spark pools in Synapse](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-pool-packages)
- [Manage Apache Spark configuration in Synapse](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-create-spark-configuration)
Binary file added: static/images/integrations/data-ingestion/azure-synapse/spark_notebook_conf.png (+55.2 KB)
Binary file added: static/images/integrations/data-ingestion/azure-synapse/spark_ui_ch_settings.png (+35.5 KB)
not related to the docs, just a note: JDBC version in Spark connector is quite outdated 😞