Move Spark docs + add Azure Synapse documentation #3675

Merged · 12 commits · Apr 16, 2025
@@ -222,6 +222,15 @@ That way, you would be able to access clickhouse1 table `<ck_db>.<ck_table>` fro

:::

## ClickHouse Cloud Settings {#clickhouse-cloud-settings}

When connecting to [ClickHouse Cloud](https://clickhouse.com), make sure to enable SSL and set the appropriate SSL mode. For example:

```text
spark.sql.catalog.clickhouse.option.ssl true
spark.sql.catalog.clickhouse.option.ssl_mode NONE
```
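
For context, here is a minimal sketch of how these options could be applied when building a Spark session with PySpark. The hostname and credentials are placeholders, and port `8443` assumes the default ClickHouse Cloud HTTPS port:

```python
from pyspark.sql import SparkSession

# Minimal sketch: placeholder host/credentials; 8443 assumes the
# default ClickHouse Cloud HTTPS port.
spark = (
    SparkSession.builder.appName("clickhouse-cloud-example")
    .config("spark.sql.catalog.clickhouse", "com.clickhouse.spark.ClickHouseCatalog")
    .config("spark.sql.catalog.clickhouse.host", "<your-service>.clickhouse.cloud")
    .config("spark.sql.catalog.clickhouse.protocol", "https")
    .config("spark.sql.catalog.clickhouse.http_port", "8443")
    .config("spark.sql.catalog.clickhouse.user", "<username>")
    .config("spark.sql.catalog.clickhouse.password", "<password>")
    .config("spark.sql.catalog.clickhouse.option.ssl", "true")
    .config("spark.sql.catalog.clickhouse.option.ssl_mode", "NONE")
    .getOrCreate()
)
```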

## Read Data {#read-data}

<Tabs groupId="spark_apis">
89 changes: 89 additions & 0 deletions docs/integrations/data-ingestion/azure-synapse/index.md
@@ -0,0 +1,89 @@
---
sidebar_label: 'Azure Synapse'
slug: /integrations/azure-synapse
description: 'Introduction to Azure Synapse with ClickHouse'
keywords: ['clickhouse', 'azure synapse', 'azure', 'synapse', 'microsoft', 'azure spark', 'data']
title: 'Integrating Azure Synapse with ClickHouse'
---

import TOCInline from '@theme/TOCInline';
import Image from '@theme/IdealImage';
import sparkConfigViaNotebook from '@site/static/images/integrations/data-ingestion/azure-synapse/spark_notebook_conf.png';
import sparkUICHSettings from '@site/static/images/integrations/data-ingestion/azure-synapse/spark_ui_ch_settings.png';

# Integrating Azure Synapse with ClickHouse

[Azure Synapse](https://azure.microsoft.com/en-us/products/synapse-analytics) is an integrated analytics service that combines big data, data science, and data warehousing to enable fast, large-scale data analysis.
Within Synapse, Spark pools provide on-demand, scalable [Apache Spark](https://spark.apache.org) clusters that let users run complex data transformations, machine learning, and integrations with external systems.

This article shows how to integrate the [ClickHouse Spark connector](/integrations/apache-spark/spark-native-connector) when working with Apache Spark in Azure Synapse.


<TOCInline toc={toc}></TOCInline>

## Add the connector's dependencies {#add-connector-dependencies}
Azure Synapse supports three levels of [package maintenance](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries):
1. Default packages
2. Spark pool level
3. Session level

<br/>

Follow the [Manage libraries for Apache Spark pools guide](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-pool-packages) and add the following required dependencies to your Spark application:
- `clickhouse-spark-runtime-{spark_version}_{scala_version}-{connector_version}.jar` - [official maven](https://mvnrepository.com/artifact/com.clickhouse.spark)
- `clickhouse-jdbc-{java_client_version}-all.jar` - [official maven](https://mvnrepository.com/artifact/com.clickhouse/clickhouse-jdbc)

Please visit our [Spark Connector Compatibility Matrix](/integrations/apache-spark/spark-native-connector#compatibility-matrix) docs to understand which versions suit your needs.
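
As a concrete illustration, the resolved jar names follow the patterns above. The versions shown here are placeholders only; substitute the combination from the compatibility matrix that matches your Spark pool's Spark and Scala versions:

```text
clickhouse-spark-runtime-3.4_2.12-0.8.0.jar
clickhouse-jdbc-0.6.3-all.jar
```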
Reviewer note (Member): not related to the docs, just a note: JDBC version in Spark connector is quite outdated 😞

## Add ClickHouse as a catalog {#add-clickhouse-as-catalog}

There are a variety of ways to add Spark configs to your session:
* Custom configuration file to load with your session
* Add configurations via Azure Synapse UI
* Add configurations in your Synapse notebook

Follow the [Manage Apache Spark configuration](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-create-spark-configuration) guide and add the [connector's required Spark configurations](/integrations/apache-spark/spark-native-connector#register-the-catalog-required).

For instance, you can configure your Spark session in your notebook with these settings:

```python
%%configure -f
{
    "conf": {
        "spark.sql.catalog.clickhouse": "com.clickhouse.spark.ClickHouseCatalog",
        "spark.sql.catalog.clickhouse.host": "<clickhouse host>",
        "spark.sql.catalog.clickhouse.protocol": "https",
        "spark.sql.catalog.clickhouse.http_port": "<port>",
        "spark.sql.catalog.clickhouse.user": "<username>",
        "spark.sql.catalog.clickhouse.password": "<password>",
        "spark.sql.catalog.clickhouse.database": "default"
    }
}
```

Make sure this configuration is in the first cell of your notebook, as shown below:

<Image img={sparkConfigViaNotebook} size="xl" alt="Setting Spark configurations via notebook" border/>

Please visit the [ClickHouse Spark configurations page](/integrations/apache-spark/spark-native-connector#configurations) for additional settings.

:::info
When working with ClickHouse Cloud, please make sure to set the [required Spark settings](/integrations/apache-spark/spark-native-connector#clickhouse-cloud-settings).
:::
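
Once the session starts with the catalog registered, a quick smoke test from a later notebook cell confirms that the catalog resolves. This is a minimal sketch; `default.my_table` is a placeholder name:

```python
# List the ClickHouse databases visible through the registered catalog.
spark.sql("SHOW NAMESPACES IN clickhouse").show()

# Read a table through the catalog (placeholder database/table names).
df = spark.table("clickhouse.default.my_table")
df.printSchema()
df.show(5)
```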

## Setup Verification {#setup-verification}

To verify that the dependencies and configurations were set successfully, open your session's Spark UI and go to the `Environment` tab.
There, look for your ClickHouse-related settings:

<Image img={sparkUICHSettings} size="xl" alt="Verifying ClickHouse settings using Spark UI" border/>
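
If you prefer a programmatic check over the UI, here is a small sketch that reads the settings back from the running session (the keys are the ones set in the configuration above):

```python
# Print the ClickHouse catalog settings from the active Spark session.
for key in (
    "spark.sql.catalog.clickhouse",
    "spark.sql.catalog.clickhouse.host",
    "spark.sql.catalog.clickhouse.protocol",
    "spark.sql.catalog.clickhouse.http_port",
):
    print(key, "=", spark.conf.get(key, "<not set>"))
```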


## Additional Resources {#additional-resources}

- [ClickHouse Spark Connector Docs](/integrations/apache-spark)
- [Azure Synapse Spark Pools Overview](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-overview)
- [Optimize performance for Apache Spark workloads](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-performance)
- [Manage libraries for Apache Spark pools in Synapse](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-pool-packages)
- [Manage Apache Spark configuration in Synapse](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-create-spark-configuration)
25 changes: 14 additions & 11 deletions docs/integrations/data-ingestion/data-ingestion-index.md
@@ -1,6 +1,6 @@
---
slug: /integrations/data-ingestion-overview
keywords: ['Airbyte', 'Amazon Glue', 'Apache Beam', 'dbt', 'Fivetran', 'NiFi', 'dlt', 'Vector']
keywords: [ 'Airbyte', 'Apache Spark', 'Spark', 'Azure Synapse', 'Amazon Glue', 'Apache Beam', 'dbt', 'Fivetran', 'NiFi', 'dlt', 'Vector' ]
title: 'Data Ingestion'
description: 'Landing page for the data ingestion section'
---
@@ -10,13 +10,16 @@ description: 'Landing page for the data ingestion section'
ClickHouse integrates with a number of solutions for data integration and transformation.
For more information check out the pages below:

| Data Ingestion Tool | Description |
|--------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Airbyte](/integrations/airbyte) | An open-source data integration platform. It allows the creation of ELT data pipelines and is shipped with more than 140 out-of-the-box connectors. |
| [Amazon Glue](/integrations/glue) | A fully managed, serverless data integration service provided by Amazon Web Services (AWS) simplifying the process of discovering, preparing, and transforming data for analytics, machine learning, and application development. |
| [Apache Beam](/integrations/apache-beam) | An open-source, unified programming model that enables developers to define and execute both batch and stream (continuous) data processing pipelines. |
| [dbt](/integrations/dbt) | Enables analytics engineers to transform data in their warehouses by simply writing select statements. |
| [dlt](/integrations/data-ingestion/etl-tools/dlt-and-clickhouse) | An open-source library that you can add to your Python scripts to load data from various and often messy data sources into well-structured, live datasets. |
| [Fivetran](/integrations/fivetran) | An automated data movement platform moving data out of, into and across your cloud data platforms. |
| [NiFi](/integrations/nifi) | An open-source workflow management software designed to automate data flow between software systems. |
| [Vector](/integrations/vector) | A high-performance observability data pipeline that puts organizations in control of their observability data. |
| Data Ingestion Tool | Description |
|------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Airbyte](/integrations/airbyte) | An open-source data integration platform. It allows the creation of ELT data pipelines and is shipped with more than 140 out-of-the-box connectors. |
| [Apache Spark](/integrations/apache-spark)                        | A multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.                                                                                                        |
| [Amazon Glue](/integrations/glue) | A fully managed, serverless data integration service provided by Amazon Web Services (AWS) simplifying the process of discovering, preparing, and transforming data for analytics, machine learning, and application development. |
| [Azure Synapse](/integrations/azure-synapse) | A fully managed, cloud-based analytics service provided by Microsoft Azure, combining big data and data warehousing to simplify data integration, transformation, and analytics at scale using SQL, Apache Spark, and data pipelines. |
| [Apache Beam](/integrations/apache-beam) | An open-source, unified programming model that enables developers to define and execute both batch and stream (continuous) data processing pipelines. |
| [dbt](/integrations/dbt) | Enables analytics engineers to transform data in their warehouses by simply writing select statements. |
| [dlt](/integrations/data-ingestion/etl-tools/dlt-and-clickhouse) | An open-source library that you can add to your Python scripts to load data from various and often messy data sources into well-structured, live datasets. |
| [Fivetran](/integrations/fivetran) | An automated data movement platform moving data out of, into and across your cloud data platforms. |
| [NiFi](/integrations/nifi) | An open-source workflow management software designed to automate data flow between software systems. |
| [Vector](/integrations/vector) | A high-performance observability data pipeline that puts organizations in control of their observability data. |

3 changes: 1 addition & 2 deletions docs/integrations/data-ingestion/data-sources-index.md
@@ -1,6 +1,6 @@
---
slug: /integrations/index
keywords: ['AWS S3', 'PostgreSQL', 'Kafka', 'Apache Spark', 'MySQL', 'Cassandra', 'Redis', 'RabbitMQ', 'MongoDB', 'Google Cloud Storage', 'Hive', 'Hudi', 'Iceberg', 'MinIO', 'Delta Lake', 'RocksDB', 'Splunk', 'SQLite', 'NATS', 'EMQX', 'local files', 'JDBC', 'ODBC']
keywords: ['AWS S3', 'PostgreSQL', 'Kafka', 'MySQL', 'Cassandra', 'Redis', 'RabbitMQ', 'MongoDB', 'Google Cloud Storage', 'Hive', 'Hudi', 'Iceberg', 'MinIO', 'Delta Lake', 'RocksDB', 'Splunk', 'SQLite', 'NATS', 'EMQX', 'local files', 'JDBC', 'ODBC']
description: 'Datasources overview page'
title: 'Data Sources'
---
@@ -15,7 +15,6 @@ For further information see the pages listed below:
| [AWS S3](/integrations/s3) |
| [PostgreSQL](/integrations/postgresql) |
| [Kafka](/integrations/kafka) |
| [Apache Spark](/integrations/apache-spark) |
| [MySQL](/integrations/mysql) |
| [Cassandra](/integrations/cassandra) |
| [Redis](/integrations/redis) |
2 changes: 2 additions & 0 deletions docs/integrations/index.mdx
@@ -81,6 +81,7 @@ import Yepcodesvg from '@site/static/images/integrations/logos/yepcode.svg';
import Warpstreamsvg from '@site/static/images/integrations/logos/warpstream.svg';
import Bytewaxsvg from '@site/static/images/integrations/logos/bytewax.svg';
import glue_logo from '@site/static/images/integrations/logos/glue_logo.png';
import azure_synapse_logo from '@site/static/images/integrations/logos/azure-synapse.png';
import logo_cpp from '@site/static/images/integrations/logos/logo_cpp.png';
import cassandra from '@site/static/images/integrations/logos/cassandra.png';
import deltalake from '@site/static/images/integrations/logos/deltalake.png';
@@ -204,6 +205,7 @@ We are actively compiling this list of ClickHouse integrations below, so it's no
|Amazon Glue|<Image img={glue_logo} size="logo" alt="Amazon Glue logo"/>|Data ingestion|Query ClickHouse over JDBC|[Documentation](/integrations/glue)|
|Apache Spark|<Sparksvg alt="Apache Spark logo" style={{width: '3rem'}}/>|Data ingestion|Spark ClickHouse Connector is a high-performance connector built on top of Spark DataSource V2.|[GitHub](https://github.com/housepower/spark-clickhouse-connector),<br/>[Documentation](/integrations/data-ingestion/apache-spark/index.md)|
|Azure Event Hubs|<Azureeventhubssvg alt="Azure Events Hub logo" style={{width: '3rem'}}/>|Data ingestion|A data streaming platform that supports Apache Kafka's native protocol|[Website](https://azure.microsoft.com/en-gb/products/event-hubs)|
|Azure Synapse|<Image img={azure_synapse_logo} size="logo" alt="Azure Synapse logo"/>|Data ingestion|A cloud-based analytics service for big data and data warehousing.|[Documentation](/integrations/azure-synapse)|
|C++|<Image img={logo_cpp} alt="Cpp logo" size="logo"/>|Language client|C++ client for ClickHouse|[GitHub](https://github.com/ClickHouse/clickhouse-cpp)|
|Cassandra|<Image img={cassandra} alt="Cassandra logo" size="logo"/>|Data ingestion|Allows ClickHouse to use [Cassandra](https://cassandra.apache.org/) as a dictionary source.|[Documentation](/sql-reference/dictionaries/index.md#cassandra)|
|CHDB|<Chdbsvg alt="CHDB logo" style={{width: '3rem' }}/>|AI/ML|An embedded OLAP SQL Engine|[GitHub](https://github.com/chdb-io/chdb#/),<br/>[Documentation](https://doc.chdb.io/)|
4 changes: 4 additions & 0 deletions scripts/aspell-dict-file.txt
@@ -981,3 +981,7 @@ tunable
DAGs
--docs/migrations/postgres/appendix.md--
Citus
--docs/integrations/data-ingestion/azure-synapse/index.md--
microsoft
sparkConfigViaNotebook
sparkUICHSettings
25 changes: 13 additions & 12 deletions sidebars.js
@@ -793,18 +793,6 @@ const sidebars = {
"integrations/data-ingestion/kafka/kafka-table-engine-named-collections"
],
},
{
type: "category",
label: "Apache Spark",
className: "top-nav-item",
collapsed: true,
collapsible: true,
items: [
"integrations/data-ingestion/apache-spark/index",
"integrations/data-ingestion/apache-spark/spark-native-connector",
"integrations/data-ingestion/apache-spark/spark-jdbc",
],
},
"integrations/data-sources/mysql",
"integrations/data-sources/cassandra",
"integrations/data-sources/redis",
@@ -935,7 +923,20 @@ const sidebars = {
link: { type: "doc", id: "integrations/data-ingestion/data-ingestion-index" },
items: [
"integrations/data-ingestion/etl-tools/airbyte-and-clickhouse",
{
type: "category",
label: "Apache Spark",
className: "top-nav-item",
collapsed: true,
collapsible: true,
items: [
"integrations/data-ingestion/apache-spark/index",
"integrations/data-ingestion/apache-spark/spark-native-connector",
"integrations/data-ingestion/apache-spark/spark-jdbc",
],
},
"integrations/data-ingestion/aws-glue/index",
"integrations/data-ingestion/azure-synapse/index",
"integrations/data-ingestion/etl-tools/apache-beam",
{
type: "category",