---
sidebar_label: 'Azure Synapse'
slug: /integrations/azure-synapse
description: 'Introduction to Azure Synapse with ClickHouse'
keywords: ['clickhouse', 'azure synapse', 'azure', 'synapse', 'microsoft', 'azure spark', 'data']
title: 'Integrating Azure Synapse with ClickHouse'
---

import TOCInline from '@theme/TOCInline';
import Image from '@theme/IdealImage';
import sparkConfigViaNotebook from '@site/static/images/integrations/data-ingestion/azure-synapse/spark_notebook_conf.png';
import sparkUICHSettings from '@site/static/images/integrations/data-ingestion/azure-synapse/spark_ui_ch_settings.png';

# Integrating Azure Synapse with ClickHouse

Azure Synapse is an integrated analytics service that combines big data, data science, and data warehousing to enable fast, large-scale data analysis. Within Synapse, Spark pools provide on-demand, scalable Apache Spark clusters that let users run complex data transformations, machine learning workloads, and integrations with external systems.

This article shows how to set up the ClickHouse Spark connector when working with Apache Spark in Azure Synapse.

## Add the connector's dependencies {#add-connector-dependencies}

Azure Synapse supports three levels of package maintenance:

  1. Default packages
  2. Spark pool level
  3. Session level

Follow the Manage libraries for Apache Spark pools guide and add the following required dependencies to your Spark application:

- `clickhouse-spark-runtime-{spark_version}_{scala_version}-{connector_version}.jar` - official maven
- `clickhouse-jdbc-{java_client_version}-all.jar` - official maven

Please visit our Spark Connector Compatibility Matrix docs to understand which versions suit your needs.
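As an illustration, for a hypothetical setup running Spark 3.5 with Scala 2.12, the resolved jar names might look like the following. The version numbers here are placeholders for illustration only; always take the actual versions from the compatibility matrix:

```text
clickhouse-spark-runtime-3.5_2.12-0.8.0.jar
clickhouse-jdbc-0.6.3-all.jar
```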

## Add ClickHouse as a catalog {#add-clickhouse-as-catalog}

There are a variety of ways to add Spark configurations to your session:

- Custom configuration file to load with your session
- Add configurations via the Azure Synapse UI
- Add configurations in your Synapse notebook

Follow the Manage Apache Spark configuration guide and add the connector's required Spark configurations.

For instance, you can configure your Spark session in your notebook with these settings:

```python
%%configure -f
{
    "conf": {
        "spark.sql.catalog.clickhouse": "com.clickhouse.spark.ClickHouseCatalog",
        "spark.sql.catalog.clickhouse.host": "<clickhouse host>",
        "spark.sql.catalog.clickhouse.protocol": "https",
        "spark.sql.catalog.clickhouse.http_port": "<port>",
        "spark.sql.catalog.clickhouse.user": "<username>",
        "spark.sql.catalog.clickhouse.password": "<password>",
        "spark.sql.catalog.clickhouse.database": "default"
    }
}
```

Make sure this is placed in the first cell of your notebook, as follows:

<Image img={sparkConfigViaNotebook} size="xl" alt="Setting Spark configurations via notebook" border/>

Please visit the ClickHouse Spark configurations page for additional settings.
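If you template sessions across environments, the `%%configure` payload above can also be generated programmatically. The sketch below is a hypothetical helper (not part of the connector) that builds the same dictionary from parameters:

```python
import json

def clickhouse_catalog_conf(host, http_port, user, password,
                            database="default", protocol="https"):
    """Hypothetical helper: builds the "conf" payload for the %%configure cell."""
    prefix = "spark.sql.catalog.clickhouse"
    return {
        "conf": {
            prefix: "com.clickhouse.spark.ClickHouseCatalog",
            f"{prefix}.host": host,
            f"{prefix}.protocol": protocol,
            f"{prefix}.http_port": str(http_port),
            f"{prefix}.user": user,
            f"{prefix}.password": password,
            f"{prefix}.database": database,
        }
    }

# Render the JSON body that would follow `%%configure -f` in the first cell.
payload = clickhouse_catalog_conf("my-host.example.com", 8443, "default", "secret")
print(json.dumps(payload, indent=4))
```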

:::info When working with ClickHouse Cloud
Please make sure to set the required Spark settings.
:::

## Setup Verification {#setup-verification}

To verify that the dependencies and configurations were set successfully, visit your session's Spark UI and go to the Environment tab. There, look for your ClickHouse-related settings:

<Image img={sparkUICHSettings} size="xl" alt="Verifying ClickHouse settings using Spark UI" border/>
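Beyond the Environment tab, you can also sanity-check the catalog from a notebook cell by querying it directly with Spark SQL. A minimal sketch, assuming the catalog is registered under the name `clickhouse` as configured above (`my_table` is a hypothetical table name):

```sql
-- List the ClickHouse databases visible through the catalog
SHOW NAMESPACES IN clickhouse;

-- Read from a hypothetical table through the catalog
SELECT * FROM clickhouse.default.my_table LIMIT 10;
```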

## Additional Resources {#additional-resources}