Closed

Changes from all commits (35 commits)
5fddc50
Swarm documentation
ianton-ru Nov 24, 2025
af3304d
Better English
ianton-ru Nov 24, 2025
ce72b84
Fix IN with iceberegCluster
ianton-ru Nov 25, 2025
09d938f
Try to fix flacky tests
ianton-ru Nov 26, 2025
02ebff1
Merge branch 'antalya-25.8' into bugfix/antalya-25.8/fix_in_for_clust…
ianton-ru Nov 26, 2025
39faf6e
Profile events for task distribution in ObjectStorageCluster requests
ianton-ru Nov 26, 2025
1f8c298
Fix ObjectStorageClusterWaitingMicroseconds
ianton-ru Nov 26, 2025
2092077
Fix spelling
ianton-ru Nov 26, 2025
649ef3a
Merge branch 'antalya-25.8' into feature/antalya-25.8/object_storage_…
Enmk Nov 26, 2025
cf1872f
Fix min/max value in Iceberg writes
ianton-ru Dec 5, 2025
ad27259
Merge branch 'antalya-25.8' into bugfix/antalya-25.8/fix_icebereg_wri…
Enmk Dec 8, 2025
44b9bed
Merge branch 'antalya-25.8' into feature/antalya-25.8/object_storage_…
zvonand Dec 8, 2025
cc35886
Merge branch 'antalya-25.8' into bugfix/antalya-25.8/fix_in_for_clust…
zvonand Dec 8, 2025
d904075
Fix XML tags and function names in icebergCluster.md
ianton-ru Dec 8, 2025
5fef745
Merge pull request #1192 from Altinity/bugfix/antalya-25.8/fix_iceber…
zvonand Dec 8, 2025
1620e0c
Merge pull request #1172 from Altinity/feature/antalya-25.8/object_st…
zvonand Dec 8, 2025
1a5d1d1
max msg size parquet reader v3
arthurpassos Dec 8, 2025
ae038bb
empty to retrigger ci
arthurpassos Dec 8, 2025
8b96e89
Merge branch 'antalya-25.8' into max_message_side_reader_v3
arthurpassos Dec 8, 2025
439b7e6
Fix spelling of 'algorithm' in documentation
zvonand Dec 9, 2025
1a1d51a
Merge pull request #1165 from Altinity/feature/antalya-25.8/docs
zvonand Dec 9, 2025
bba5b98
Merge pull request #1168 from Altinity/bugfix/antalya-25.8/fix_in_for…
zvonand Dec 9, 2025
d4439e5
Fix order in the test
novikd Oct 22, 2025
b69dde6
Merge pull request #1198 from Altinity/max_message_side_reader_v3
zvonand Dec 10, 2025
462ac2e
Merge pull request #90490 from ClickHouse/yarik/fix-todatetime-null-bug
yariks5s Dec 9, 2025
4619924
Fix segfault with undefined relevant_snapshot in Iceberg metadata
ianton-ru Dec 10, 2025
8370e98
Merge pull request #1211 from Altinity/bugfix/antalya-25.8/fix_segfau…
zvonand Dec 11, 2025
7ef4fce
Merge pull request #1203 from Altinity/bugfix/antalya-25.8/fix_03644_…
zvonand Dec 11, 2025
a86bd0f
Merge pull request #85128 from somratdutta/add-nessie-integration-tests
divanik Sep 17, 2025
a660606
Merge branch 'antalya-25.8' into backports/antalya-25.8/85128
mkmkme Dec 11, 2025
321b18f
Revert "Fix redundant host resolution in DDL Worker"
ianton-ru Dec 12, 2025
2264456
Merge pull request #1219 from Altinity/bugfix/antalya-25.8/revert_88153
zvonand Dec 13, 2025
bb1c5a4
Merge pull request #1213 from Altinity/backports/antalya-25.8/85128
zvonand Dec 14, 2025
6384aee
Merge branch 'antalya-25.8' into backports/antalya-25.8/90490
zvonand Dec 14, 2025
20ca39f
Merge pull request #1206 from Altinity/backports/antalya-25.8/90490
zvonand Dec 14, 2025
7 changes: 4 additions & 3 deletions ci/docker/integration/runner/requirements.txt
@@ -15,8 +15,8 @@ azure-core==1.30.1
azure-storage-blob==12.19.0
bcrypt==4.1.3
beautifulsoup4==4.12.3
boto3[all]==1.37.7
botocore==1.37.7
boto3==1.39.11
botocore==1.39.11
bs4==0.0.2
cassandra-driver==3.29.0
certifi==2025.4.26
@@ -97,7 +97,8 @@ redis==5.0.1
requests-kerberos==0.14.0
requests==2.32.4
rich==13.9.4
s3transfer==0.11.4
s3fs==2024.12.0
s3transfer==0.13.0
setuptools==78.1.1
simplejson==3.19.2
sortedcontainers==2.4.0
73 changes: 73 additions & 0 deletions docs/en/antalya/swarm.md
@@ -0,0 +1,73 @@
# Antalya branch

## Swarm

### Difference with upstream version

#### `storage_type` argument in object storage functions

In upstream ClickHouse, there is a separate table function for each storage backend used to read Iceberg tables (`icebergLocal`, `icebergS3`, `icebergAzure`, `icebergHDFS`), plus their cluster variants, the `iceberg` function as a synonym for `icebergS3`, and the corresponding table engines (`IcebergLocal`, `IcebergS3`, `IcebergAzure`, `IcebergHDFS`).

In the Antalya branch, the `iceberg` table function and the `Iceberg` table engine unify all variants into one by using a new named argument, `storage_type`, which can be one of `local`, `s3`, `azure`, or `hdfs`.

Old syntax examples:

```sql
SELECT * FROM icebergS3('http://minio1:9000/root/table_data', 'minio', 'minio123', 'Parquet');
SELECT * FROM icebergAzureCluster('mycluster', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
CREATE TABLE mytable ENGINE=IcebergHDFS('/table_data', 'Parquet');
```

New syntax examples:

```sql
SELECT * FROM iceberg(storage_type='s3', 'http://minio1:9000/root/table_data', 'minio', 'minio123', 'Parquet');
SELECT * FROM icebergCluster('mycluster', storage_type='azure', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
CREATE TABLE mytable ENGINE=Iceberg('/table_data', 'Parquet', storage_type='hdfs');
```

Also, if a named collection is used to store access parameters, the field `storage_type` can be included in the same named collection:

```xml
<named_collections>
<s3>
<url>http://minio1:9001/root/</url>
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
<storage_type>s3</storage_type>
</s3>
</named_collections>
```

```sql
SELECT * FROM iceberg(s3, filename='table_data');
```

By default `storage_type` is `'s3'` to maintain backward compatibility.


#### `object_storage_cluster` setting

The new setting `object_storage_cluster` controls whether a single-node or cluster variant of table functions reading from object storage (e.g., `s3`, `azure`, `iceberg`, and their cluster variants like `s3Cluster`, `azureCluster`, `icebergCluster`) is used.

Old syntax examples:

```sql
SELECT * from s3Cluster('myCluster', 'http://minio1:9001/root/data/{clickhouse,database}/*', 'minio', 'minio123', 'CSV',
'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))');
SELECT * FROM icebergAzureCluster('mycluster', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
```

New syntax examples:

```sql
SELECT * from s3('http://minio1:9001/root/data/{clickhouse,database}/*', 'minio', 'minio123', 'CSV',
'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))')
SETTINGS object_storage_cluster='myCluster';
SELECT * FROM icebergAzure('http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet')
SETTINGS object_storage_cluster='myCluster';
```

This setting also applies to table engines and can be used with tables managed by Iceberg Catalog.
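
For illustration, here is a sketch of the engine form; the table name is a placeholder, while the endpoint and credentials mirror the examples above:

```sql
-- Create an Iceberg-backed table and run a distributed read over the cluster
-- named in object_storage_cluster.
CREATE TABLE iceberg_events
ENGINE = Iceberg(storage_type='s3', 'http://minio1:9000/root/table_data', 'minio', 'minio123', 'Parquet');

SELECT count(*) FROM iceberg_events
SETTINGS object_storage_cluster='myCluster';
```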

Note: Upstream ClickHouse has introduced analogous settings, such as `parallel_replicas_for_cluster_engines` and `cluster_for_parallel_replicas`; since version 25.10, these settings also work with table engines. The `object_storage_cluster` setting may be deprecated in the future.
56 changes: 56 additions & 0 deletions docs/en/engines/table-engines/integrations/iceberg.md
@@ -296,6 +296,62 @@ CREATE TABLE example_table ENGINE = Iceberg(

`Iceberg` table engine and table function support metadata cache storing the information of manifest files, manifest list and metadata json. The cache is stored in memory. This feature is controlled by setting `use_iceberg_metadata_files_cache`, which is enabled by default.

## Altinity Antalya branch

### Specify storage type in arguments

Only in the Altinity Antalya branch does the `Iceberg` table engine support all storage types. The storage type can be specified using the named argument `storage_type`. Supported values are `s3`, `azure`, `hdfs`, and `local`.

```sql
CREATE TABLE iceberg_table_s3
ENGINE = Iceberg(storage_type='s3', url, [, NOSIGN | access_key_id, secret_access_key, [session_token]], format, [,compression])

CREATE TABLE iceberg_table_azure
ENGINE = Iceberg(storage_type='azure', connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression])

CREATE TABLE iceberg_table_hdfs
ENGINE = Iceberg(storage_type='hdfs', path_to_table, [,format] [,compression_method])

CREATE TABLE iceberg_table_local
ENGINE = Iceberg(storage_type='local', path_to_table, [,format] [,compression_method])
```
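
For example, a minimal concrete form of the S3 variant; the bucket URL and credentials are placeholders matching the named collection example below:

```sql
CREATE TABLE iceberg_table_s3
ENGINE = Iceberg(storage_type='s3', 'http://test.s3.amazonaws.com/clickhouse-bucket/', 'test', 'test', 'Parquet');
```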

### Specify storage type in named collection

Only in the Altinity Antalya branch can `storage_type` be included as part of a named collection. This allows for centralized configuration of storage settings.

```xml
<clickhouse>
<named_collections>
<iceberg_conf>
<url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
<access_key_id>test</access_key_id>
<secret_access_key>test</secret_access_key>
<format>auto</format>
<structure>auto</structure>
<storage_type>s3</storage_type>
</iceberg_conf>
</named_collections>
</clickhouse>
```

```sql
CREATE TABLE iceberg_table ENGINE=Iceberg(iceberg_conf, filename = 'test_table')
```

The default value for `storage_type` is `s3`.

### The `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `Iceberg` table engine available. This syntax allows execution on a cluster when the `object_storage_cluster` setting is non-empty and contains the cluster name.

```sql
CREATE TABLE iceberg_table_s3
ENGINE = Iceberg(storage_type='s3', url, [, NOSIGN | access_key_id, secret_access_key, [session_token]], format, [,compression]);

SELECT * FROM iceberg_table_s3 SETTINGS object_storage_cluster='cluster_simple';
```

## See also {#see-also}

- [iceberg table function](/sql-reference/table-functions/iceberg.md)
23 changes: 23 additions & 0 deletions docs/en/sql-reference/distribution-on-cluster.md
@@ -0,0 +1,23 @@
# Task distribution in *Cluster-family functions

## Task distribution algorithm

Table functions such as `s3Cluster`, `azureBlobStorageCluster`, `hdfsCluster`, and `icebergCluster`, as well as table engines such as `S3`, `Azure`, `HDFS`, and `Iceberg` used with the `object_storage_cluster` setting, distribute tasks across all cluster nodes, or across a subset limited by the `object_storage_max_nodes` setting, which caps the number of nodes involved in processing a distributed query; the nodes are selected randomly for each query.

A single task corresponds to processing one source file.

For each file, one cluster node is selected as the primary node using a consistent Rendezvous Hashing algorithm. This algorithm guarantees that:
* The same node is consistently selected as primary for each file, as long as the cluster remains unchanged.
* When the cluster changes (nodes added or removed), only files assigned to those affected nodes change their primary node assignment.

This improves cache efficiency by minimizing data movement among nodes.
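
As a rough sketch of the idea (not the actual internal implementation; the hash function and file path below are purely illustrative), the primary node for a file is the node with the highest hash of the file/node pair:

```sql
-- Illustrative only: pick the node with the maximal hash(file, node) among
-- the nodes of a cluster. The real implementation uses its own internal hash;
-- cityHash64 is used here just to show the rendezvous-hashing principle.
SELECT host_name AS primary_node
FROM system.clusters
WHERE cluster = 'cluster_simple'
ORDER BY cityHash64('s3://bucket/data/part-0001.parquet', host_name) DESC
LIMIT 1;
```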

## `lock_object_storage_task_distribution_ms` setting

Each node begins by processing the files for which it is the primary node. After completing its own files, a node may take tasks assigned to other nodes, either immediately or after waiting `lock_object_storage_task_distribution_ms` milliseconds if the primary node has not requested new files during that interval. The default value of `lock_object_storage_task_distribution_ms` is 500 milliseconds. This setting balances cache efficiency against workload redistribution when nodes are unevenly loaded.
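
For example, assuming the setting can be adjusted per query like other query-level settings, a longer interval favors cache locality over early rebalancing (the bucket path and credentials are placeholders):

```sql
-- Hypothetical example: keep tasks locked to their primary nodes for up to
-- 1 second before idle nodes may take them over (the default is 500 ms).
SELECT count(*)
FROM s3Cluster('cluster_simple', 'http://minio1:9001/root/data/*', 'minio', 'minio123', 'Parquet')
SETTINGS lock_object_storage_task_distribution_ms = 1000;
```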

## `SYSTEM STOP SWARM MODE` command

If a node needs to shut down gracefully, the `SYSTEM STOP SWARM MODE` command prevents it from receiving new tasks for *Cluster-family queries. The node finishes processing the files already assigned to it, after which it can shut down safely without errors.

Receiving new tasks can be resumed with the command `SYSTEM START SWARM MODE`.
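
A typical drain sequence on the node being taken out of service might look like this:

```sql
-- Stop accepting new *Cluster-family tasks; files already assigned keep processing.
SYSTEM STOP SWARM MODE;

-- ... wait for in-flight tasks to finish, then shut the node down ...

-- After the node is back in service, resume receiving tasks:
SYSTEM START SWARM MODE;
```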
14 changes: 14 additions & 0 deletions docs/en/sql-reference/table-functions/azureBlobStorageCluster.md
@@ -53,6 +53,20 @@

See [azureBlobStorage](/sql-reference/table-functions/azureBlobStorage#using-shared-access-signatures-sas-sas-tokens) for examples.

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch, an alternative syntax for the `azureBlobStorageCluster` table function is available: the `azureBlobStorage` function can be used with a non-empty `object_storage_cluster` setting that specifies a cluster name. This enables distributed queries over Azure Blob Storage across a ClickHouse cluster.

```sql
SELECT count(*) FROM azureBlobStorage(
'http://azurite1:10000/devstoreaccount1', 'testcontainer', 'test_cluster_count.csv', 'devstoreaccount1',
'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'CSV',
'auto', 'key UInt64')
SETTINGS object_storage_cluster='cluster_simple'
```

## Related {#related}

- [AzureBlobStorage engine](../../engines/table-engines/integrations/azureBlobStorage.md)
11 changes: 11 additions & 0 deletions docs/en/sql-reference/table-functions/deltalakeCluster.md
@@ -36,6 +36,17 @@
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch, an alternative syntax for the `deltaLakeCluster` table function is available: the `deltaLake` function can be used with a non-empty `object_storage_cluster` setting that specifies a cluster name. This enables distributed queries over Delta Lake storage across a ClickHouse cluster.

```sql
SELECT count(*) FROM deltaLake(url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
SETTINGS object_storage_cluster='cluster_simple'
```
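
For example (the table path and credentials are placeholders):

```sql
SELECT count(*)
FROM deltaLake('http://minio1:9001/root/delta_table/', 'minio', 'minio123')
SETTINGS object_storage_cluster='cluster_simple'
```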

## Related {#related}

- [deltaLake engine](engines/table-engines/integrations/deltalake.md)
12 changes: 12 additions & 0 deletions docs/en/sql-reference/table-functions/hdfsCluster.md
@@ -59,6 +59,18 @@
If your listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.
:::

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch, an alternative syntax for the `hdfsCluster` table function is available: the `hdfs` function can be used with a non-empty `object_storage_cluster` setting that specifies a cluster name. This enables distributed queries over HDFS storage across a ClickHouse cluster.

```sql
SELECT count(*)
FROM hdfs('hdfs://hdfs1:9000/{some,another}_dir/*', 'TSV', 'name String, value UInt32')
SETTINGS object_storage_cluster='cluster_simple'
```

## Related {#related}

- [HDFS engine](../../engines/table-engines/integrations/hdfs.md)
12 changes: 12 additions & 0 deletions docs/en/sql-reference/table-functions/hudiCluster.md
@@ -42,6 +42,18 @@
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch, an alternative syntax for the `hudiCluster` table function is available: the `hudi` function can be used with a non-empty `object_storage_cluster` setting that specifies a cluster name. This enables distributed queries over Hudi storage across a ClickHouse cluster.

```sql
SELECT *
FROM hudi(url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
SETTINGS object_storage_cluster='cluster_simple'
```
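
For example (the table path and credentials are placeholders):

```sql
SELECT count(*)
FROM hudi('http://minio1:9001/root/hudi_table/', 'minio', 'minio123')
SETTINGS object_storage_cluster='cluster_simple'
```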

## Related {#related}

- [Hudi engine](engines/table-engines/integrations/hudi.md)
41 changes: 41 additions & 0 deletions docs/en/sql-reference/table-functions/iceberg.md
@@ -302,6 +302,47 @@ Table function `iceberg` is an alias to `icebergS3` now.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch

### Specify storage type in arguments

Only in the Altinity Antalya branch does the `iceberg` table function support all storage types. The storage type can be specified using the named argument `storage_type`. Supported values are `s3`, `azure`, `hdfs`, and `local`.

```sql
iceberg(storage_type='s3', url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method])

iceberg(storage_type='azure', connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method])

iceberg(storage_type='hdfs', path_to_table, [,format] [,compression_method])

iceberg(storage_type='local', path_to_table, [,format] [,compression_method])
```

### Specify storage type in named collection

Only in the Altinity Antalya branch can `storage_type` be included as part of a named collection. This allows for centralized configuration of storage settings.

```xml
<clickhouse>
<named_collections>
<iceberg_conf>
<url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
<access_key_id>test</access_key_id>
<secret_access_key>test</secret_access_key>
<format>auto</format>
<structure>auto</structure>
<storage_type>s3</storage_type>
</iceberg_conf>
</named_collections>
</clickhouse>
```

```sql
iceberg(named_collection[, option=value [,..]])
```

The default value for `storage_type` is `s3`.
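
For example, using the named collection above (the `filename` override is illustrative):

```sql
SELECT * FROM iceberg(iceberg_conf, filename = 'test_table')
```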

## See Also {#see-also}

* [Iceberg engine](/engines/table-engines/integrations/iceberg.md)
75 changes: 75 additions & 0 deletions docs/en/sql-reference/table-functions/icebergCluster.md
@@ -49,6 +49,81 @@
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch

### `icebergLocalCluster` table function

Only in the Altinity Antalya branch, the `icebergLocalCluster` table function is available. It is designed to run distributed cluster queries when Iceberg data is stored on shared network storage mounted at a local path. The path must be identical on all replicas.

```sql
icebergLocalCluster(cluster_name, path_to_table, [,format] [,compression_method])
```
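
For example (the mount path is a placeholder; it must resolve to the same shared storage on every replica):

```sql
SELECT count(*) FROM icebergLocalCluster('cluster_simple', '/mnt/shared/iceberg/table_data', 'Parquet')
```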

### Specify storage type in function arguments

Only in the Altinity Antalya branch, the `icebergCluster` table function supports all storage backends. The storage backend can be specified using the named argument `storage_type`. Valid values include `s3`, `azure`, `hdfs`, and `local`.

```sql
icebergCluster(storage_type='s3', cluster_name, url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method])

icebergCluster(storage_type='azure', cluster_name, connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method])

icebergCluster(storage_type='hdfs', cluster_name, path_to_table, [,format] [,compression_method])

icebergCluster(storage_type='local', cluster_name, path_to_table, [,format] [,compression_method])
```

### Specify storage type in a named collection

Only in the Altinity Antalya branch, `storage_type` can be part of a named collection.

```xml
<clickhouse>
<named_collections>
<iceberg_conf>
<url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
<access_key_id>test</access_key_id>
<secret_access_key>test</secret_access_key>
<format>auto</format>
<structure>auto</structure>
<storage_type>s3</storage_type>
</iceberg_conf>
</named_collections>
</clickhouse>
```

```sql
icebergCluster(iceberg_conf[, option=value [,..]])
```

The default value for `storage_type` is `s3`.

### `object_storage_cluster` setting

Only in the Altinity Antalya branch, an alternative syntax for the `icebergCluster` table function is available: the `iceberg` function, or its storage-specific variants, can be used with a non-empty `object_storage_cluster` setting that specifies a cluster name. This enables distributed queries over an Iceberg table across a ClickHouse cluster.

```sql
icebergS3(url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

icebergAzure(connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

icebergHDFS(path_to_table, [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

icebergLocal(path_to_table, [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

icebergS3(option=value [,..]) SETTINGS object_storage_cluster='cluster_name'

iceberg(storage_type='s3', url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

iceberg(storage_type='azure', connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

iceberg(storage_type='hdfs', path_to_table, [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

iceberg(storage_type='local', path_to_table, [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

iceberg(iceberg_conf[, option=value [,..]]) SETTINGS object_storage_cluster='cluster_name'
```
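
A concrete example of the last form, reusing the named collection defined above (the `filename` option is illustrative):

```sql
SELECT count(*)
FROM iceberg(iceberg_conf, filename = 'test_table')
SETTINGS object_storage_cluster='cluster_simple'
```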

**See Also**

- [Iceberg engine](/engines/table-engines/integrations/iceberg.md)