@arthurpassos (Collaborator) commented Nov 5, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Export partition support for replicated MergeTree engines

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@github-actions bot commented Nov 5, 2025

Workflow [PR], commit [cf13ec2]

@arthurpassos changed the title from "[DRAFT] Yet another export replicated partition pr" to "Yet another export replicated partition pr" on Nov 5, 2025
@arthurpassos force-pushed the export_replicated_mt_partition_v2 branch from c768bca to fb2d7f7 on November 5, 2025 11:52
@arthurpassos (Collaborator, Author) commented Nov 6, 2025

Discussion from today:

Let's talk about the design:

Disclaimer: it does not need to be the most optimized one. Premature optimization killed us in the past; it'll kill us again.

<replica_zk_path>/exports <-- new path to store exports
    <export_key> <-- partition_id + destination_id, does not allow duplicates
        status <-- COMPLETED
        metadata.json <-- immutable stuff: t_id, p_id, destination, src_replica, parts_count, part_names, create_time, max_retries, ttl_seconds (Remember to add parallelism control here)
        processing <-- znode that holds pending, in_progress or failed parts.
            <part_1>
                status <-- PENDING or FAILED
                retry_count
                finished_by <-- only exists for failed parts
            ...
            <part_n>
        processed <-- znode that holds parts that have been successfully completed
            <part_1>
                path <-- relative path in destination storage
                status <-- I guess I don't need this
                finished_by
        locks
            part_1 <-- replica1, ephemeral
            ...
            part_n <-- replica_n, ephemeral
        exceptions_per_replica
            <replica1> <-- created upon demand
                last_exception
                    part
                    exception <-- message
            count <-- problematic; maybe it can be solved with a simple lock, since it is exclusive to this replica
            ...
            <replica_n>
<replica_zk_path>/exports_cleanup_lock <-- ephemeral
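A minimal Python sketch of the znode layout above, modeled as a flat dict of path -> value standing in for real ZooKeeper nodes. Everything here (the function name, the `partition_id + "_" + destination_id` key format, the initial `PENDING` status) is illustrative only; the actual PR is C++ inside ClickHouse.

```python
import json
import time

def build_export_nodes(replica_zk_path, partition_id, destination_id,
                       src_replica, part_names, max_retries=3, ttl_seconds=3600):
    """Model the export znode tree as {path: value}. Illustrative only."""
    export_key = f"{partition_id}_{destination_id}"  # no duplicates allowed
    base = f"{replica_zk_path}/exports/{export_key}"
    metadata = {  # the immutable stuff from metadata.json above
        "p_id": partition_id,
        "destination": destination_id,
        "src_replica": src_replica,
        "parts_count": len(part_names),
        "part_names": part_names,
        "create_time": int(time.time()),
        "max_retries": max_retries,
        "ttl_seconds": ttl_seconds,
    }
    nodes = {
        f"{base}/status": "PENDING",  # becomes COMPLETED at commit time
        f"{base}/metadata.json": json.dumps(metadata),
        f"{base}/processed": "",
        f"{base}/locks": "",
        f"{base}/exceptions_per_replica": "",
    }
    for part in part_names:  # every part starts out pending under processing/
        nodes[f"{base}/processing/{part}/status"] = "PENDING"
        nodes[f"{base}/processing/{part}/retry_count"] = "0"
    return nodes
```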


Points to discuss:

  • No need for CAS loops (actually we do need one for exceptions; I'll have to rethink this)
  • The commit phase (moving the last part to processed + checking whether we should commit)
  • Exporting while no longer holding a lock
  • Removing recursive operations (gather code evidence) <-- cover later
  • system.replicated_partition_exports being expensive
  • Reducing communication with ZooKeeper by keeping manifests local
  • Ordering based on create time

Later:

  • Discuss the kill operation
  • Consider bumping retry_count at pick-up time

Algorithm:

export_request:

sanity_check() // do we have destination table, partition id, task entry does not exist (unless force or ttl expired)
create_zk_export_structure() <-- transactional
/// note it does not trigger any export operation at this point, just puts it in zk
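The duplicate-entry part of that sanity check could be sketched like this; a hypothetical predicate, where `existing` stands for the entry read from ZooKeeper, and the force/TTL-expiry escape hatches follow the description above.

```python
import time

def may_create_export(existing, force=False, now=None):
    """Return True if a new export entry may be created.

    existing: None, or a dict with 'create_time' and 'ttl_seconds'
    (mirroring metadata.json). Illustrative sketch, not the PR's API.
    """
    if existing is None:
        return True          # no task entry exists: ok to create
    if force:
        return True          # caller explicitly overrides the duplicate check
    now = now if now is not None else time.time()
    # allow re-creation only once the previous entry's TTL has expired
    return now >= existing["create_time"] + existing["ttl_seconds"]
```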

manifest_list_updating_task:

Load new entries
If we hold the cleanup lock, also remove stale entries from ZooKeeper and local state
Upload dangling commit files if any
trigger scheduling task
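The four steps of that task could be sketched as a single cycle over a context object; all helper names (`load_new_entries`, `has_cleanup_lock`, etc.) are illustrative placeholders, not the PR's actual methods.

```python
def manifest_update_cycle(ctx):
    """One iteration of the manifest-list-updating background task (sketch)."""
    ctx.load_new_entries()              # pull new export entries from ZooKeeper
    if ctx.has_cleanup_lock():          # only the lock holder prunes
        ctx.remove_stale_entries()      # drop expired entries, ZK + local
    ctx.upload_dangling_commit_files()  # finish any interrupted commits
    ctx.trigger_scheduling_task()       # kick the scheduler
```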


scheduling_task:
loop over local manifests
    sanity_check // do we have destination table, is it still pending (zk calls)
    get_list_of_pending_parts (processing directory)
    get_list_of_locks

    loop over pending parts
        skip if we don't have the part
        skip if it is already locked (lock list is already local)

        try_to_lock
        try_to_schedule <-- we might need to optimize here
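The inner loop above can be sketched as follows, with the lock table modeled as a plain dict. In reality `try_to_lock` is an ephemeral znode create that can fail if another replica wins the race; here the dict insert always succeeds, which is the simplification this sketch makes.

```python
def schedule_parts(pending_parts, local_parts, locks, replica):
    """Pick the pending parts this replica can export (illustrative sketch).

    locks is mutated in place: {part_name: replica_name}, standing in for
    the ephemeral lock znodes.
    """
    scheduled = []
    for part in pending_parts:
        if part not in local_parts:
            continue  # skip: we don't have the part
        if part in locks:
            continue  # skip: already locked (lock list is already local)
        locks[part] = replica     # try_to_lock (ephemeral znode in reality)
        scheduled.append(part)    # try_to_schedule would enqueue the export
    return scheduled
```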

part_export_success_callback:
    check if we still own the lock, and grab the field version
    move it to processed with appropriate fields

    /// below is not transactional
    check if processing is empty (then we need to commit)
    ship commit file to s3
    mark entire task as completed
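A sketch of that callback over an in-memory `state` dict mirroring the znode tree (`locks`, `processing`, `processed`, `status`); the names are illustrative. As noted above, everything after the move to `processed` is not transactional in the real design.

```python
def on_part_success(state, part, replica, dest_path):
    """Handle a successfully exported part (illustrative sketch)."""
    if state["locks"].get(part) != replica:
        return False  # we no longer own the lock; someone else took over
    # move the part from processing to processed with its destination path
    state["processing"].pop(part)
    state["processed"][part] = {"path": dest_path, "finished_by": replica}
    # commit phase: if processing is now empty, this was the last part
    if not state["processing"]:
        state["commit_file_shipped"] = True  # ship commit file to s3
        state["status"] = "COMPLETED"        # mark entire task as completed
    return True
```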

part_export_fail_callback:
    check if we still own the lock, grab the version
    grab retry_count from zk
    if retry_count + 1 >= max_retries
        set part under processing as failed
        fail the entire task
    increase retry_count
    populate exceptions_per_replica
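The retry bookkeeping of the failure callback, sketched against the same in-memory `state` model (illustrative names throughout, and without the ZK version checks the real code needs):

```python
def on_part_failure(state, part, replica, exc_message, max_retries):
    """Handle a failed part export (illustrative sketch)."""
    if state["locks"].get(part) != replica:
        return  # we no longer own the lock; do nothing
    entry = state["processing"][part]
    entry["retry_count"] += 1
    if entry["retry_count"] >= max_retries:
        entry["status"] = "FAILED"   # part stays under processing as failed
        state["status"] = "FAILED"   # fail the entire task
    # populate exceptions_per_replica for this replica
    ex = state["exceptions_per_replica"].setdefault(
        replica, {"count": 0, "last_exception": None})
    ex["count"] += 1
    ex["last_exception"] = {"part": part, "exception": exc_message}
```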

Still to cover:

  • cleanup explanation
  • recursive lookups
  • kill
  • system.replicated_partition_exports
