Background
Currently, when changes to sync rules or upcoming sync streams are deployed, PowerSync re-replicates all data from the source database from scratch, processing it with the new sync rules. Once that is ready, clients are switched over to sync from the new copy.
While there is no direct "downtime", it can take a long time on large databases, and clients have to re-sync all data even if only a small portion changed.
Status
2025-09-01: Basic approach is documented. We need to investigate the feasibility of the two different approaches in more detail.
Proposal
The base idea is to only reprocess bucket or sync stream definitions that have actually changed. This works at the definition level: any change to any single query in a bucket definition causes the entire bucket definition to be re-processed, and all related buckets to be re-synced.
At the core of this, we need to implement a "diff" between two versions of sync rules / sync stream definitions, specifically giving us:
Unchanged definitions.
New definitions.
Removed definitions.
Each modified definition is treated as a new definition + a removed definition - we do not perform any granular updating inside definitions. Generally, this means that if a query changed, that entire definition will be re-processed and re-synced as new buckets - existing buckets are never updated.
This depends on versioned_bucket_ids as described here. Unchanged definitions will keep the old version id, while new definitions will get the new id.
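As a rough illustration, here is a minimal TypeScript sketch of such a diff. The DefinitionDiff shape and the idea of comparing a normalized source string per definition are assumptions for illustration only; the real implementation would compare the parsed definitions.

```typescript
// Hypothetical sketch: diff two versions of sync rules at the definition level.
interface DefinitionDiff {
  unchanged: string[]; // keep the old versioned bucket id
  added: string[];     // get the new versioned bucket id, re-replicated from scratch
  removed: string[];   // bucket data dropped once the new version is active
}

function diffDefinitions(
  oldDefs: Map<string, string>, // definition name -> normalized source of its queries
  newDefs: Map<string, string>
): DefinitionDiff {
  const diff: DefinitionDiff = { unchanged: [], added: [], removed: [] };

  for (const [name, source] of newDefs) {
    const oldSource = oldDefs.get(name);
    if (oldSource === undefined) {
      diff.added.push(name);
    } else if (oldSource === source) {
      diff.unchanged.push(name);
    } else {
      // A modified definition is treated as removed + added:
      // old buckets are dropped, new buckets are re-replicated.
      diff.removed.push(name);
      diff.added.push(name);
    }
  }

  for (const name of oldDefs.keys()) {
    if (!newDefs.has(name)) {
      diff.removed.push(name);
    }
  }

  return diff;
}
```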
Implementation Option 1
Here, we keep the current sync rules version active, while processing new bucket/stream definitions concurrently. Essentially we have:
Old stream: replicates (unchanged definitions, removed definitions).
New stream: replicates (new definitions).
Once the replication stream for the new definitions has caught up:
Wait until both replication streams are at exactly the same position.
Stop replication.
Create a new replication stream that combines the new definitions from the new sync rules with the unchanged definitions from the old sync rules: (unchanged definitions, new definitions).
Drop data related to removed definitions.
Here, the trickiest part is getting the two replication streams "in sync" before replacing them with a single one again.
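A minimal sketch of this switchover, assuming a hypothetical ReplicationStream interface with a stopAt(position) primitive (not an existing PowerSync API):

```typescript
// Hypothetical sketch: stop both streams at the same source position, then
// resume with a single stream covering (unchanged + new) definitions.
interface ReplicationStream {
  definitions: string[];
  currentPosition(): Promise<bigint>;      // e.g. Postgres LSN
  stopAt(position: bigint): Promise<void>; // resolves once the stream has stopped there
}

async function switchOver(
  oldStream: ReplicationStream, // replicates (unchanged, removed) definitions
  newStream: ReplicationStream, // replicates (new) definitions
  diff: { unchanged: string[]; added: string[]; removed: string[] },
  startCombinedStream: (definitions: string[], from: bigint) => Promise<void>,
  dropBucketData: (definitions: string[]) => Promise<void>
): Promise<void> {
  // 1. Pick a target position at or ahead of both streams, and stop both exactly there.
  const [oldPos, newPos] = await Promise.all([
    oldStream.currentPosition(),
    newStream.currentPosition(),
  ]);
  const target = oldPos > newPos ? oldPos : newPos;
  await Promise.all([oldStream.stopAt(target), newStream.stopAt(target)]);

  // 2. Resume from that position with a single stream covering (unchanged + new).
  await startCombinedStream([...diff.unchanged, ...diff.added], target);

  // 3. Drop bucket data for removed definitions; this can happen in the background.
  await dropBucketData(diff.removed);
}
```

In practice we would also need to ensure the source actually advances both streams past the target position, for example via keepalive messages, so that stopAt can complete.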
Implementation Option 2
This approach uses a single replication stream, and re-replicates existing data within it:
New definitions are added to the existing replication stream: Replicate all new data for (unchanged definitions, new definitions, removed definitions).
Concurrently, re-snapshot each table related to (new definitions). Here we need to be careful:
Make sure not to replace newer replicated data with older snapshot data.
Make sure not to introduce significant delays into the replication stream.
Do not update unchanged bucket data.
Remove (removed definitions) from the replication stream.
Drop data related to (removed definitions).
What makes this implementation particularly tricky is avoiding updates to existing bucket data when it is unchanged: if we do trigger updates for those buckets, clients can end up re-syncing the data twice, once under the old definitions and again under the new ones.
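A minimal sketch of the snapshot guard, assuming each stored row records the source position of its last replicated write (a hypothetical storage detail, not the current schema):

```typescript
// Hypothetical sketch: only apply a snapshot row if the replication stream has
// not already written a newer version, and skip unchanged definitions entirely.
interface StoredRow {
  lastWrittenAt: bigint; // source position of the last replicated write
}

interface SnapshotRow {
  key: string;
  data: unknown;
}

async function applySnapshotRow(
  definition: string,
  row: SnapshotRow,
  snapshotPosition: bigint,        // source position the snapshot was taken at
  changedDefinitions: Set<string>, // only (new definitions) are re-snapshotted
  lookup: (key: string) => Promise<StoredRow | undefined>,
  write: (key: string, data: unknown, position: bigint) => Promise<void>
): Promise<void> {
  // Unchanged definitions must not get new operations, otherwise clients
  // would re-sync data that did not actually change.
  if (!changedDefinitions.has(definition)) return;

  const existing = await lookup(row.key);
  if (existing !== undefined && existing.lastWrittenAt >= snapshotPosition) {
    // The replication stream already wrote a newer version of this row;
    // applying the older snapshot copy would roll it back.
    return;
  }
  await write(row.key, row.data, snapshotPosition);
}
```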
Other considerations
Defragmenting
Currently, the fact that data is fully reprocessed serves as a method of "defragmenting", as described here. If we implement incremental reprocessing, we need alternative methods for defragmenting.
Config changes
Changes to the replication config affect all bucket & stream definitions, so they still require re-replicating all data. For the most part, it is very difficult to predict the effects of config changes at a more granular level.
However, if we avoid creating new operations for unchanged bucket data, clients do not need to re-sync data that is unaffected by the config change.
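A minimal sketch of how such skipping could look, assuming bucket entries store a checksum of their serialized data (the helper names and storage shape are hypothetical):

```typescript
// Hypothetical sketch: skip writing a new bucket operation when the
// re-processed row is identical to what is already stored.
import { createHash } from 'node:crypto';

interface ExistingEntry {
  checksum: string;
}

function checksum(serialized: string): string {
  return createHash('sha256').update(serialized).digest('hex');
}

async function writeIfChanged(
  bucket: string,
  key: string,
  serialized: string, // the re-processed row, serialized deterministically
  lookup: (bucket: string, key: string) => Promise<ExistingEntry | undefined>,
  appendOperation: (bucket: string, key: string, serialized: string) => Promise<void>
): Promise<boolean> {
  const existing = await lookup(bucket, key);
  if (existing !== undefined && existing.checksum === checksum(serialized)) {
    // Identical data is already in the bucket: no new operation, so clients
    // do not re-download this row.
    return false;
  }
  await appendOperation(bucket, key, serialized);
  return true;
}
```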