Background
Currently, when changes to sync rules or upcoming sync streams are deployed, PowerSync re-replicates all data from the source database from scratch, processing it with the new sync rules. Once that is ready, clients are switched over to sync from the new copy.
While there is no direct "downtime", it can take a long time on large databases, and clients have to re-sync all data even if only a small portion changed.
Status
2025-09-01: Basic approach is documented. We need to investigate the feasibility of the two different approaches in more detail.
Proposal
The base idea is to only reprocess bucket or sync stream definitions that have actually changed. This works at the definition level: any change to any single query in a bucket definition causes the entire bucket definition to be re-processed, and all related buckets to be re-synced.
At the core of this, we need to implement a "diff" between two versions of sync rules / sync stream definitions, specifically giving us:
Unchanged definitions.
New definitions.
Removed definitions.
Each modified definition is treated as a new definition + a removed definition - we do not perform any granular updating inside definitions. Generally, this means that if a query changed, that entire definition will be re-processed and re-synced as new buckets - existing buckets are never updated.
This depends on versioned_bucket_ids as described here. Unchanged definitions will keep the old version id, while new definitions will get the new id.
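As a rough illustration, here is a minimal TypeScript sketch of such a diff. The DefinitionDiff shape and the idea of comparing a normalized source string per definition are assumptions for illustration only; the real implementation would compare the parsed definitions.

```typescript
// Hypothetical sketch: diff two versions of sync rules at the definition level.
interface DefinitionDiff {
  unchanged: string[]; // keep the old versioned bucket id
  added: string[];     // get the new versioned bucket id, re-replicated from scratch
  removed: string[];   // bucket data dropped once the new version is active
}

function diffDefinitions(
  oldDefs: Map<string, string>, // definition name -> normalized source of its queries
  newDefs: Map<string, string>
): DefinitionDiff {
  const diff: DefinitionDiff = { unchanged: [], added: [], removed: [] };

  for (const [name, source] of newDefs) {
    const oldSource = oldDefs.get(name);
    if (oldSource === undefined) {
      diff.added.push(name);
    } else if (oldSource === source) {
      diff.unchanged.push(name);
    } else {
      // A modified definition is treated as removed + added:
      // old buckets are dropped, new buckets are re-replicated.
      diff.removed.push(name);
      diff.added.push(name);
    }
  }

  for (const name of oldDefs.keys()) {
    if (!newDefs.has(name)) {
      diff.removed.push(name);
    }
  }

  return diff;
}
```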
Implementation Option 1
Here, we keep the current sync rules version active, while processing new bucket/stream definitions concurrently. Essentially we have:
Old stream: replicates (unchanged definitions, removed definitions).
New stream: replicates (new definitions).
Once the replication stream for the new definitions has caught up:
Wait until both replication streams are at exactly the same position.
Stop replication.
Create a new replication stream that combines the new definitions from the new sync rules with the unchanged definitions from the old sync rules: (unchanged definitions, new definitions).
Drop data related to removed definitions.
Here, the trickiest part is getting the two replication streams "in sync" before replacing them with a single one again.
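A minimal sketch of this switchover, assuming a hypothetical ReplicationStream interface with a stopAt(position) primitive (not an existing PowerSync API):

```typescript
// Hypothetical sketch: stop both streams at the same source position, then
// resume with a single stream covering (unchanged + new) definitions.
interface ReplicationStream {
  definitions: string[];
  currentPosition(): Promise<bigint>;      // e.g. Postgres LSN
  stopAt(position: bigint): Promise<void>; // resolves once the stream has stopped there
}

async function switchOver(
  oldStream: ReplicationStream, // replicates (unchanged, removed) definitions
  newStream: ReplicationStream, // replicates (new) definitions
  diff: { unchanged: string[]; added: string[]; removed: string[] },
  startCombinedStream: (definitions: string[], from: bigint) => Promise<void>,
  dropBucketData: (definitions: string[]) => Promise<void>
): Promise<void> {
  // 1. Pick a target position at or ahead of both streams, and stop both exactly there.
  const [oldPos, newPos] = await Promise.all([
    oldStream.currentPosition(),
    newStream.currentPosition(),
  ]);
  const target = oldPos > newPos ? oldPos : newPos;
  await Promise.all([oldStream.stopAt(target), newStream.stopAt(target)]);

  // 2. Resume from that position with a single stream covering (unchanged + new).
  await startCombinedStream([...diff.unchanged, ...diff.added], target);

  // 3. Drop bucket data for removed definitions; this can happen in the background.
  await dropBucketData(diff.removed);
}
```

In practice we would also need to ensure the source actually advances both streams past the target position, for example via keepalive messages, so that stopAt can complete.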
Implementation Option 2
This approach uses a single replication stream, and re-replicates existing data within it:
New definitions are added to the existing replication stream: Replicate all new data for (unchanged definitions, new definitions, removed definitions).
Concurrently, re-snapshot each table related to (new definitions). Here we need to be careful:
Make sure not to replace newer replicated data with older snapshot data.
Make sure not to introduce significant delays into the replication stream.
Do not update unchanged bucket data.
Remove (removed definitions) from the replication stream.
Drop data related to (removed definitions).
What makes this implementation particularly tricky is avoiding updates to existing bucket data when it is unchanged: if we do trigger updates for those buckets, clients can end up re-syncing the data twice, once under the old definitions and again under the new ones.
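A minimal sketch of the snapshot guard, assuming each stored row records the source position of its last replicated write (a hypothetical storage detail, not the current schema):

```typescript
// Hypothetical sketch: only apply a snapshot row if the replication stream has
// not already written a newer version, and skip unchanged definitions entirely.
interface StoredRow {
  lastWrittenAt: bigint; // source position of the last replicated write
}

interface SnapshotRow {
  key: string;
  data: unknown;
}

async function applySnapshotRow(
  definition: string,
  row: SnapshotRow,
  snapshotPosition: bigint,        // source position the snapshot was taken at
  changedDefinitions: Set<string>, // only (new definitions) are re-snapshotted
  lookup: (key: string) => Promise<StoredRow | undefined>,
  write: (key: string, data: unknown, position: bigint) => Promise<void>
): Promise<void> {
  // Unchanged definitions must not get new operations, otherwise clients
  // would re-sync data that did not actually change.
  if (!changedDefinitions.has(definition)) return;

  const existing = await lookup(row.key);
  if (existing !== undefined && existing.lastWrittenAt >= snapshotPosition) {
    // The replication stream already wrote a newer version of this row;
    // applying the older snapshot copy would roll it back.
    return;
  }
  await write(row.key, row.data, snapshotPosition);
}
```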
Other considerations
Defragmenting
Currently, the fact that data is fully reprocessed serves as a method of "defragmenting", as described here. If we implement incremental reprocessing, we need alternative methods for defragmenting.
Config changes
Changes to the replication config affect all bucket & stream definitions, so they still require re-replicating all data. For the most part, it is very difficult to predict the effects of config changes at a more granular level.
However, if we avoid creating new operations for unchanged bucket data, clients do not need to re-sync data that is unaffected by the config change.
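A minimal sketch of how such skipping could look, assuming bucket entries store a checksum of their serialized data (the helper names and storage shape are hypothetical):

```typescript
// Hypothetical sketch: skip writing a new bucket operation when the
// re-processed row is identical to what is already stored.
import { createHash } from 'node:crypto';

interface ExistingEntry {
  checksum: string;
}

function checksum(serialized: string): string {
  return createHash('sha256').update(serialized).digest('hex');
}

async function writeIfChanged(
  bucket: string,
  key: string,
  serialized: string, // the re-processed row, serialized deterministically
  lookup: (bucket: string, key: string) => Promise<ExistingEntry | undefined>,
  appendOperation: (bucket: string, key: string, serialized: string) => Promise<void>
): Promise<boolean> {
  const existing = await lookup(bucket, key);
  if (existing !== undefined && existing.checksum === checksum(serialized)) {
    // Identical data is already in the bucket: no new operation, so clients
    // do not re-download this row.
    return false;
  }
  await appendOperation(bucket, key, serialized);
  return true;
}
```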