Description
An Automatable Migration is one that can be set up to run with one click. Even when things don't go perfectly the first time, the migration system helps the user reach a migration that can be performed with one click. Every required step is codified, leaving no ambiguity.
Background
Customers often require a high degree of confidence in their migration, specifically that a source cluster's configurations and data have been transformed as necessary and moved to a target cluster. Customers who are most sensitive to disruptions of any kind will vet tools thoroughly and test them in pre-production environments. The migration tooling can earn that confidence by establishing a repeatable process behind a solid UX that is both accessible and easy to understand.
For OpenSearch clusters, as for any other type of datastore, a migration can include a number of steps. Depending on the application, a given step may be independent of the others or depend on them; with the current OpenSearch migrations tools, some of the following steps are codependent and others are purely optional. Whichever steps are used, they must run in a defined order. These steps include:
- Setting up a proxy to capture traffic for later replication to a target cluster.
- Creating a snapshot of metadata and data from the source cluster.
- Setting up the target cluster's configuration, possibly by migrating configurations (metadata) from the source cluster.
- Backfilling existing/historical data to the target cluster. The RFS tool in the Migration Assistant repo supports pulling documents from source-cluster snapshots, but users may have other mechanisms to replicate their data (e.g. pushing from S3).
- Replaying traffic to synchronize the data between the clusters and to validate the responses of the target cluster.
- Making the target cluster the primary cluster and moving all client traffic to it, leaving the source cluster for eventual decommissioning.
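The ordering constraints among these steps can be captured as a dependency graph. The following Python sketch is a hypothetical illustration, not the project's implementation: the step names and the dependency edges are assumptions chosen to mirror the list above (for instance, it assumes the capture proxy is running before the snapshot is taken, so no writes are missed). The standard library's `graphlib.TopologicalSorter` produces a valid execution order and rejects cyclic dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical step names mirroring the list above. The dependency edges are
# illustrative assumptions, not the project's actual workflow definition.
# Each key maps to the set of steps that must finish before it can start.
MIGRATION_STEPS: dict[str, set[str]] = {
    "capture_proxy": set(),
    "snapshot": {"capture_proxy"},
    "metadata": {"snapshot"},
    "backfill": {"snapshot", "metadata"},
    "replay": {"capture_proxy", "backfill"},
    "switchover": {"replay"},
}


def execution_order(steps: dict[str, set[str]], enabled: set[str]) -> list[str]:
    """Return the enabled steps in an order that respects every dependency.

    The full graph is sorted first and disabled (optional) steps are filtered
    out afterwards, so ordering implied through a skipped step is preserved.
    """
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    return [s for s in TopologicalSorter(steps).static_order() if s in enabled]
```

Disabling an optional step, such as the backfill for a user who pushes historical data from S3 instead, removes it from the plan without reordering the remaining steps.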
Since the data being migrated may be large and accumulated over a long period of time, this process can be nuanced. It may require trial and error (especially around metadata setup or the choice of hardware). Customers may also need to complete the steps quickly to reduce downtime. That creates three stresses on users performing a migration:
- (Time) Migrations are long and users' patience will wear thin, giving them incentives to take shortcuts, even if they create future risks.
- (Execution) Users are prone to errors ("Was A executed before B?"). When these mistakes happen, they might not be detected immediately, may not be understood even after they've had an impact, and may not have clear remediations.
- (Expertise) Novel migrations will include many misconfigurations, so users need to repeat try-and-test loops. Remembering the chain of actions and settings that produced the current state is imprecise and mentally taxing.
Proposed Solution
The next evolution of the migration tools is an orchestration system that pulls the above concerns together into an easily managed, reproducible end-to-end system. The current migration-console control plane, which requires users to work through each of the above steps manually, will be replaced by a new workflow management system that performs each step, along with its setup, automatically on the user's behalf. As a result, users can focus on the 'what' rather than the 'how'.
The workflow system allows users to specify source and target cluster configurations as well as any rules and constraints. From that configuration, the workflow can begin running, and the user is alerted to any issues along the way. As issues arise, users make corrections and the workflow runs again. Each time the updated workflow runs, the migration working state and the target cluster start from a consistent state, eliminating a source of execution errors. On each re-run, the system preserves whatever work it can, reducing the overall time and the pressure on operators to take shortcuts.
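Preserving work across re-runs can be illustrated with content fingerprints: each step records a hash of its own settings combined with the hashes of its upstream steps, and a re-run repeats only the steps whose hash changed. This is a minimal Python sketch under that assumption, not the project's design; `plan_rerun`, `fingerprint`, and any step names or settings passed to them are hypothetical, and the sketch does not cover resetting the target cluster to a consistent starting state.

```python
import hashlib
import json
from graphlib import TopologicalSorter


def fingerprint(config: dict, upstream: list[str]) -> str:
    """Hash a step's own settings together with its upstream fingerprints."""
    payload = json.dumps({"config": config, "upstream": sorted(upstream)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_rerun(
    deps: dict[str, set[str]],
    configs: dict[str, dict],
    completed: dict[str, str],
) -> tuple[list[str], dict[str, str]]:
    """Decide which steps must run again after the user edits the configuration.

    `deps` maps each step to the steps it depends on, `configs` holds each
    step's user-supplied settings, and `completed` maps each step to the
    fingerprint recorded when it last succeeded. A step is reused only if its
    fingerprint is unchanged; because fingerprints fold in upstream
    fingerprints, any edit also invalidates every downstream step.
    """
    prints: dict[str, str] = {}
    to_run: list[str] = []
    for step in TopologicalSorter(deps).static_order():
        upstream = [prints[d] for d in deps.get(step, set())]
        prints[step] = fingerprint(configs.get(step, {}), upstream)
        if completed.get(step) != prints[step]:
            to_run.append(step)
    return to_run, prints
```

Because an edit to one step changes the fingerprints of everything downstream of it, this approach errs toward re-running too much rather than reusing results that may be stale.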
For a workflow system like this to be useful for all users, it will need to evolve beyond the workflow above to include efficient experimentation loops that give users rapid feedback. That will require the tools to support new features such as sampling, incremental updates, and granular rollbacks. These don't need to be developed immediately: a system that codifies a user's migration provides significant value even without them. Without a workflow management system, however, those features would push the complexity of a migration past what any human operator can handle.
See https://github.com/opensearch-project/opensearch-migrations/blob/main/docs/MigrationAsAWorkflow.md for more details, including the user-experience of the new control plane, and https://opensearch.atlassian.net/browse/MIGRATIONS-2506 to track the work.