Near No-Downtime Upgrades: Discovery and Proposal for Upgrade Strategy, Postgres #89355
Labels
2024
backend
devops (practice area categorization, NOT a team assignment)
discovery
engineering (Engineering topics)
needs-refinement (Identifies tickets that need to be refined)
platform-product-team
Redis
Issue Description
AWS offers near-zero-downtime Redis upgrades. For reasons specific to our platform, we cannot currently take advantage of this, and a Redis upgrade on our platform requires about two hours of downtime. The main reason is that we don't own all of the code in vets-api, so we're unclear about the nature of the data in our clusters. Can the data be reproduced if it is lost? Are writes to the cluster robust if the cluster is unavailable for any amount of time?
This uncertainty has led us to upgrade paths that take the entire application stack offline, as documented for our most recent upgrade.
The goal of this ticket is to review what is needed for the next Redis upgrade and to give the OCTO POs a strategy for implementing that upgrade in a way that meets OCTO's Redis needs (not its wants).
Some research has been done on ElastiCache clusters; it is worth checking whether that work is a priority for the next upgrade. We need to determine whether it is a need or a want. Because the goal is zero downtime, the cluster work might be a necessary inclusion.
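As a starting point for the question of whether writes are robust when the cluster is briefly unavailable, here is a minimal sketch of a write path that retries with exponential backoff. It is illustrative only: vets-api is not written in Python, and `CacheUnavailable`, `write_with_retry`, and `FlakyStore` are hypothetical names, not part of vets-api or any Redis client. The in-memory `FlakyStore` stands in for a cluster that rejects a few writes during a failover.

```python
import time


class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache cannot be reached."""


def write_with_retry(write, key, value, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call write(key, value); on CacheUnavailable, back off and retry.

    The delay doubles after each failed attempt. Returns True if the
    write landed, False if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            write(key, value)
            return True
        except CacheUnavailable:
            # Do not sleep after the final attempt; just report failure.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return False


class FlakyStore:
    """In-memory stand-in for a cluster that is briefly unavailable."""

    def __init__(self, failures):
        self.failures = failures  # number of writes to reject first
        self.data = {}

    def set(self, key, value):
        if self.failures > 0:
            self.failures -= 1
            raise CacheUnavailable("simulated failover")
        self.data[key] = value
```

A caller that gets `False` back knows the write was dropped and can decide whether the data is reproducible or needs to be queued elsewhere, which is the distinction this discovery work needs to draw for each consumer of the cluster.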
Tasks
Success Metrics
Describe what success looks like for this work. Define specific, measurable outcomes that indicate success.
Acceptance Criteria
Validation
Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.