Near No-Downtime Upgrades: Discovery and Proposal for Upgrade Strategy, Postgres #89355

jennb33 · 2024-07-25T20:05:00Z

Issue Description

AWS offers near-zero-downtime Redis upgrades. For reasons specific to our platform, we currently cannot take advantage of this, and an upgrade on our platform requires about two hours of downtime. This is mainly because we don't own all of the code in vets-api, so we are unclear about the nature of the data in our clusters. Can the data be reproduced if it is lost? Are writes to the cluster robust if the cluster is unavailable for any length of time?

This has led us to upgrade paths that take the entire application stack offline, as documented for our most recent upgrade.
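The two questions in the first paragraph amount to one property: data that can be rebuilt from a source of truth survives being lost, and callers that treat cache reads and writes as best-effort survive the cluster being down. Below is a minimal Python sketch of that property, assuming a cache-aside pattern. `FlakyCache`, `CacheUnavailable`, and `cached_read` are hypothetical stand-ins for illustration only, not vets-api code (which is Ruby) and not a real Redis client API.

```python
from typing import Callable, Dict, Optional


class CacheUnavailable(Exception):
    """Hypothetical error raised when the cluster is down (e.g. during an upgrade)."""


class FlakyCache:
    """Hypothetical in-memory stand-in for a Redis cluster; `available` simulates an outage."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.available = True

    def get(self, key: str) -> Optional[str]:
        if not self.available:
            raise CacheUnavailable(key)
        return self.store.get(key)

    def set(self, key: str, value: str) -> None:
        if not self.available:
            raise CacheUnavailable(key)
        self.store[key] = value


def cached_read(cache: FlakyCache, key: str, load: Callable[[str], str]) -> str:
    """Cache-aside read: the value is reproducible via `load`, so losing the cache is safe."""
    try:
        hit = cache.get(key)
        if hit is not None:
            return hit
    except CacheUnavailable:
        # Cluster is down: bypass the cache and serve from the source of truth.
        return load(key)
    value = load(key)
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass  # Best-effort write: a failed cache write must not fail the request.
    return value


source = {"user:1": "alice"}
cache = FlakyCache()
print(cached_read(cache, "user:1", source.__getitem__))  # miss: loads and populates the cache
cache.available = False
print(cached_read(cache, "user:1", source.__getitem__))  # outage: served from the source of truth
```

Keys whose values cannot be reloaded this way, or whose writers fail hard when the cluster is unreachable, would presumably be the ones that force a full-stack outage during an upgrade.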

The goal of this ticket is to review what is needed for the next Redis upgrade and to provide the OCTO POs with a strategy for implementing that upgrade while meeting OCTO's Redis needs (not wants).
Some research has been done on ElastiCache clusters; it may be worth determining whether clustering is a priority for the next upgrade, i.e. whether it is a need or a want. Because the goal is zero downtime, the cluster work may be a necessary inclusion.

Tasks

Determine what the functional needs are that Postgres is not currently meeting.
Determine what the technical needs are that Postgres is not currently meeting.
Determine the level of effort required to meet these needs.
Create a proposal, including an implementation strategy, for internal team review.
Finalize proposal for presentation to OCTO POs

Success Metrics

Describe what success looks like for this work. Define specific, measurable outcomes that indicate success.

Acceptance Criteria

Proposal and implementation strategy for Postgres

Validation

Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.

jennb33 added the backend, devops, discovery, engineering, needs-refinement, platform-product-team, and Redis labels Jul 25, 2024

jennb33 changed the title ~~Copy of No Downtime Upgrades: Discovery and Proposal for Upgrade Strategy~~ No Downtime Upgrades: Discovery and Proposal for Upgrade Strategy, Postgres Jul 25, 2024

jennb33 assigned Kshitiz-devops Jul 25, 2024

jennb33 mentioned this issue Aug 30, 2024 in Near No-Downtime Upgrades / Sharded Upgrades - Discovery #84213 (Open, 12 tasks)

jennb33 changed the title ~~No Downtime Upgrades: Discovery and Proposal for Upgrade Strategy, Postgres~~ Near No-Downtime Upgrades: Discovery and Proposal for Upgrade Strategy, Postgres Sep 10, 2024

AshleyGuerrant added the 2024 label Nov 15, 2024
