Near No-Downtime Upgrades: Discovery and Proposal for Upgrade Strategy, Postgres #89355

Open
6 tasks
Tracked by #84213
jennb33 opened this issue Jul 25, 2024 · 0 comments
Labels
2024, backend, devops, discovery, engineering, needs-refinement, platform-product-team, Redis

Comments

Contributor

jennb33 commented Jul 25, 2024

Issue Description

AWS offers near-zero downtime Redis upgrades. For reasons specific to our platform, we cannot currently take advantage of this, and an upgrade on our platform requires about two hours of downtime. This stems mainly from the fact that we don't own all of the code in vets-api, so we're unclear about the nature of the data in our clusters. Can the data be reproduced if it is lost? Are writes to the cluster robust if the cluster is unavailable for any amount of time?
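As one way to frame the "are writes robust?" question, here is a minimal sketch of a write path that tolerates a briefly unavailable cluster. Everything in it is hypothetical and for illustration only: `FlakyStore`, `StoreUnavailable`, and `robust_write` are invented names, not code from vets-api or any client library, and the retry counts and delays are arbitrary.

```python
import time


class StoreUnavailable(Exception):
    """Raised by the hypothetical store client when the cluster cannot be reached."""


class FlakyStore:
    """Stand-in for a real client: fails the first `failures` writes, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.data = {}

    def set(self, key, value):
        if self.failures > 0:
            self.failures -= 1
            raise StoreUnavailable("cluster unavailable")
        self.data[key] = value


def robust_write(store, key, value, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Try a write up to `attempts` times with exponential backoff.

    Returns True if the write landed and False if every attempt failed, so the
    caller can decide whether the data is reproducible or must be handled elsewhere.
    """
    for attempt in range(attempts):
        try:
            store.set(key, value)
            return True
        except StoreUnavailable:
            # Back off before the next attempt; no sleep after the final failure.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return False
```

A caller that gets `False` back has to choose between dropping the write (acceptable only if the data can be reproduced) and persisting it somewhere else, which is the distinction the questions above are trying to surface.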

This has led us to upgrade paths that take the entire application stack offline, as documented for our most recent upgrade.

The goal of this ticket is to review what is needed for the next Redis upgrade and to give the OCTO POs a strategy for implementing that upgrade in a way that meets OCTO's Redis needs (not wants).
Some research has been done on the ElastiCache clusters. It may be worth including that work in the next upgrade, but we first need to determine whether it is a need or a want. Because the goal is zero downtime, the clusters work might be a necessary inclusion.

Tasks

  • Determine which functional needs Postgres is not currently meeting.
  • Determine which technical needs Postgres is not currently meeting.
  • Determine the level of effort required to meet these needs.
  • Create a proposal, including an implementation strategy, for internal team review.
  • Finalize the proposal for presentation to the OCTO POs.

Success Metrics

Describe what success looks like for this work. Define specific, measurable outcomes that indicate success.

Acceptance Criteria

  • Proposal and implementation strategy for Postgres

Validation

Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.

@jennb33 jennb33 added backend devops practice area categorization -- NOT a team assignment discovery engineering Engineering topics needs-refinement Identifies tickets that need to be refined platform-product-team Redis labels Jul 25, 2024
@jennb33 jennb33 changed the title Copy of No Downtime Upgrades: Discovery and Proposal for Upgrade Strategy No Downtime Upgrades: Discovery and Proposal for Upgrade Strategy, Postgres Jul 25, 2024
@jennb33 jennb33 changed the title No Downtime Upgrades: Discovery and Proposal for Upgrade Strategy, Postgres Near No-Downtime Upgrades: Discovery and Proposal for Upgrade Strategy, Postgres Sep 10, 2024