Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harden database commands against failures #7369

Open
onurctirtir opened this issue Nov 29, 2023 · 0 comments
Open

Harden database commands against failures #7369

onurctirtir opened this issue Nov 29, 2023 · 0 comments

Comments

@onurctirtir
Copy link
Member

onurctirtir commented Nov 29, 2023

Today, for database commands that cannot be executed in a transaction block (**), we don't do much next time when the command is invoked, if the prior invocation failed in the midway, e.g., if CREATE DATABASE succeeded on some nodes but not on all. We should do something better than this. For example, @thanodnl shared this as an idea on how we could handle failure scenarios for CREATE DATABASE better:

eg, could we use a trick where we create the database with a temporary name on all nodes and then transactionally rename the database
we could track the temporary names in a local catalog table for cleanup later if required etc.

@onurctirtir onurctirtir added this to the 12.3 Release milestone Nov 30, 2023
@onurctirtir onurctirtir self-assigned this Nov 30, 2023
onurctirtir added a commit that referenced this issue Feb 23, 2024
In preprocess phase, we save the original database name, replace
dbname field of CreatedbStmt with a temporary name (to let Postgres
to create the database with the temporary name locally) and then
we insert a cleanup record for the temporary database name on all
nodes **(\*\*)**.

And in postprocess phase, we first rename the temporary database
back to its original name for local node and then return a list of
distributed DDL jobs i) to create the database with the temporary
name and then ii) to rename it back to its original name on other
nodes. That way, if CREATE DATABASE fails on any of the nodes, the
temporary database will be cleaned up by the cleanup records that
we inserted in preprocess phase and in case of a failure, we won't
leak any databases called as the name that user intended to use for
the database.

Solves the problem documented in
#7369
for CREATE DATABASE commands.

**(\*\*):** To ensure that we insert cleanup records on all nodes,
with this PR we also start requiring having the coordinator in the
metadata because otherwise we would skip inserting a cleanup record
for the coordinator.
emelsimsek pushed a commit that referenced this issue Mar 11, 2024
In preprocess phase, we save the original database name, replace
dbname field of CreatedbStmt with a temporary name (to let Postgres
to create the database with the temporary name locally) and then
we insert a cleanup record for the temporary database name on all
nodes **(\*\*)**.

And in postprocess phase, we first rename the temporary database
back to its original name for local node and then return a list of
distributed DDL jobs i) to create the database with the temporary
name and then ii) to rename it back to its original name on other
nodes. That way, if CREATE DATABASE fails on any of the nodes, the
temporary database will be cleaned up by the cleanup records that
we inserted in preprocess phase and in case of a failure, we won't
leak any databases called as the name that user intended to use for
the database.

Solves the problem documented in
#7369
for CREATE DATABASE commands.

**(\*\*):** To ensure that we insert cleanup records on all nodes,
with this PR we also start requiring having the coordinator in the
metadata because otherwise we would skip inserting a cleanup record
for the coordinator.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants