Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CREATE / DROP database commands from any node #7359

Merged
merged 2 commits into from
Jan 8, 2024

Conversation

onurctirtir
Copy link
Member

@onurctirtir onurctirtir commented Nov 21, 2023

DESCRIPTION: Adds support for issuing CREATE/DROP DATABASE commands from worker nodes

With this commit, we allow issuing CREATE / DROP DATABASE commands from worker nodes too.
As in #7278, this is not allowed when the coordinator is not added to metadata because
we don't ever sync metadata changes to coordinator when adding coordinator to
the metadata via SELECT citus_set_coordinator_host('<hostname>'), or equivalently, via
SELECT citus_add_node(<coordinator_node_name>, <coordinator_node_port>, 0).

We serialize database management commands by acquiring a Citus specific advisory lock
on the first primary worker node if there are any workers in the cluster. As opposed to
what we've done in #7278 for role management
commands, we try to avoid from running into distributed deadlocks as much as possible.
This is because, while distributed deadlocks that can happen around role management
commands can be detected by Citus, this is not the case for database management commands
because most of them cannot be run inside in a transaction block. In that case, Citus
cannot even detect the distributed deadlock because the command is not part of a
distributed transaction at all, then the command execution might not return the control
back to the user for an indefinite amount of time.

@onurctirtir onurctirtir force-pushed the create-drop-db-from-any-node branch from d85b211 to f7f5bdf Compare November 21, 2023 15:17
@onurctirtir onurctirtir marked this pull request as ready for review November 21, 2023 15:18
@onurctirtir onurctirtir force-pushed the create-drop-db-from-any-node branch 2 times, most recently from c160615 to 0d51f1b Compare November 21, 2023 15:45
Copy link
Member Author

@onurctirtir onurctirtir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we have a strange situation where we can't even detect the deadlock?

select citus_add_node('localhost', 9700, 0);
select run_command_on_all_nodes('alter system set citus.enable_create_database_propagation to on');
select run_command_on_all_nodes('select pg_reload_conf()');
select run_command_on_all_nodes('create database onur');

Or equivalently, can use the following script too in order to get rid of the call made to run_command_on_all_nodes():

psql -p 9700 -c "SELECT citus_add_node('localhost', 9700, 0)"
psql -p 9700 -c "SELECT run_command_on_all_nodes('alter system set citus.enable_create_database_propagation to on')"
psql -p 9700 -c "SELECT run_command_on_all_nodes('select pg_reload_conf()')"

dbname="$1"

pids=()

for i in `seq 0 2`; do
    psql -p 970${i} -c "create database $dbname" &
    pids+=($!)
done

for pid in ${pids[*]}; do
    wait $pid
done

@citusdata citusdata deleted a comment from codecov bot Nov 21, 2023
Copy link

codecov bot commented Nov 21, 2023

Codecov Report

Merging #7359 (11feabd) into main (20dc58c) will decrease coverage by 0.06%.
The diff coverage is 81.13%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7359      +/-   ##
==========================================
- Coverage   89.61%   89.55%   -0.06%     
==========================================
  Files         278      280       +2     
  Lines       60204    60303      +99     
  Branches     7496     7505       +9     
==========================================
+ Hits        53953    54007      +54     
- Misses       4104     4139      +35     
- Partials     2147     2157      +10     

@gurkanindibay
Copy link
Contributor

A good description is needed. AFAIU, Removing coordinator limits the propagation. It should be clearly explained in PR description.

@onurctirtir
Copy link
Member Author

Hmm, we have a strange situation where we can't even detect the deadlock?

select citus_add_node('localhost', 9700, 0);
select run_command_on_all_nodes('alter system set citus.enable_create_database_propagation to on');
select run_command_on_all_nodes('select pg_reload_conf()');
select run_command_on_all_nodes('create database onur');

Or equivalently, can use the following script too in order to get rid of the call made to run_command_on_all_nodes():

psql -p 9700 -c "SELECT citus_add_node('localhost', 9700, 0)"
psql -p 9700 -c "SELECT run_command_on_all_nodes('alter system set citus.enable_create_database_propagation to on')"
psql -p 9700 -c "SELECT run_command_on_all_nodes('select pg_reload_conf()')"

dbname="$1"

pids=()

for i in `seq 0 2`; do
    psql -p 970${i} -c "create database $dbname" &
    pids+=($!)
done

for pid in ${pids[*]}; do
    wait $pid
done

First of all, deadlock detection code doesn't care about the queries
that are not part of a distributed transaction. See the occurrences of
bool onlyDistributedTx = true in distributed_deadlock_detection.c

Setting onlyDistributedTx to true doesn't solve this problem because
AddWaitEdge() then cannot track initiator node for a backend that is not
part of a distributed transaction. This is because, we call
assign_distributed_transaction_id() only for the backends that are part
of a distributed transaction and AddWaitEdge() deduces the initiator
node thanks what assign_distributed_transaction_id() saves in backend data.

And when this info is not provided, distributed deadlock detector cannot
even associate a remote backend to its originating node and cannot detect
the cycles, hence cannot detect the distributed deadlock.

@onurctirtir onurctirtir force-pushed the create-drop-db-from-any-node branch 3 times, most recently from 257f81b to 52d9ccf Compare November 30, 2023 09:38
Copy link
Member Author

@onurctirtir onurctirtir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@thanodnl thanodnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overal

@onurctirtir onurctirtir force-pushed the create-drop-db-from-any-node branch from 7a19fe9 to 7a5fb97 Compare December 28, 2023 09:32
Comment on lines +942 to +952
DETAIL: Citus does not propagate CREATE DATABASE command to other nodes
HINT: You can manually create a database and its extensions on other nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sidenote: Should we update this hint message to include a hint about citus.enable_create_database_propagation? In any case this would be better to do in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, let me update this in a separate PR

Copy link
Member Author

@onurctirtir onurctirtir Jan 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#7408, assigned to @gurkanindibay

@onurctirtir onurctirtir force-pushed the create-drop-db-from-any-node branch from d307d3f to c029ac8 Compare December 29, 2023 09:32
@onurctirtir onurctirtir changed the base branch from main to fix-non-dist-db-cases December 29, 2023 09:33
@onurctirtir onurctirtir force-pushed the fix-non-dist-db-cases branch 3 times, most recently from 5e99f23 to b760a88 Compare December 29, 2023 12:51
@onurctirtir
Copy link
Member Author

onurctirtir commented Dec 29, 2023

TODO:

@onurctirtir onurctirtir force-pushed the fix-non-dist-db-cases branch from b760a88 to 108c559 Compare December 29, 2023 14:03
@onurctirtir onurctirtir force-pushed the create-drop-db-from-any-node branch from c029ac8 to f67448b Compare December 29, 2023 14:04
Base automatically changed from fix-non-dist-db-cases to main January 3, 2024 14:03
@onurctirtir onurctirtir force-pushed the create-drop-db-from-any-node branch from 32a3a9d to 0403a6d Compare January 3, 2024 14:08
@onurctirtir onurctirtir requested a review from JelteF January 3, 2024 14:38
@onurctirtir onurctirtir force-pushed the create-drop-db-from-any-node branch from 0403a6d to dccac1f Compare January 5, 2024 09:41
Comment on lines 1220 to 1233
ERROR: permission denied to create / rename database
CONTEXT: while executing command on localhost:xxxxx
ALTER DATABASE distributed_db RENAME TO rename_test;
ERROR: permission denied to create / rename database
CONTEXT: while executing command on localhost:xxxxx
DROP DATABASE distributed_db;
ERROR: must be owner of database distributed_db
CONTEXT: while executing command on localhost:xxxxx
ALTER DATABASE distributed_db SET TABLESPACE pg_default;
ERROR: must be owner of database distributed_db
CONTEXT: while executing command on localhost:xxxxx
ALTER DATABASE distributed_db SET timezone TO 'UTC';
ERROR: must be owner of database distributed_db
CONTEXT: while executing command on localhost:xxxxx
ALTER DATABASE distributed_db RESET timezone;
ERROR: must be owner of database distributed_db
CONTEXT: while executing command on localhost:xxxxx
GRANT ALL ON DATABASE distributed_db TO postgres;
WARNING: no privileges were granted for "distributed_db"
RESET ROLE;
Copy link
Contributor

@JelteF JelteF Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super-nit: It would be nice if these errors did not come from another node, but instead the privileges were checked before sending commands there. Now users would get this "confusing" CONTEXT: while executing command on localhost:xxxxx

Probably not important enough to do in this PR. So let's make an issue, seems like a "good first issue".

PS. even when doing this, we'd still want these errors too in case attackers try to call these functions manually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me quickly fix that in this PR while we're at it

Copy link
Contributor

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from removing (see other comment for details):

case OCLASS_SUBSCRIPTION:

And a few other very minor things I think this is good to merge.

@onurctirtir onurctirtir force-pushed the create-drop-db-from-any-node branch from 95dd250 to 11feabd Compare January 8, 2024 16:32
@onurctirtir onurctirtir enabled auto-merge (squash) January 8, 2024 16:45
@onurctirtir onurctirtir merged commit 1d55deb into main Jan 8, 2024
156 checks passed
@onurctirtir onurctirtir deleted the create-drop-db-from-any-node branch January 8, 2024 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants