-
Notifications
You must be signed in to change notification settings - Fork 89
Description
Description
PgDog is routing multi-row (multi-parameter) INSERT queries on sharded tables as ::All
when the sharding keys (e.g., user_id
) across rows hash to different shards. This results in the write being duplicated across all shards, which is not desired. Instead, I expect the rows to be grouped by their target shards and routed as ::Multi(Vec<int>)
, with the query potentially rewritten/split to send only relevant rows to each shard group (similar to how COPY commands are handled for cross-shard ingestion).
Context
- PgDog has three shard routing types:
::All
,::Direct(int)
,::Multi(Vec<int>)
. - For a single-row INSERT on a sharded key (e.g.,
user_id
), the query is correctly routed to the appropriate shard via hashing or the tenant list (::Direct
). - For multi-row INSERTs - like in the following example - where the
user_id
values hash to different shards, it's currently routing as::All
(broadcasting the full query to all shards, causing duplication).-- Routes to All (::All) INSERT INTO users (user_id, name) VALUES (123, 'Alice'), (456, 'Bob'), (789, 'Charlie');
Expected behavior
Group rows by target shard (e.g., rows for shard 0, rows for shard 1), then route as ::Multi
with separate INSERT queries sent to each relevant shard containing only its rows. This avoids duplication and ensures efficient writes.
If I have a multi-insert query, my expectation is that it would behave like COPY, splitting rows across shards without duplication.
Single Shard Example (Separate Single-Row INSERTs with ::Direct Routing)
Assuming hash-based sharding on user_id
across 3 shards (for illustration; actual shard depends on your hashing logic or tenant list):
-- Routes to Shard 0 (::Direct(0))
INSERT INTO users (user_id, name) VALUES (123, 'Alice');
-- Routes to Shard 1 (::Direct(1))
INSERT INTO users (user_id, name) VALUES (456, 'Bob');
-- Routes to Shard 2 (::Direct(1))
INSERT INTO users (user_id, name) VALUES (789, 'Charlie');
Grouped Insert Example (Multi-Row INSERT with ::All Routing, Causing Duplication)
If the multi-row INSERT is routed as ::All (broadcast to all 3 shards), it duplicates the writes across shards. A subsequent SELECT * FROM users ORDER BY user_id ASC;
(queried across all shards) would return tripled rows like this:
-- Routes to All (::All)
INSERT INTO users (user_id, name) VALUES
(123, 'Alice'),
(456, 'Bob'),
(789, 'Charlie');
-- Routes to All (::All)
SELECT * FROM users ORDER BY user_id ASC;
-- | user_id | name | shard | shard(user_id) |
-- |---------|---------|-------|----------------|
-- | 123 | Alice | 0 | ::Direct(0) |
-- | 123 | Alice | 1 | ::Direct(0) |
-- | 123 | Alice | 2 | ::Direct(0) |
-- | 456 | Bob | 0 | ::Direct(1) |
-- | 456 | Bob | 1 | ::Direct(1) |
-- | 456 | Bob | 2 | ::Direct(1) |
-- | 789 | Charlie | 0 | ::Direct(1) |
-- | 789 | Charlie | 1 | ::Direct(1) |
-- | 789 | Charlie | 2 | ::Direct(1) |
Reproduction Steps
- Set up PgDog with sharding enabled (e.g., hash-based on
user_id
across 3+ shards). - Execute a multi-row INSERT where sharding key values target multiple shards.
In Practice
PgDog might want to override and split a multi-row INSERT into per-shard multi-row INSERTs. The only difference from native Postgres is that those writes aren’t atomic across shards.
-- INSERT INTO users (user_id, name) VALUES
-- (123, 'Alice'),
-- (456, 'Bob'),
-- (789, 'Charlie');
-- Routes to Shard 0 (::Direct(0))
INSERT INTO users (user_id, name) VALUES
(123, 'Alice');
-- Routes to Shard 1 (::Direct(1))
INSERT INTO users (user_id, name) VALUES
(456, 'Bob'),
(789, 'Charlie');
Additional Notes
This seems similar to how cross-shard queries fall back to ::All
when no single sharding key is present, but for multi-row INSERTs with identifiable keys per row, splitting/grouping would be ideal to match COPY's behavior.