Skip to content

Route rows in a multi-row INSERT separately to different shards #259

@niclaflamme

Description

@niclaflamme

Description

PgDog is routing multi-row (multi-parameter) INSERT queries on sharded tables as ::All when the sharding keys (e.g., user_id) across rows hash to different shards. This results in the write being duplicated across all shards, which is not desired. Instead, I expect the rows to be grouped by their target shards and routed as ::Multi(Vec<int>), with the query potentially rewritten/split to send only relevant rows to each shard group (similar to how COPY commands are handled for cross-shard ingestion).

Context

  • PgDog has three shard routing types: ::All, ::Direct(int), ::Multi(Vec<int>).
  • For a single-row INSERT on a sharded key (e.g., user_id), the query is correctly routed to the appropriate shard via hashing or the tenant list (::Direct).
  • For multi-row INSERTs - like in the following example - where the user_id values hash to different shards, it's currently routing as ::All (broadcasting the full query to all shards, causing duplication).
    -- Routes to All (::All)
    INSERT INTO users (user_id, name) VALUES
      (123, 'Alice'),
      (456, 'Bob'),
      (789, 'Charlie');

Expected behavior

Group rows by target shard (e.g., rows for shard 0, rows for shard 1), then route as ::Multi with separate INSERT queries sent to each relevant shard containing only its rows. This avoids duplication and ensures efficient writes.

If I have a multi-insert query, my expectation is that it would behave like COPY, splitting rows across shards without duplication.

Single Shard Example (Separate Single-Row INSERTs with ::Direct Routing)

Assuming hash-based sharding on user_id across 3 shards (for illustration; actual shard depends on your hashing logic or tenant list):

-- Routes to Shard 0 (::Direct(0))
INSERT INTO users (user_id, name) VALUES (123, 'Alice');

-- Routes to Shard 1 (::Direct(1))
INSERT INTO users (user_id, name) VALUES (456, 'Bob');

-- Routes to Shard 2 (::Direct(1))
INSERT INTO users (user_id, name) VALUES (789, 'Charlie');

Grouped Insert Example (Multi-Row INSERT with ::All Routing, Causing Duplication)

If the multi-row INSERT is routed as ::All (broadcast to all 3 shards), it duplicates the writes across shards. A subsequent SELECT * FROM users ORDER BY user_id ASC; (queried across all shards) would return tripled rows like this:

-- Routes to All (::All)
INSERT INTO users (user_id, name) VALUES
  (123, 'Alice'),
  (456, 'Bob'),
  (789, 'Charlie');

-- Routes to All (::All)
SELECT * FROM users ORDER BY user_id ASC;

-- | user_id | name    | shard | shard(user_id) |
-- |---------|---------|-------|----------------|
-- | 123     | Alice   | 0     | ::Direct(0)    |
-- | 123     | Alice   | 1     | ::Direct(0)    |
-- | 123     | Alice   | 2     | ::Direct(0)    |
-- | 456     | Bob     | 0     | ::Direct(1)    |
-- | 456     | Bob     | 1     | ::Direct(1)    |
-- | 456     | Bob     | 2     | ::Direct(1)    |
-- | 789     | Charlie | 0     | ::Direct(1)    |
-- | 789     | Charlie | 1     | ::Direct(1)    |
-- | 789     | Charlie | 2     | ::Direct(1)    |

Reproduction Steps

  1. Set up PgDog with sharding enabled (e.g., hash-based on user_id across 3+ shards).
  2. Execute a multi-row INSERT where sharding key values target multiple shards.

In Practice

PgDog might want to override and split a multi-row INSERT into per-shard multi-row INSERTs. The only difference from native Postgres is that those writes aren’t atomic across shards.

-- INSERT INTO users (user_id, name) VALUES
--  (123, 'Alice'),
--  (456, 'Bob'),
--  (789, 'Charlie');

-- Routes to Shard 0 (::Direct(0))
INSERT INTO users (user_id, name) VALUES
  (123, 'Alice');

-- Routes to Shard 1 (::Direct(1))
INSERT INTO users (user_id, name) VALUES
  (456, 'Bob'),
  (789, 'Charlie');

Additional Notes

This seems similar to how cross-shard queries fall back to ::All when no single sharding key is present, but for multi-row INSERTs with identifiable keys per row, splitting/grouping would be ideal to match COPY's behavior.

Metadata

Metadata

Assignees

No one assigned

    Labels

    backendFeature to manage backend connectionsbleeding edgeExciting new frontier stuff.

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions