Issue 7887 Fix insert select planner to exclude identity columns from target list on partial inserts #7911

m3hm3t · 2025-02-25T18:53:17Z

This PR fixes an issue #7887

A potential approach to handling partial inserts on a distributed table with an identity column is to remove the unused identity column’s TargetEntry from the plan’s target list when the user omits it in the INSERT statement.

By eliminating the unnecessary identity column TargetEntry (or replacing it with a default), the final execution plan avoids processing that column in AppendCopyRowData. This prevents issues like the "invalid string enlargement request size: -4" error, which occurs when a misaligned identity column is incorrectly interpreted as text or contains invalid data.

Open the target relation (RelationIdGetRelation(targetRelationId)) to retrieve its tuple descriptor.
Loop over each TargetEntry in insertTargetList.
Check if the corresponding attribute (attr->attidentity) is 'a' or 'd' (ALWAYS or BY DEFAULT).
If the user didn’t explicitly provide a value for the identity column (!userProvidedValue), skip that TLE.
This effectively removes the identity column from the insert list, meaning Postgres will fill it from the sequence.
Replace the old insertTargetList with newTargetList that omits the identity TLE.

…t on partial inserts

colm-mchugh · 2025-02-26T16:22:17Z

src/backend/distributed/planner/insert_select_planner.c

+/* 
+	 * The insertTargetList are the columns we plan to insert into the target table. 
+	 * For partial inserts, it might incorrectly include the identity column if 
+	 * some rewriting logic added it. We'll fix that below.


I'm curious about where such rewriting logic takes place; is it done by Postgres (parse/analyze, standard_planner()) or Citus ?

src/backend/distributed/planner/distributed_planner.c PlannedStmt * distributed_planner(Query *parse, const char *query_string, int cursorOptions, ParamListInfo boundParams)

The targetList inside the Query object has a length of 3. It seems to me that the identity column, even if it was automatically generated, originates from PostgreSQL’s core logic.

Agreed, it is added by Postgres - and in the case of an identity column, the target expression is the next sequence value. If the user wants to explicitly set an identity column they need to override the default setting:

INSERT INTO local2(id, local1fk, reference1fk) OVERRIDING SYSTEM VALUE SELECT 333, id, 1 FROM local1;

In this case, the target expression for the identity column is a VAR node, referencing the value 333 in the SELECT query, which is similar to how the other targets (local1fk, reference1fk) refer to the source data (i.e. the to-be-inserted values).

I'm now unsure if removing the implicit identity target is the right way to go, because the following variations of the problem query do successfully complete:
INSERT INTO local2(local1fk, reference1fk) SELECT 'ccccc', 1
INSERT INTO local2(local1fk, reference1fk) SELECT * FROM (VALUES ('ccccc', 3), ('bbbbb', 2))
In both cases the target for the identity column is the same as the problem query - a call to the next sequence value. So maybe check why these queries can successfully complete the insert - particularly the execution logic, what is the difference between how it receives and handles the data in the problem case and these ok cases ?

colm-mchugh · 2025-02-26T16:30:16Z

src/backend/distributed/planner/insert_select_planner.c

+				 * indicated by something in the parse tree?), we remove or convert 
+				 * the TLE to a default. 
+				 */
+				bool userSpecifiedValue = CheckIfUserSpecifiedValue(tle, parse); 


Where is the definition of CheckIfUserSpecifiedValue ?

I'll implement it, but I'd like to confirm with you before moving forward with the implementation.

It may be indicated by something in the parse tree - comparing the parse tree for an INSERT statement with the identity column explicitly included against the parse tree for an INSERT statement with the identity column implicitly included (per the problem query) would help in determining how. Also, is tle->resjunk true for the identity column when its not explicitly included ? That may be a way to tell if a target should be removed.

@m3hm3t Check if INSERT INTO local2(local1fk, reference1fk) SELECT 'ccccc', 1 goes through CreateNonPushableInsertSelectPlan - it does not seem to hit the issue.

colm-mchugh · 2025-02-26T16:33:36Z

src/backend/distributed/planner/insert_select_planner.c

+					!userSpecifiedValue)
+				{
+					/* Skip adding TLE => effectively uses default identity generation */
+					continue; 


What is the value of tle->resjunk at this point ? Skipping TLE with resjunk = true may be a more general way to deal with the problem. But please check if that is actually the case here.

Nevermind - this is not the case so resjunk cannot be used to determine if a column was specified by the user.

colm-mchugh · 2025-02-26T16:37:46Z

src/test/regress/sql/issue_7887.sql

+
+-- If you want to see the error in the regression output, you might do something like:
+-- NOTE: The next line is typically how you'd test for an error in a .sql regression test
+-- but with a custom "expected" file you'll confirm you get the "invalid string enlargement request size: -4" text


nit: can we put test code into generated_identity.sql instead of creating a new test file ? Given that it already tests identity column in Citus,

m3hm3t added 2 commits February 25, 2025 10:15

Add regression test for issue 7887 related to invalid string enlargement

a21916f

Fix insert select planner to exclude identity columns from target lis…

70beadb

…t on partial inserts

m3hm3t self-assigned this Feb 25, 2025

colm-mchugh reviewed Feb 26, 2025

View reviewed changes

m3hm3t added the cherry-pick-13.0 label Feb 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue 7887 Fix insert select planner to exclude identity columns from target list on partial inserts #7911

Issue 7887 Fix insert select planner to exclude identity columns from target list on partial inserts #7911

m3hm3t commented Feb 25, 2025 •

edited

Loading

colm-mchugh Feb 26, 2025 •

edited

Loading

m3hm3t Feb 28, 2025

colm-mchugh Feb 28, 2025 •

edited

Loading

colm-mchugh Feb 26, 2025

m3hm3t Feb 27, 2025

colm-mchugh Feb 27, 2025

colm-mchugh Feb 27, 2025

colm-mchugh Feb 26, 2025

colm-mchugh Feb 28, 2025

colm-mchugh Feb 26, 2025

Issue 7887 Fix insert select planner to exclude identity columns from target list on partial inserts #7911

Are you sure you want to change the base?

Issue 7887 Fix insert select planner to exclude identity columns from target list on partial inserts #7911

Conversation

m3hm3t commented Feb 25, 2025 • edited Loading

colm-mchugh Feb 26, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

colm-mchugh Feb 28, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m3hm3t commented Feb 25, 2025 •

edited

Loading

colm-mchugh Feb 26, 2025 •

edited

Loading

colm-mchugh Feb 28, 2025 •

edited

Loading