Issue 7887 Fix insert select planner to exclude identity columns from target list on partial inserts #7911
base: release-13.0
Changes from 2 commits
@@ -1435,6 +1435,62 @@ CreateNonPushableInsertSelectPlan(uint64 planId, Query *parse, ParamListInfo bou

    PrepareInsertSelectForCitusPlanner(insertSelectQuery);

    /*
     * The insertTargetList are the columns we plan to insert into the target table.
     * For partial inserts, it might incorrectly include the identity column if
     * some rewriting logic added it. We'll fix that below.
     */
    List *insertTargetList = insertSelectQuery->targetList;

    /*
     * 1) Open the target relation to inspect its attributes and detect identity columns.
     */
    Relation targetRel = RelationIdGetRelation(targetRelationId);
    if (RelationIsValid(targetRel))
    {
        /* We'll build a new list of TLEs that excludes identity columns if user omitted them. */
        List *newTargetList = NIL;
        ListCell *lc = NULL;

        foreach(lc, insertTargetList)
        {
            TargetEntry *tle = (TargetEntry *) lfirst(lc);

            /*
             * resno is 1-based attribute number: if we have 3 columns in table, they
             * correspond to resno=1..3. Make sure attno is in range before we do anything.
             */
            int attno = tle->resno;
            if (attno > 0 && attno <= targetRel->rd_att->natts)
            {
                Form_pg_attribute attr = TupleDescAttr(targetRel->rd_att, attno - 1);

                /*
                 * If 'attr->attidentity' is 'a' or 'd' => It's an identity column.
                 * If the user hasn't explicitly specified a value (which is presumably
                 * indicated by something in the parse tree?), we remove or convert
                 * the TLE to a default.
                 */
                bool userSpecifiedValue = CheckIfUserSpecifiedValue(tle, parse);

Where is the definition of

I'll implement it, but I'd like to confirm with you before moving forward with the implementation.

It may be indicated by something in the parse tree - comparing the parse tree for an INSERT statement with the identity column explicitly included against the parse tree for an INSERT statement with the identity column implicitly included (per the problem query) would help in determining how. Also, is

@m3hm3t Check if

                if ((attr->attidentity == ATTRIBUTE_IDENTITY_ALWAYS ||
                     attr->attidentity == ATTRIBUTE_IDENTITY_BY_DEFAULT) &&
                    !userSpecifiedValue)
                {
                    /* Skip adding TLE => effectively uses default identity generation */
                    continue;

What is the value of

Nevermind - this is not the case so

                }
            }

            /* If we get here, we keep the TLE. */
            newTargetList = lappend(newTargetList, tle);
        }

        /* Update the plan's target list to the "cleaned" version */
        insertSelectQuery->targetList = newTargetList;

        RelationClose(targetRel);
    }

    /* get the SELECT query (may have changed after PrepareInsertSelectForCitusPlanner) */
    Query *selectQuery = selectRte->subquery;

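The filtering step in the hunk above can be tried outside PostgreSQL with a small stand-alone model. This is only a sketch under stated assumptions: `ColumnDesc`, `TargetItem`, `is_identity_column`, and `filter_identity_targets` are hypothetical stand-ins, not Citus or PostgreSQL types or functions, and the `userSpecified` flag is assumed to be known already, which is exactly the part the PR leaves open with `CheckIfUserSpecifiedValue`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pg_attribute row: only the fields used here. */
typedef struct
{
    const char *name;
    char identity;      /* 'a' = ALWAYS, 'd' = BY DEFAULT, '\0' = not identity */
} ColumnDesc;

/* Hypothetical stand-in for a TargetEntry. */
typedef struct
{
    int resno;          /* 1-based attribute number in the target table */
    bool userSpecified; /* assumed input; the PR does not yet define how to get it */
} TargetItem;

bool
is_identity_column(const ColumnDesc *col)
{
    return col->identity == 'a' || col->identity == 'd';
}

/*
 * Copy into out[] every target that is not an identity column the user
 * omitted. Returns the number of targets kept; out[] must hold nIn items.
 */
size_t
filter_identity_targets(const ColumnDesc *cols, int ncols,
                        const TargetItem *in, size_t nIn, TargetItem *out)
{
    assert(cols != NULL && in != NULL && out != NULL);

    size_t kept = 0;
    for (size_t i = 0; i < nIn; i++)
    {
        int attno = in[i].resno;
        bool inRange = attno > 0 && attno <= ncols;

        /* Skip only in-range identity columns without a user-supplied value. */
        if (inRange && is_identity_column(&cols[attno - 1]) && !in[i].userSpecified)
            continue;

        out[kept++] = in[i];
    }
    return kept;
}
```

For a table shaped like `local2` (identity `id`, then `local1fk`, `reference1fk`) and a partial insert that omits `id`, the model keeps the two user-supplied targets and drops the first. Whether dropping the target is the right fix is questioned later in the review thread.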
@@ -0,0 +1,33 @@
CREATE SCHEMA issue_7887;

CREATE TABLE local1 (
    id text not null primary key
);

CREATE TABLE reference1 (
    id int not null primary key,
    reference_col1 text not null
);
SELECT create_reference_table('reference1');

CREATE TABLE local2 (
    id int not null generated always as identity,
    local1fk text not null,
    reference1fk int not null,
    constraint loc1fk foreign key (local1fk) references local1(id),
    constraint reference1fk foreign key (reference1fk) references reference1(id),
    constraint testlocpk primary key (id)
);

INSERT INTO local1(id) VALUES ('aaaaa'), ('bbbbb'), ('ccccc');
INSERT INTO reference1(id, reference_col1) VALUES (1, 'test'), (2, 'test2'), (3, 'test3');

-- The statement that triggers the bug:
INSERT INTO local2(local1fk, reference1fk)
SELECT id, 1
FROM local1;

-- If you want to see the error in the regression output, you might do something like:
-- NOTE: The next line is typically how you'd test for an error in a .sql regression test
-- but with a custom "expected" file you'll confirm you get the "invalid string enlargement request size: -4" text

nit: can we put test code into

I'm curious where such rewriting logic takes place: is it done by Postgres (parse/analyze, standard_planner()) or by Citus?
The targetList inside the Query object has a length of 3. It seems to me that the identity column, even if it was automatically generated, originates from PostgreSQL’s core logic.
Agreed, it is added by Postgres - and in the case of an identity column, the target expression is the next sequence value. If the user wants to explicitly set an identity column they need to override the default setting:
In this case, the target expression for the identity column is a VAR node, referencing the value 333 in the SELECT query, which is similar to how the other targets (local1fk, reference1fk) refer to the source data (i.e. the to-be-inserted values).

I'm now unsure if removing the implicit identity target is the right way to go, because the following variations of the problem query do successfully complete:
INSERT INTO local2(local1fk, reference1fk) SELECT 'ccccc', 1;
INSERT INTO local2(local1fk, reference1fk) SELECT * FROM (VALUES ('ccccc', 3), ('bbbbb', 2));
In both cases the target for the identity column is the same as in the problem query: a call to the next sequence value. So maybe check why these queries can complete the insert successfully, particularly in the execution logic: what is the difference between how it receives and handles the data in the problem case and in these OK cases?
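
The distinction drawn in this thread, between an identity target that Postgres filled in as the next sequence value and one that references the SELECT output, could serve as the basis for the missing "user specified a value" check. The following is a minimal sketch of that idea, not the PR's implementation: `ExprKind`, `TargetExpr`, and both functions are hypothetical stand-ins rather than PostgreSQL node types (in PostgreSQL the implicit default is typically a `NextValueExpr` node and the explicit case a `Var`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression kinds, loosely modelled on PostgreSQL node tags. */
typedef enum
{
    EXPR_VAR,           /* reference to a column of the SELECT output */
    EXPR_CONST,         /* literal value */
    EXPR_NEXT_VALUE     /* "next sequence value" default of an identity column */
} ExprKind;

/* Hypothetical stand-in for a TargetEntry and the kind of its expression. */
typedef struct
{
    int resno;          /* 1-based attribute number in the target table */
    ExprKind kind;
} TargetExpr;

/* A target counts as implicit when its expression is the sequence default. */
bool
is_implicit_identity_target(const TargetExpr *target)
{
    assert(target != NULL);
    return target->kind == EXPR_NEXT_VALUE;
}

/*
 * Return the resno of the first implicit identity target, or 0 if every
 * target was supplied by the query itself.
 */
int
find_implicit_identity_target(const TargetExpr *targets, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (is_implicit_identity_target(&targets[i]))
            return targets[i].resno;
    }
    return 0;
}
```

Under this model the problem query and the two working variations all report the same implicit target, which is consistent with the suggestion above to look at how execution handles the data rather than at the target list alone.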