-
Notifications
You must be signed in to change notification settings - Fork 695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 7887 Fix insert select planner to exclude identity columns from target list on partial inserts #7911
base: release-13.0
Are you sure you want to change the base?
Conversation
/* | ||
* The insertTargetList are the columns we plan to insert into the target table. | ||
* For partial inserts, it might incorrectly include the identity column if | ||
* some rewriting logic added it. We'll fix that below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about where such rewriting logic takes place; is it done by Postgres (parse/analyze, standard_planner()) or Citus ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/backend/distributed/planner/distributed_planner.c
PlannedStmt *
distributed_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams)
The targetList inside the Query object has a length of 3. It seems to me that the identity column, even if it was automatically generated, originates from PostgreSQL’s core logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, it is added by Postgres - and in the case of an identity column, the target expression is the next sequence value. If the user wants to explicitly set an identity column they need to override the default setting:
INSERT INTO local2(id, local1fk, reference1fk) OVERRIDING SYSTEM VALUE SELECT 333, id, 1 FROM local1;
In this case, the target expression for the identity column is a VAR
node, referencing the value 333
in the SELECT
query, which is similar to how the other targets (local1fk
, reference1fk
) refer to the source data (i.e. the to-be-inserted values).
I'm now unsure if removing the implicit identity target is the right way to go, because the following variations of the problem query do successfully complete:
INSERT INTO local2(local1fk, reference1fk) SELECT 'ccccc', 1
INSERT INTO local2(local1fk, reference1fk) SELECT * FROM (VALUES ('ccccc', 3), ('bbbbb', 2))
In both cases the target for the identity column is the same as the problem query - a call to the next sequence value. So maybe check why these queries can successfully complete the insert - particularly the execution logic, what is the difference between how it receives and handles the data in the problem case and these ok cases ?
* indicated by something in the parse tree?), we remove or convert | ||
* the TLE to a default. | ||
*/ | ||
bool userSpecifiedValue = CheckIfUserSpecifiedValue(tle, parse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the definition of CheckIfUserSpecifiedValue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll implement it, but I'd like to confirm with you before moving forward with the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be indicated by something in the parse tree - comparing the parse tree for an INSERT statement with the identity column explicitly included against the parse tree for an INSERT statement with the identity column implicitly included (per the problem query) would help in determining how. Also, is tle->resjunk
true for the identity column when its not explicitly included ? That may be a way to tell if a target should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@m3hm3t Check if INSERT INTO local2(local1fk, reference1fk) SELECT 'ccccc', 1
goes through CreateNonPushableInsertSelectPlan
- it does not seem to hit the issue.
!userSpecifiedValue) | ||
{ | ||
/* Skip adding TLE => effectively uses default identity generation */ | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the value of tle->resjunk
at this point ? Skipping TLE with resjunk = true may be a more general way to deal with the problem. But please check if that is actually the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind - this is not the case so resjunk
cannot be used to determine if a column was specified by the user.
|
||
-- If you want to see the error in the regression output, you might do something like: | ||
-- NOTE: The next line is typically how you'd test for an error in a .sql regression test | ||
-- but with a custom "expected" file you'll confirm you get the "invalid string enlargement request size: -4" text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we put test code into generated_identity.sql
instead of creating a new test file ? Given that it already tests identity column in Citus,
This PR fixes an issue #7887
A potential approach to handling partial inserts on a distributed table with an identity column is to remove the unused identity column’s
TargetEntry
from the plan’s target list when the user omits it in theINSERT
statement.By eliminating the unnecessary identity column
TargetEntry
(or replacing it with a default), the final execution plan avoids processing that column inAppendCopyRowData
. This prevents issues like the "invalid string enlargement request size: -4" error, which occurs when a misaligned identity column is incorrectly interpreted as text or contains invalid data.RelationIdGetRelation(
targetRelationId)
) to retrieve its tuple descriptor.TargetEntry
ininsertTargetList
.attr->attidentity
) is 'a' or 'd' (ALWAYS
or BYDEFAULT
).insertTargetList
withnewTargetList
that omits the identity TLE.