SIGSEGV on simple query with LEFT OUTER JOIN #7787
Comments
Hi Matteo, can you share the schema of the tables (the relevant parts of the CREATE TABLE statements) and how the tables are distributed? This will help us reproduce the issue.

You mentioned that if the last count in the select (COUNT(DISTINCT "public"."table3"."col1") AS "d748e8927346013d185155c00172b7f7") is commented out, SIGSEGV does not occur. So the issue does not occur once that expression is removed from the query? Thanks!
Hi @colm-mchugh,

I'm able to reproduce the same error with the following instructions:

    CREATE TABLE public.table1 (
        col1 character varying(70),
        col2 text,
        col3 text,
        col4 text,
        col5 text,
        k_client text
    );

    CREATE TABLE public.table3 (
        col1 character varying(200),
        col2 text
    );

    CREATE TABLE public.table4 (
        col1 text
    );

    CREATE TABLE public.table2 (
        col1 numeric,
        k_client text,
        col3 text,
        col4 text,
        col5 date
    );

    CREATE INDEX ON public.table1 USING btree (k_client);
    CREATE INDEX ON public.table3 USING btree (col2);
    CREATE INDEX ON public.table4 USING btree (col1);
    CREATE INDEX ON public.table2 USING btree (k_client);
    CREATE INDEX ON public.table2 USING btree (col3);
    CREATE INDEX ON public.table2 USING btree (col4);

    SELECT create_distributed_table('table1', 'k_client', shard_count => 8);
    SELECT create_distributed_table('table2', 'k_client', colocate_with => 'table1');
    SELECT create_reference_table('table3');
    SELECT create_reference_table('table4');

I slightly modified the query from the first message to distribute the tables on the k_client column, as in the real use case in production.

    SELECT
        "public"."table1"."col1" AS "42e84ecab692a85e748bdf7ff86a474e",
        "public"."table1"."col2" AS "6a576f53b68ac363812eaf95205439aa",
        "public"."table1"."col3" AS "695e025e83bd013e2bda04762fc753e5",
        "public"."table1"."col4" AS "b1d24f4d668be03b983cf43621068584",
        "public"."table1"."col5" AS "771c5eaa7ed151a8de99a794e8931843",
        SUM("public"."table2"."col1") AS "0888d71bf3ea38d31bbc95cade8f3b90",
        COUNT(DISTINCT "public"."table3"."col1") AS "d748e8927346013d185155c00172b7f7"
    FROM "public"."table2"
    LEFT OUTER JOIN "public"."table1" ON "public"."table2"."k_client" = "public"."table1"."k_client"
    LEFT OUTER JOIN "public"."table3" ON "public"."table2"."col3" = "public"."table3"."col2"
    LEFT OUTER JOIN "public"."table4" ON "public"."table2"."col4" = "public"."table4"."col1"
    WHERE "public"."table2"."col5" BETWEEN '2024-10-01' AND '2024-10-31'
    GROUP BY
        "public"."table1"."col1",
        "public"."table1"."col2",
        "public"."table1"."col3",
        "public"."table1"."col4",
        "public"."table1"."col5"
    ORDER BY
        "0888d71bf3ea38d31bbc95cade8f3b90" DESC NULLS LAST,
        "42e84ecab692a85e748bdf7ff86a474e" ASC NULLS LAST,
        "6a576f53b68ac363812eaf95205439aa" ASC NULLS LAST,
        "695e025e83bd013e2bda04762fc753e5" ASC NULLS LAST,
        "b1d24f4d668be03b983cf43621068584" ASC NULLS LAST,
        "771c5eaa7ed151a8de99a794e8931843" ASC NULLS LAST
    LIMIT 1001;

I confirm that running the query as:

    SELECT
        "public"."table1"."col1" AS "42e84ecab692a85e748bdf7ff86a474e",
        "public"."table1"."col2" AS "6a576f53b68ac363812eaf95205439aa",
        "public"."table1"."col3" AS "695e025e83bd013e2bda04762fc753e5",
        "public"."table1"."col4" AS "b1d24f4d668be03b983cf43621068584",
        "public"."table1"."col5" AS "771c5eaa7ed151a8de99a794e8931843",
        SUM("public"."table2"."col1") AS "0888d71bf3ea38d31bbc95cade8f3b90"
        -- COUNT(DISTINCT "public"."table3"."col1") AS "d748e8927346013d185155c00172b7f7"
    FROM "public"."table2"
    LEFT OUTER JOIN "public"."table1" ON "public"."table2"."k_client" = "public"."table1"."k_client"
    LEFT OUTER JOIN "public"."table3" ON "public"."table2"."col3" = "public"."table3"."col2"
    LEFT OUTER JOIN "public"."table4" ON "public"."table2"."col4" = "public"."table4"."col1"
    WHERE "public"."table2"."col5" BETWEEN '2024-10-01' AND '2024-10-31'
    GROUP BY
        "public"."table1"."col1",
        "public"."table1"."col2",
        "public"."table1"."col3",
        "public"."table1"."col4",
        "public"."table1"."col5"
    ORDER BY
        "0888d71bf3ea38d31bbc95cade8f3b90" DESC NULLS LAST,
        "42e84ecab692a85e748bdf7ff86a474e" ASC NULLS LAST,
        "6a576f53b68ac363812eaf95205439aa" ASC NULLS LAST,
        "695e025e83bd013e2bda04762fc753e5" ASC NULLS LAST,
        "b1d24f4d668be03b983cf43621068584" ASC NULLS LAST,
        "771c5eaa7ed151a8de99a794e8931843" ASC NULLS LAST
    LIMIT 1001;

with "COUNT" commented out, runs fine without raising SIGSEGV.

The same error occurs with both 1 worker node and multiple worker nodes.

Thanks,
Thanks @mbona92, I can confirm that we can reproduce the issue. It is related to #7705: when building a worker subquery, the Var node for the column in the COUNT DISTINCT expression has a non-empty varnullingrels field. This field should be empty for the corresponding Var in the combine query, but it is just copied over. We will investigate further and fix it. Thanks again for bringing this to our attention.
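To make the explanation above concrete, here is a toy model in Python of the idea of resetting varnullingrels when an expression is copied into another query tree. It is only a sketch, not Citus or Postgres code: the classes `Var` and `Aggref` are simplified stand-ins for the C node types of the same name, `strip_nulling_rels` is a hypothetical helper, and the join index 5 is an arbitrary value chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Var:
    """Toy stand-in for a Postgres Var node (not the real C struct)."""
    varno: int      # range-table index of the source relation
    varattno: int   # column number within that relation
    # Range-table indexes of the outer joins that can null this column.
    varnullingrels: FrozenSet[int] = frozenset()


@dataclass(frozen=True)
class Aggref:
    """Toy stand-in for an aggregate call such as COUNT(DISTINCT col)."""
    aggname: str
    args: Tuple["Expr", ...]


Expr = Union[Var, Aggref]


def strip_nulling_rels(expr: Expr) -> Expr:
    """Return a copy of expr in which every Var has an empty varnullingrels."""
    if isinstance(expr, Var):
        return replace(expr, varnullingrels=frozenset())
    # Aggref: rebuild the node with each argument processed recursively.
    return replace(expr, args=tuple(strip_nulling_rels(a) for a in expr.args))


# table3.col1 is on the inner side of a LEFT OUTER JOIN, so in the original
# query tree it is marked as nullable by that join (index 5 is hypothetical).
count_distinct = Aggref(
    "count", (Var(varno=3, varattno=1, varnullingrels=frozenset({5})),)
)

# The copy destined for the other query tree carries no nulling-rel indexes,
# because those indexes only make sense in the original range table.
worker_target = strip_nulling_rels(count_distinct)
```

The original expression is left untouched and only the copy is reset, which mirrors the point made in the comment: the problem is the field being copied over as-is, not its presence in the original tree.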
Fix the SEGV seen in #7787; it occurs because a column in the targetlist of a worker subquery can contain a non-empty varnullingrels field if the column is from the inner side of a left outer join. The issue can also occur with the columns in the HAVING clause, and this is also tested in the fix. The issue was triggered by the introduction of the varnullingrels to Vars in Postgres 16 (2489d76c). There is a related issue, #7705, where a non-empty varnullingrels was incorrectly copied into the query tree for the combine query. Here, a non-empty varnullingrels field of a Var is incorrectly copied into the query tree for a worker subquery. The regress file from #7705 is used (and renamed) to also test this (#7787). An alternative test output file is required for Postgres 15 because of an optimization to DISTINCT in Postgres 16 (1349d2790bf).
DESCRIPTION: Fixes a crash in left outer joins that can happen when there is an aggregate on a column from the inner side of the join.

Fix the SEGV seen in #7787 and #7899; it occurs because a column in the targetlist of a worker subquery can contain a non-empty varnullingrels field if the column is from the inner side of a left outer join. The issue can also occur with the columns in the HAVING clause, and this is also tested in the fix. The issue was triggered by the introduction of the varnullingrels to Vars in Postgres 16 (2489d76c). There is a related issue, #7705, where a non-empty varnullingrels was incorrectly copied into the query tree for the combine query. Here, a non-empty varnullingrels field of a Var is incorrectly copied into the query tree for a worker subquery. The regress file from #7705 is used (and renamed) to also test this (#7787). An alternative test output file is required for Postgres 15 because of an optimization to DISTINCT in Postgres 16 (1349d2790bf).
Fixed by PR #7901
Hi,
we are facing a segmentation fault executing one specific query.
The issue seems similar to #7705, but our query does not contain a window partition.
The latest versions of Postgres 16 server and Citus are installed, and the issue appears in the same way using Citus from the RHEL repo and from the latest GitHub release.
The query that always causes SIGSEGV is:
Backtrace from gdb:
If the last count in the select (COUNT(DISTINCT "public"."table3"."col1") AS "d748e8927346013d185155c00172b7f7") is commented out, SIGSEGV does not occur.
Furthermore, if the query changes from LEFT OUTER JOIN to INNER JOIN, the issue does not occur:
Could you help us solve this?
Ask me if you need more info.
Thanks,
Matteo