fix(optimizer): resolve columns of a LATERAL subquery in the outer scope [CLAUDE] #7801
Closed
akiylah-armstead wants to merge 1 commit into
A `LATERAL (<subquery>)` source without explicit alias columns exposed no columns to its enclosing scope, because `UDTF.selects` only returned `alias.columns`. As a result, `qualify()` raised `OptimizeError: Column ... could not be resolved` for the columns of the wrapped query (e.g. the `WITH ORDINALITY` column produced when transpiling Spark `POSEXPLODE` to DuckDB). Fall back to the wrapped query's selects when a UDTF has no alias columns. Fixes tobymao#7799
Comment on lines -108 to +113

```diff
-return alias.columns if alias else []
+if alias and alias.columns:
+    return alias.columns
+
+# A UDTF without explicit alias columns (e.g. `LATERAL (<subquery>)`) exposes the
+# columns produced by the query it wraps, so fall back to those.
+return super().selects
```
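To make the behavioural change in this diff concrete, here is a minimal, self-contained sketch of the `selects` logic before and after it. The classes `Query`, `TableAlias`, `DerivedTable`, `UDTFBefore` and `UDTFAfter` are hypothetical stand-ins, not sqlglot's real classes: they use plain strings instead of expression nodes and model only the property touched here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical, simplified stand-ins for sqlglot's expression classes.
# Column names are plain strings rather than expression nodes.
@dataclass
class Query:
    selects: List[str]


@dataclass
class TableAlias:
    name: str
    columns: List[str] = field(default_factory=list)


@dataclass
class DerivedTable:
    this: Optional[Query] = None
    alias: Optional[TableAlias] = None

    @property
    def selects(self) -> List[str]:
        # A derived table exposes whatever the wrapped query projects.
        return self.this.selects if isinstance(self.this, Query) else []


class UDTFBefore(DerivedTable):
    @property
    def selects(self) -> List[str]:
        # Old behaviour: only explicit alias columns are visible.
        return self.alias.columns if self.alias else []


class UDTFAfter(DerivedTable):
    @property
    def selects(self) -> List[str]:
        # New behaviour: prefer explicit alias columns...
        if self.alias and self.alias.columns:
            return self.alias.columns
        # ...otherwise fall back to the wrapped query's projections.
        return super().selects


inner = Query(selects=["val", "pos"])
print(UDTFBefore(this=inner).selects)  # []
print(UDTFAfter(this=inner).selects)   # ['val', 'pos']

# A partial alias list still hides the unaliased column "b".
partial = UDTFAfter(this=Query(selects=["a", "b"]), alias=TableAlias("x", ["c"]))
print(partial.selects)  # ['c']
```

The last case is the partial-alias situation raised in review: once any alias column exists, the fallback is skipped entirely.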
Collaborator
Hey @akiylah-armstead, thank you for the PR but I don't think this is the right layer to solve this problem. I will take this on and ping you in the corresponding PR.
Collaborator
This PR wouldn't handle partial column alias lists well, such as in this query:

```sql
SELECT * FROM t CROSS JOIN LATERAL (SELECT 1 AS a, 2 AS b) AS x(c)
```

Whereas the optimizer already pushes alias names down to the subquery and projects:

```python
from sqlglot import parse_one
from sqlglot.optimizer.qualify import qualify

sql = "SELECT * FROM t CROSS JOIN LATERAL (SELECT 1 AS a, 2 AS b) AS x(c)"
out = qualify(parse_one(sql, read="duckdb"), dialect="duckdb", schema={"t": {"k": "INT"}}).sql("duckdb")
print(out)
# SELECT "t"."k" AS "k", "x"."c" AS "c" FROM "t" AS "t" CROSS JOIN LATERAL (SELECT 1 AS "a", 2 AS "b") AS "x"("c")
# Note how "b" is dropped from the output
```

DuckDB includes this column:

```python
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE t AS SELECT 1 AS k")
cur = con.execute("SELECT * FROM t CROSS JOIN LATERAL (SELECT 1 AS a, 2 AS b) AS x(c)")
print([d[0] for d in cur.description], cur.fetchall())  # ['k', 'c', 'b'] [(1, 1, 2)]
```
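The DuckDB result above suggests the naming rule a fix would need for partial alias lists: alias names rename the wrapped query's columns positionally, and any remaining columns keep their original names. A minimal sketch of that rule on plain name lists follows; `lateral_output_columns` is a hypothetical helper, and raising on surplus alias names is an assumption, not something shown in this thread.

```python
from typing import List


def lateral_output_columns(inner: List[str], alias_columns: List[str]) -> List[str]:
    """Hypothetical helper: column names exposed by `LATERAL (<subquery>) AS x(...)`.

    Alias names rename the leading columns positionally; the remaining columns
    keep the names produced by the wrapped query (matching the DuckDB output above).
    """
    if len(alias_columns) > len(inner):
        # Assumption: more alias names than produced columns is treated as an error.
        raise ValueError("more alias columns than the subquery produces")
    return list(alias_columns) + inner[len(alias_columns):]


# SELECT * FROM t CROSS JOIN LATERAL (SELECT 1 AS a, 2 AS b) AS x(c), with t(k)
print(["k"] + lateral_output_columns(["a", "b"], ["c"]))  # ['k', 'c', 'b']
```

With an empty alias list this reduces to the wrapped query's own names, which is the case the PR's fallback covers.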
Closes #7799.

Bug. `qualify()` could not resolve columns produced by a `LATERAL (<subquery>)` source. This surfaced when transpiling Spark `LATERAL VIEW POSEXPLODE` to DuckDB, even though DuckDB itself resolves the query fine.
Root cause. In the generated SQL the lateral is `Lateral(this=Subquery(...))` with no alias columns of its own. The optimizer derives a source's columns from `UDTF.selects`, which only returned `alias.columns`. Because the `Lateral` has no alias columns, `selects` (and therefore `named_selects`) was empty, so in the outer scope the lateral source resolved to no columns and `pos`/`val` could not be qualified. The inner `UNNEST ... WITH ORDINALITY AS _t0(val, pos)` resolved correctly only because `Unnest.selects` already special-cases the ordinality/offset column.
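The failure mode can be illustrated with a toy resolver: an unqualified column is attributed to whichever outer-scope source exposes it, and resolution fails when no source does. This is a hypothetical sketch, not sqlglot's `qualify_columns` code; the source aliases `t` and `lat` and the local `OptimizeError` class are made up for illustration.

```python
from typing import Dict, List


class OptimizeError(Exception):
    """Local stand-in for sqlglot's OptimizeError (hypothetical)."""


def resolve_column(column: str, sources: Dict[str, List[str]]) -> str:
    """Return the alias of the source that exposes `column`.

    `sources` maps each source alias in the outer scope to the column names it
    exposes, i.e. what `named_selects` would report for it.
    """
    matches = [alias for alias, columns in sources.items() if column in columns]
    if not matches:
        raise OptimizeError(f"Column '{column}' could not be resolved")
    if len(matches) > 1:
        # Simplification: treat ambiguity as an error too.
        raise OptimizeError(f"Column '{column}' is ambiguous")
    return matches[0]


# Before the fix: the lateral source exposes no columns, so `pos` is unresolvable.
before = {"t": ["k"], "lat": []}
# After the fix: the lateral source exposes the wrapped query's columns.
after = {"t": ["k"], "lat": ["val", "pos"]}

try:
    resolve_column("pos", before)
except OptimizeError as exc:
    print(exc)  # Column 'pos' could not be resolved
print(resolve_column("pos", after))  # lat
```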
Fix. When a UDTF has no explicit alias columns, fall back to the columns of the query it wraps (`DerivedTable.selects`, i.e. the inner `Subquery`/`Query`'s `selects`). This is general: it works regardless of alias names or column order, and it leaves the existing aliased path (e.g. `UNNEST(...) AS t(a, b)`, `LATERAL VIEW EXPLODE(...) v AS x`) and `Unnest.selects` (which calls `super().selects`) untouched, since those still have alias columns.

Tests
- Added a regression test to `tests/test_optimizer.py::test_qualify_columns` using the exact repro from the issue. It fails before the fix (`OptimizeError: Column 'pos' could not be resolved`) and passes after, producing correctly qualified SQL.
- The related suites (`tests/test_optimizer.py`, `tests/test_expressions.py`, `tests/test_build.py`, `tests/dialects/`, `tests/test_transpile.py`, `tests/test_lineage.py`) were run, and `ruff check` / `ruff format --check` are clean.