Fix SQLInterpolator treating ? in comments and literals as placeholders + shorten next_changelog #1424

Merged
samikshya-db merged 10 commits into databricks:main from samikshya-db:fix/issue-1331-sql-interpolator-comments on May 13, 2026

Collaborator

samikshya-db commented Apr 27, 2026

Summary

Fixes #1331 and makes a cosmetic change that shortens the next changelog.

When supportManyParameters=1, SQLInterpolator.interpolateSQL split the SQL on every ? and counted every occurrence as a placeholder. Question marks inside line/block comments, single-quoted strings, double-quoted identifiers, and backtick-quoted identifiers were all counted, producing spurious "Parameter count does not match" errors for queries such as:

-- does this work?
select 'hello?', * from mytable where id = ?

This query contains three ? characters but only one real placeholder, so the driver rejected a perfectly valid binding of one parameter.
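To make the miscount concrete, here is a minimal Java sketch of the old counting idea. The class and method names are hypothetical and this is not the driver's code; it only assumes that every ? was treated as a placeholder, as described above.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the pre-fix behavior: every '?' counts as a placeholder.
final class NaiveCountSketch {

    static int naivePlaceholderCount(String sql) {
        // A limit of -1 keeps trailing empty segments, so segments = number of '?' + 1.
        return sql.split(Pattern.quote("?"), -1).length - 1;
    }

    public static void main(String[] args) {
        String sql = "-- does this work?\nselect 'hello?', * from mytable where id = ?";
        // One '?' in the comment, one in the string literal, one real placeholder.
        System.out.println(naivePlaceholderCount(sql)); // prints 3
    }
}
```

With one bound parameter and a naive count of 3, a size check between the two fails even though the statement is valid.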

What changed

  • Added SqlCommentParser.findPlaceholderPositions(sql) which walks the SQL with the existing comment/literal state machine and returns the source indices of ? characters that appear in State.NORMAL only.
  • Reworked SQLInterpolator.interpolateSQL to use those positions: a single pass that slices the original SQL between placeholders and substitutes formatted parameter values. Comments, string literals, and quoted identifiers are preserved verbatim in the output.
  • Removed the now-redundant countPlaceholders helper that scanned every ?.
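The first two changes above can be sketched roughly as follows. This is an illustrative, self-contained approximation and not the code in SqlCommentParser or SQLInterpolator: the class name, the state names other than NORMAL, the string-valued parameter list, and the escape rules (doubled quotes, backslash inside single-quoted strings, nested block comments) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and escape rules are assumptions, not the driver's API.
final class PlaceholderSketch {
    private enum State { NORMAL, IN_LINE_COMMENT, IN_BLOCK_COMMENT, IN_QUOTE }

    /** Returns the indices of '?' characters that sit outside comments and quoted regions. */
    static List<Integer> findPlaceholderPositions(String sql) {
        List<Integer> positions = new ArrayList<>();
        if (sql == null) return positions;
        State state = State.NORMAL;
        char quote = 0; // the quote character that opened the current quoted region
        int depth = 0;  // nesting depth of block comments
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            char next = i + 1 < sql.length() ? sql.charAt(i + 1) : '\0';
            switch (state) {
                case NORMAL:
                    if (c == '-' && next == '-') { state = State.IN_LINE_COMMENT; i++; }
                    else if (c == '/' && next == '*') { state = State.IN_BLOCK_COMMENT; depth = 1; i++; }
                    else if (c == '\'' || c == '"' || c == '`') { state = State.IN_QUOTE; quote = c; }
                    else if (c == '?') positions.add(i);
                    break;
                case IN_LINE_COMMENT:
                    if (c == '\n') state = State.NORMAL;
                    break;
                case IN_BLOCK_COMMENT:
                    if (c == '/' && next == '*') { depth++; i++; }
                    else if (c == '*' && next == '/') { i++; if (--depth == 0) state = State.NORMAL; }
                    break;
                case IN_QUOTE:
                    if (c == '\\' && quote == '\'') i++;        // backslash escape: skip the next char
                    else if (c == quote && next == quote) i++;  // doubled quote: still inside
                    else if (c == quote) state = State.NORMAL;
                    break;
            }
        }
        return positions;
    }

    /** Single pass: copy the SQL between placeholders verbatim and splice in pre-formatted values. */
    static String interpolate(String sql, List<String> formattedValues) {
        List<Integer> positions = findPlaceholderPositions(sql);
        if (positions.size() != formattedValues.size()) {
            throw new IllegalArgumentException("Parameter count does not match");
        }
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int k = 0; k < positions.size(); k++) {
            int pos = positions.get(k);
            out.append(sql, last, pos).append(formattedValues.get(k));
            last = pos + 1;
        }
        return out.append(sql, last, sql.length()).toString();
    }

    public static void main(String[] args) {
        String sql = "-- does this work?\nselect 'hello?', * from mytable where id = ?";
        System.out.println(findPlaceholderPositions(sql).size()); // prints 1
        System.out.println(interpolate(sql, Arrays.asList("42")));
    }
}
```

Because the output is built by slicing the original string, the comment and the 'hello?' literal come through unchanged; only the final ? is replaced.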

DatabricksParameterMetaData.countParameters already used SqlCommentParser.forEachNonCommentChar and is unaffected.

Test plan

  • Existing SQLInterpolatorTest cases (basic interpolation, mixed types, escaped quotes, mismatch errors, etc.) all pass.
  • Added unit tests for SQLInterpolator covering ? inside line comments, block comments, single-quoted strings, double-quoted identifiers, backtick identifiers, and a combined case.
  • Added unit tests for SqlCommentParser.findPlaceholderPositions covering null/empty input, basic positions, comments, literals, quoted identifiers, and escaped quotes.
  • mvn spotless:apply is clean.

samikshya-db and others added 2 commits April 27, 2026 15:36
When supportManyParameters=1, SQLInterpolator split the SQL on every '?',
so question marks inside line/block comments, string literals, and quoted
identifiers were counted as parameter placeholders. This caused
"Parameter count does not match" errors for queries like:

  -- does this work?
  select 'hello?', * from mytable where id = ?

Add SqlCommentParser.findPlaceholderPositions to locate only real '?'
markers (state == NORMAL) and use it from SQLInterpolator. Fixes databricks#1331.

Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
- Add IndexedSqlCharConsumer overload to forEachNonCommentChar so callers
  can recover the source-string index of each emitted character. Refactor
  findPlaceholderPositions to delegate to it, removing ~70 lines of
  duplicated state-machine logic.
- Apply the same comment/literal-aware fix to surroundPlaceholdersWithQuotes,
  which previously used a regex that ignored comments, double-quoted
  identifiers, and backtick identifiers. A '?' inside any of those is now
  preserved instead of being wrapped in single quotes.
- Drop the now-unused Matcher/Pattern imports from SQLInterpolator.
- Add tests for CRLF line comments, nested block comments containing '?',
  adjacent placeholders, leading/trailing placeholders, escaped backticks,
  and the new surroundPlaceholdersWithQuotes cases.

Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
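The refactor described in this commit message can be pictured with the sketch below: one scanner reports each non-comment character together with its source index, and both the position finder and the quote-wrapping helper delegate to it. This is a hedged approximation, not the PR's code. IndexedCharConsumer stands in for the real IndexedSqlCharConsumer, whose exact signature is not shown in this thread, and the scanner's escape handling is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; interface and method shapes are assumptions.
final class IndexedScanSketch {
    enum State { NORMAL, IN_LINE_COMMENT, IN_BLOCK_COMMENT, IN_QUOTE }

    /** Hypothetical analogue of IndexedSqlCharConsumer: state, character, source index. */
    @FunctionalInterface
    interface IndexedCharConsumer { void accept(State state, char c, int index); }

    /** Emits every character outside comments, tagged with the scanner state and its index. */
    static void forEachNonCommentChar(String sql, IndexedCharConsumer consumer) {
        State state = State.NORMAL;
        char quote = 0;
        int depth = 0;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            char next = i + 1 < sql.length() ? sql.charAt(i + 1) : '\0';
            switch (state) {
                case NORMAL:
                    if (c == '-' && next == '-') { state = State.IN_LINE_COMMENT; i++; break; }
                    if (c == '/' && next == '*') { state = State.IN_BLOCK_COMMENT; depth = 1; i++; break; }
                    if (c == '\'' || c == '"' || c == '`') { state = State.IN_QUOTE; quote = c; }
                    consumer.accept(state, c, i);
                    break;
                case IN_LINE_COMMENT:
                    if (c == '\n') { state = State.NORMAL; consumer.accept(state, c, i); }
                    break;
                case IN_BLOCK_COMMENT:
                    if (c == '/' && next == '*') { depth++; i++; }
                    else if (c == '*' && next == '/') { i++; if (--depth == 0) state = State.NORMAL; }
                    break;
                case IN_QUOTE: {
                    consumer.accept(state, c, i);
                    boolean escape = (c == '\\' && quote == '\'') || (c == quote && next == quote);
                    if (escape && i + 1 < sql.length()) consumer.accept(state, next, ++i);
                    else if (c == quote) state = State.NORMAL;
                    break;
                }
            }
        }
    }

    static List<Integer> findPlaceholderPositions(String sql) {
        List<Integer> positions = new ArrayList<>();
        if (sql == null) return positions;
        forEachNonCommentChar(sql, (state, c, index) -> {
            if (state == State.NORMAL && c == '?') positions.add(index);
        });
        return positions;
    }

    /** Wraps only real placeholders in single quotes; '?' in comments or quotes is untouched. */
    static String surroundPlaceholdersWithQuotes(String sql) {
        StringBuilder out = new StringBuilder(sql);
        List<Integer> positions = findPlaceholderPositions(sql);
        // Walk backwards so earlier indices stay valid while the builder grows.
        for (int k = positions.size() - 1; k >= 0; k--) {
            int pos = positions.get(k);
            out.replace(pos, pos + 1, "'?'");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(surroundPlaceholdersWithQuotes("select ?,? from t -- why?"));
        // prints: select '?','?' from t -- why?
    }
}
```

Passing the index through the callback lets each caller keep its own logic small while sharing a single copy of the state machine.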
samikshya-db changed the title from "Fix SQLInterpolator treating ? in comments and literals as placeholders" to "Fix SQLInterpolator treating ? in comments and literals as placeholders + shorten next_changelog" on May 12, 2026
  if (blockCommentDepth == 0) {
    state = State.NORMAL;
-   consumer.accept(state, ' ');
+   consumer.accept(state, ' ', i);
Collaborator

Databricks SQL supports $$...$$ and $tag$...$tag$ dollar-quoted strings. The state machine doesn't track these. A ? inside $$...?...$$ would still be counted as a placeholder.

Looking at the code, the state machine only knows about ', ", and `. Dollar-quoting isn't supported.

Impact: If a user writes select $$hello?$$, ? from t with 1 param, the driver counts 2 placeholders → "Parameter count does not match".

Severity: Low. Dollar-quoted strings aren't common in Databricks SQL, but the changelog and JavaDoc say "string literals" without qualification, which is misleading. Worth noting as a limitation, or adding $$ support.
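For illustration only, here is one way a scanner could skip dollar-quoted regions of the kind described in this comment. It is a hypothetical sketch, not part of this PR: it handles nothing except $$...$$ and $tag$...$tag$ (no comments or other quoting), and the tag rule (letters, digits, underscore, not starting with a digit) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; only dollar-quoted regions are recognized here.
final class DollarQuoteSketch {

    /** Returns the opening delimiter ("$$" or "$tag$") that starts at i, or null if none does. */
    static String delimiterAt(String sql, int i) {
        if (sql.charAt(i) != '$') return null;
        int j = i + 1;
        while (j < sql.length() && (Character.isLetterOrDigit(sql.charAt(j)) || sql.charAt(j) == '_')) j++;
        boolean closed = j < sql.length() && sql.charAt(j) == '$';
        boolean tagStartsWithDigit = j > i + 1 && Character.isDigit(sql.charAt(i + 1));
        return closed && !tagStartsWithDigit ? sql.substring(i, j + 1) : null;
    }

    static List<Integer> findPlaceholderPositions(String sql) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < sql.length(); i++) {
            String delimiter = delimiterAt(sql, i);
            if (delimiter != null) {
                int close = sql.indexOf(delimiter, i + delimiter.length());
                if (close < 0) break;               // unclosed: treat the rest as string body
                i = close + delimiter.length() - 1; // the loop's i++ steps past the closing delimiter
            } else if (sql.charAt(i) == '?') {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findPlaceholderPositions("select $$hello?$$, ? from t")); // prints [19]
    }
}
```

In the example query from this comment, only the ? after the closing $$ is reported, so a single bound parameter would match.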

Collaborator Author

Thanks for the suggestion, but I will just keep this out of scope for now.

// --- findPlaceholderPositions ---

@Test
public void testFindPlaceholderPositionsNullAndEmpty() {
Collaborator

Missing tests:

🟡 #13: No test for unclosed quote / comment (Corner Case #10).
🟡 #14: No test for surroundPlaceholdersWithQuotes with adjacent placeholders like select ?,? from t.
🟡 #15: No test for SQL containing only comments (-- foo\n with 0 params).
🟡 #16: No test for mixed escape inside identifier ( `col? ``) — the escape happens between ? characters.
🟡 #17: No test for interpolateSQL with extra parameters (params.size() > placeholders) — does it throw the right error?
🟡 #18: No test for very large SQL (perf regression check).
🟡 #19: ? immediately after -- start (--?). The -- enters IN_LINE_COMMENT; subsequent chars including ? are consumed in IN_LINE_COMMENT state. Should be skipped. Not tested directly.

Collaborator Author

Thanks — added the ones I think are worth tracking:

Skipped:

Collaborator

gopalldb left a comment

Some minor comments, thanks for fixing this

samikshya-db and others added 2 commits May 14, 2026 02:24
Adds tests requested in databricks#1424 review:
- unclosed single/double/backtick quote and unclosed block comment
  in findPlaceholderPositions
- comments-only SQL (line-only, block-only, mixed)
- `--?` immediately after line-comment start
- backtick identifier with escape between ? characters (`a?``b?`)
- adjacent placeholders for surroundPlaceholdersWithQuotes

Co-authored-by: Isaac
Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
samikshya-db enabled auto-merge (squash) May 13, 2026 21:55
samikshya-db merged commit f91aacc into databricks:main May 13, 2026
14 of 15 checks passed

Development

Successfully merging this pull request may close these issues.

[BUG] Question marks in comments and string literals interpreted as parameters with supportManyParameters

2 participants