Skip to content

Fix cancel error handling: vendor code, telemetry level, SEA async path#1425

Merged
gopalldb merged 4 commits into
databricks:mainfrom
gopalldb:fix/cancel-error-handling-followup
Apr 27, 2026
Merged

Fix cancel error handling: vendor code, telemetry level, SEA async path#1425
gopalldb merged 4 commits into
databricks:mainfrom
gopalldb:fix/cancel-error-handling-followup

Conversation

@gopalldb
Copy link
Copy Markdown
Collaborator

Summary

Follow-up to #1291 addressing review feedback:

  • F2: getErrorCode() now returns DatabricksDriverErrorCode.EXECUTE_STATEMENT_CANCELLED.ordinal() instead of 0. The 3-arg DatabricksSQLException constructor was silently discarding the error code.
  • F3: Cancellation exceptions use silentExceptions=true so BI-tool cancellations (Tableau, Looker) don't emit ERROR-level telemetry. Cancellations are logged at TRACE level instead.
  • F4: SEA getStatementResult() now detects CANCELED state and throws HY008 — previously the async cancel path returned a DatabricksResultSet wrapping null result data.
  • F5: Added polling-path test (RUNNING → CANCELED during pollTillOperationFinished) for Thrift, which had zero coverage.

Test plan

  • DatabricksSdkClientTest#testHandleFailedExecution_CancelledState_ThrowsWithHY008 — asserts HY008, message, and error code
  • DatabricksSdkClientTest#testHandleFailedExecution_FailedState_ThrowsWithoutHY008 — unchanged
  • DatabricksSdkClientTest#testGetStatementResult_CancelledState_ThrowsWithHY008new (F4)
  • DatabricksThriftAccessorTest#testGetStatementResult_cancelled_throwsWithHY008 — now asserts error code
  • DatabricksThriftAccessorTest#testPollingPath_cancelledDuringExecution_throwsWithHY008new (F5)

NO_CHANGELOG=true

This pull request was AI-assisted by Isaac.

…nc path

- F2: Pass DatabricksDriverErrorCode ordinal as vendor code so
  getErrorCode() returns a meaningful value instead of 0
- F3: Use silentExceptions=true for cancellation exceptions so
  BI-tool cancellations don't emit ERROR-level telemetry
- F4: Add CANCELED state check in SEA getStatementResult() for
  the async cancel path
- F5: Add polling-path test (RUNNING -> CANCELED) for Thrift

Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
boolean silentExceptions) {
super(reason, sqlState, internalError.ordinal());
logTelemetryEvent(sqlState, reason, silentExceptions);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using internalError.ordinal() for the SQL vendor code is fragile and silently changes behavior for non-cancellation callers — High

Two concerns:

1. Silent behavior change for existing callers. This constructor is delegated to from the 3-arg (reason, sqlState, internalError) constructor, which has 3 production call sites today:

  • DatabricksThriftAccessor.java:859-860EXECUTE_STATEMENT_CANCELLED (intended; ordinal 3)
  • DatabricksSdkClient.java:701-704EXECUTE_STATEMENT_CANCELLED (intended; ordinal 3)
  • DatabricksSdkClient.java:722-723EXECUTE_STATEMENT_FAILED (unintended; ordinal 2)

Pre-PR, getErrorCode() for all three was 0. Post-PR, the third site silently starts returning 2 for every real (non-cancel) execution failure. There's no test asserting the pre-PR value of 0, so this is invisible to CI. Customer code that branched on getErrorCode() == 0 ("no code set") will misbehave.

2. ordinal() is not a stable contract. Java's Enum.ordinal() reflects declaration order. If anyone later reorders, alphabetises, or inserts a value into DatabricksDriverErrorCode, every customer's getErrorCode() mapping shifts silently — and the cancel test is not a regression guard for this:

assertEquals(
    DatabricksDriverErrorCode.EXECUTE_STATEMENT_CANCELLED.ordinal(),  // ← live ordinal
    exception.getErrorCode());                                         // ← also live ordinal

Both sides re-evaluate .ordinal() at runtime, so they always agree even if the value silently changed from 3 → 5. The test passes; the customer-facing contract breaks.

(This is the case Joshua Bloch warns about in Effective Java Item 35: "Use instance fields instead of ordinals.")

Also note: CONNECTION_ERROR is currently at ordinal 0. No caller passes it to this constructor today, but if one is added, getErrorCode() will return 0, indistinguishable from JDBC's default "no error code set".

Suggested fix: add an explicit stable integer field to DatabricksDriverErrorCode:

public enum DatabricksDriverErrorCode {
  CONNECTION_ERROR(1001),
  ...
  EXECUTE_STATEMENT_FAILED(1003),
  EXECUTE_STATEMENT_CANCELLED(1008),  // matches HY008 numerically — easy to remember
  ...

  private final int code;
  DatabricksDriverErrorCode(int code) { this.code = code; }
  public int getCode() { return code; }
}

Then super(reason, sqlState, internalError.getCode()). Add a DatabricksSQLExceptionTest that hard-asserts assertEquals(1008, ...) (literal, not .ordinal() / .getCode()) so a reorder or enum change is caught.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

…inal()

Replace ordinal()-based vendor codes with explicit stable integer fields
per Effective Java Item 35. Prevents silent breakage if enum values are
reordered or inserted.

- Add int code field + getCode() to DatabricksDriverErrorCode
- Use getCode() instead of ordinal() in DatabricksSQLException
- Tests assert on literal 1008 (not live ordinal) to catch regressions

Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Copy link
Copy Markdown
Collaborator

@msrathore-db msrathore-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gopalldb gopalldb enabled auto-merge (squash) April 27, 2026 17:48
@gopalldb gopalldb disabled auto-merge April 27, 2026 17:48
@gopalldb gopalldb merged commit 75d13a2 into databricks:main Apr 27, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants