Fix cancel error handling: vendor code, telemetry level, SEA async path#1425
Conversation
…nc path - F2: Pass DatabricksDriverErrorCode ordinal as vendor code so getErrorCode() returns a meaningful value instead of 0 - F3: Use silentExceptions=true for cancellation exceptions so BI-tool cancellations don't emit ERROR-level telemetry - F4: Add CANCELED state check in SEA getStatementResult() for the async cancel path - F5: Add polling-path test (RUNNING -> CANCELED) for Thrift Co-authored-by: Isaac Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
| boolean silentExceptions) { | ||
| super(reason, sqlState, internalError.ordinal()); | ||
| logTelemetryEvent(sqlState, reason, silentExceptions); | ||
| } |
There was a problem hiding this comment.
Using internalError.ordinal() for the SQL vendor code is fragile and silently changes behavior for non-cancellation callers — High
Two concerns:
1. Silent behavior change for existing callers. This constructor is delegated to from the 3-arg (reason, sqlState, internalError) constructor, which has 3 production call sites today:
DatabricksThriftAccessor.java:859-860—EXECUTE_STATEMENT_CANCELLED(intended; ordinal 3)DatabricksSdkClient.java:701-704—EXECUTE_STATEMENT_CANCELLED(intended; ordinal 3)DatabricksSdkClient.java:722-723—EXECUTE_STATEMENT_FAILED(unintended; ordinal 2)
Pre-PR, getErrorCode() for all three was 0. Post-PR, the third site silently starts returning 2 for every real (non-cancel) execution failure. There's no test asserting the pre-PR value of 0, so this is invisible to CI. Customer code that branched on getErrorCode() == 0 ("no code set") will misbehave.
2. ordinal() is not a stable contract. Java's Enum.ordinal() reflects declaration order. If anyone later reorders, alphabetises, or inserts a value into DatabricksDriverErrorCode, every customer's getErrorCode() mapping shifts silently — and the cancel test is not a regression guard for this:
assertEquals(
DatabricksDriverErrorCode.EXECUTE_STATEMENT_CANCELLED.ordinal(), // ← live ordinal
exception.getErrorCode()); // ← also live ordinalBoth sides re-evaluate .ordinal() at runtime, so they always agree even if the value silently changed from 3 → 5. The test passes; the customer-facing contract breaks.
(This is the case Joshua Bloch warns about in Effective Java Item 35: "Use instance fields instead of ordinals.")
Also note: CONNECTION_ERROR is currently at ordinal 0. No caller passes it to this constructor today, but if one is added, getErrorCode() will return 0, indistinguishable from JDBC's default "no error code set".
Suggested fix: add an explicit stable integer field to DatabricksDriverErrorCode:
public enum DatabricksDriverErrorCode {
CONNECTION_ERROR(1001),
...
EXECUTE_STATEMENT_FAILED(1003),
EXECUTE_STATEMENT_CANCELLED(1008), // matches HY008 numerically — easy to remember
...
private final int code;
DatabricksDriverErrorCode(int code) { this.code = code; }
public int getCode() { return code; }
}Then super(reason, sqlState, internalError.getCode()). Add a DatabricksSQLExceptionTest that hard-asserts assertEquals(1008, ...) (literal, not .ordinal() / .getCode()) so a reorder or enum change is caught.
…inal() Replace ordinal()-based vendor codes with explicit stable integer fields per Effective Java Item 35. Prevents silent breakage if enum values are reordered or inserted. - Add int code field + getCode() to DatabricksDriverErrorCode - Use getCode() instead of ordinal() in DatabricksSQLException - Tests assert on literal 1008 (not live ordinal) to catch regressions Co-authored-by: Isaac Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Summary
Follow-up to #1291 addressing review feedback:
getErrorCode()now returnsDatabricksDriverErrorCode.EXECUTE_STATEMENT_CANCELLED.ordinal()instead of 0. The 3-argDatabricksSQLExceptionconstructor was silently discarding the error code.silentExceptions=trueso BI-tool cancellations (Tableau, Looker) don't emit ERROR-level telemetry. Cancellations are logged at TRACE level instead.getStatementResult()now detects CANCELED state and throwsHY008— previously the async cancel path returned aDatabricksResultSetwrapping null result data.pollTillOperationFinished) for Thrift, which had zero coverage.Test plan
DatabricksSdkClientTest#testHandleFailedExecution_CancelledState_ThrowsWithHY008— asserts HY008, message, and error codeDatabricksSdkClientTest#testHandleFailedExecution_FailedState_ThrowsWithoutHY008— unchangedDatabricksSdkClientTest#testGetStatementResult_CancelledState_ThrowsWithHY008— new (F4)DatabricksThriftAccessorTest#testGetStatementResult_cancelled_throwsWithHY008— now asserts error codeDatabricksThriftAccessorTest#testPollingPath_cancelledDuringExecution_throwsWithHY008— new (F5)NO_CHANGELOG=true
This pull request was AI-assisted by Isaac.