Describe the bug
Arrow Cloud Fetch result streaming fails with a DatabricksParsingException: Invalid state transition when the IdleConnectionEvictor background thread closes the HTTP connection pool while a chunk download is in progress.
The exception surfaces as Premature end of Content-Length delimited message body on the download, followed by an invalid state machine transition attempt (CHUNK_RELEASED -> DOWNLOAD_FAILED) in the retry path. CHUNK_RELEASED is a terminal state with no valid outgoing transitions, so the attempt throws.
This causes the entire result stream to fail non-recoverably even though the driver attempts an internal retry.
To Reproduce
- Use the Databricks JDBC driver with Arrow Cloud Fetch enabled (EnableArrow=1) and IdleHttpConnectionExpiry not set (driver default applies)
- Execute a query returning a large result set split into multiple Arrow chunks (e.g. a wide table with 100+ float columns, 100k+ rows)
- Apply resource constraints that slow down Arrow chunk downloads (e.g. 0.5 CPU, 1GB RAM container)
- Observe that a chunk download is still in progress when the IdleConnectionEvictor fires (the thrift connection is idle because Arrow Cloud Fetch resolves the chunk URLs upfront and then downloads the data over separate connections)
The evictor fires because:
- After the driver receives Arrow chunk URLs from the Databricks warehouse, the thrift connection becomes idle
- The evictor's idle threshold (driver default) expires during the background downloads
- connectionManager.closeIdleConnections() closes the shared HTTP pool mid-download
Expected behavior
The driver should either:
- Not close the HTTP pool while Arrow chunk downloads are in progress, or
- Handle the CHUNK_RELEASED -> DOWNLOAD_FAILED transition gracefully. This is a legitimate race condition in which the consumer thread releases a chunk concurrently with the download thread attempting to mark it failed. The state machine should tolerate this transition rather than throwing an unrecoverable exception.
Client side logs
Retry attempt 1 for chunk index: 7, Error: Data parsing failed for chunk index [7] and statement [<id>].
Exception [com.databricks.internal.apache.http.ConnectionClosedException:
Premature end of Content-Length delimited message body (expected: 5,439,615; received: 5,439,488)]
Failed to transition to state [DOWNLOAD_FAILED] from state [CHUNK_RELEASED] for chunk [7]
and statement [<id>]. Stack trace:
com.databricks.jdbc.exception.DatabricksParsingException: Invalid state transition for chunk [7]
and statement [<id>]: CHUNK_RELEASED -> DOWNLOAD_FAILED. Valid transitions from CHUNK_RELEASED are: []
at com.databricks.jdbc.api.impl.arrow.ArrowResultChunkStateMachine.transition(ArrowResultChunkStateMachine.java:37)
at com.databricks.jdbc.api.impl.arrow.AbstractArrowResultChunk.setStatus(AbstractArrowResultChunk.java:245)
at com.databricks.jdbc.api.impl.arrow.ArrowResultChunk.handleFailure(ArrowResultChunk.java:157)
at com.databricks.jdbc.api.impl.arrow.ArrowResultChunk.downloadData(ArrowResultChunk.java:130)
at com.databricks.jdbc.api.impl.arrow.ChunkDownloadTask.call(ChunkDownloadTask.java:71)
Root cause (from source analysis):
In ChunkDownloadTask.java, the finally block unconditionally calls chunk.setStatus(ChunkStatus.DOWNLOAD_FAILED) without checking whether the chunk has already been released by the consumer thread.
The CHUNK_RELEASED state has no valid outgoing transitions in ChunkStatus.VALID_TRANSITIONS, making CHUNK_RELEASED -> DOWNLOAD_FAILED an illegal transition even though it is a legitimate race condition in concurrent execution.
The specific race window:
Consumer thread: PROCESSING_SUCCEEDED -> releaseChunk() -> CHUNK_RELEASED
Download thread: ConnectionClosedException -> finally: setStatus(DOWNLOAD_FAILED) <- INVALID
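The race window above can be illustrated with a minimal sketch of a guarded transition. This is not the driver's code: the Chunk class, the markFailedUnlessReleased() method, and the reduced Status enum are hypothetical stand-ins, and only the state names are taken from this report. The point it tries to show is that the check for CHUNK_RELEASED and the write of DOWNLOAD_FAILED have to happen under the same lock; otherwise the guard itself would be a check-then-act race.

```java
public class GuardedChunkTransition {
    // Reduced, hypothetical stand-in for the driver's ChunkStatus:
    // only the states named in this report.
    enum Status { PROCESSING_SUCCEEDED, CHUNK_RELEASED, DOWNLOAD_FAILED }

    // Hypothetical chunk holder; the real AbstractArrowResultChunk is more involved.
    static final class Chunk {
        private Status status = Status.PROCESSING_SUCCEEDED;

        synchronized Status getStatus() {
            return status;
        }

        // Consumer thread: PROCESSING_SUCCEEDED -> CHUNK_RELEASED.
        synchronized void release() {
            status = Status.CHUNK_RELEASED;
        }

        // Download thread: mark the chunk failed only if it has not been released.
        // Check and write share one lock, so release() cannot slip in between.
        // Returns true if the status was changed, false if the failure was ignored.
        synchronized boolean markFailedUnlessReleased() {
            if (status == Status.CHUNK_RELEASED) {
                return false; // consumer already finished with this chunk
            }
            status = Status.DOWNLOAD_FAILED;
            return true;
        }
    }

    public static void main(String[] args) {
        Chunk chunk = new Chunk();
        chunk.release();                                   // consumer wins the race
        boolean marked = chunk.markFailedUnlessReleased(); // late failure is ignored
        System.out.println(marked + " " + chunk.getStatus());
        // prints "false CHUNK_RELEASED"
    }
}
```

Whether a late failure on an already released chunk should be logged, counted, or silently dropped is a design decision for the driver maintainers; the sketch only shows that it need not throw.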
Workaround:
Set IdleHttpConnectionExpiry to a value larger than the expected maximum chunk download time (e.g. 600 seconds) to prevent the evictor from firing during active Arrow downloads:
jdbc:databricks://<host>:443;EnableArrow=1;IdleHttpConnectionExpiry=600
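For services that assemble the connection string in code, the workaround can be applied as in the sketch below. The buildUrl helper and the class name are hypothetical; only the two option names and the URL shape come from this report. The URL shown here is abbreviated, so a real connection string would carry whatever further settings the deployment needs before being passed to java.sql.DriverManager.getConnection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WorkaroundUrlSketch {
    // Hypothetical helper: joins the host and key=value options into the
    // semicolon-separated form used in the workaround URL above.
    static String buildUrl(String host, Map<String, String> options) {
        StringBuilder url = new StringBuilder("jdbc:databricks://")
                .append(host)
                .append(":443");
        for (Map.Entry<String, String> option : options.entrySet()) {
            url.append(';').append(option.getKey()).append('=').append(option.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the URL is deterministic.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("EnableArrow", "1");
        // Idle expiry in seconds, chosen larger than the slowest expected chunk download.
        options.put("IdleHttpConnectionExpiry", "600");
        System.out.println(buildUrl("<host>", options));
        // prints "jdbc:databricks://<host>:443;EnableArrow=1;IdleHttpConnectionExpiry=600"
    }
}
```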
Client Environment (please complete the following information):
- OS: Linux (container, linux/amd64)
- Java version: Java 17.0.18
- Java vendor: OpenJDK (FIPS - 17.0.18-internal with BouncyCastle FIPS)
- Driver Version: 3.3.1 (com.databricks:databricks-jdbc:3.3.1)
- BI Tool: Custom gRPC service (Spring Boot + jOOQ fetchLazy() + Reactor Flux.fromIterable(cursor))
Additional context
- The bug is Arrow Cloud Fetch specific. JSON result format (EnableArrow=0) is not affected as it streams synchronously through the thrift connection with no background download threads and no state machine.
- The ChunkDownloadTask.finally block should guard the state transition: only attempt setStatus(DOWNLOAD_FAILED) if the chunk is not already in CHUNK_RELEASED state, or alternatively CHUNK_RELEASED -> DOWNLOAD_FAILED should be added as a valid transition in ChunkStatus.VALID_TRANSITIONS.
- The internal retry mechanism (DOWNLOAD_RETRY) is also rendered ineffective by this bug: after the invalid transition attempt fails, the chunk remains in CHUNK_RELEASED with no path to retry.
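The second option mentioned above, adding the edge to the transition table, might look roughly like the following sketch. The table here is a hypothetical reduction: it contains only states named in this report, the PROCESSING_SUCCEEDED -> CHUNK_RELEASED edge from the race window, and the empty set the log reports for CHUNK_RELEASED. The driver's real ChunkStatus.VALID_TRANSITIONS has more states and edges, which are omitted here.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class TransitionTableSketch {
    // Reduced, hypothetical stand-in for the driver's ChunkStatus.
    enum Status { PROCESSING_SUCCEEDED, CHUNK_RELEASED, DOWNLOAD_FAILED }

    // Table as described in this report: CHUNK_RELEASED has no outgoing edges.
    static Map<Status, Set<Status>> reportedTable() {
        Map<Status, Set<Status>> table = new EnumMap<>(Status.class);
        table.put(Status.PROCESSING_SUCCEEDED, EnumSet.of(Status.CHUNK_RELEASED));
        table.put(Status.CHUNK_RELEASED, EnumSet.noneOf(Status.class));
        // Outgoing edges of DOWNLOAD_FAILED are not covered by the report; left empty here.
        table.put(Status.DOWNLOAD_FAILED, EnumSet.noneOf(Status.class));
        return table;
    }

    // Same table plus the edge proposed in this report.
    static Map<Status, Set<Status>> withProposedEdge() {
        Map<Status, Set<Status>> table = reportedTable();
        table.get(Status.CHUNK_RELEASED).add(Status.DOWNLOAD_FAILED);
        return table;
    }

    static boolean isValid(Map<Status, Set<Status>> table, Status from, Status to) {
        Set<Status> targets = table.get(from);
        return targets != null && targets.contains(to);
    }

    public static void main(String[] args) {
        System.out.println(isValid(reportedTable(), Status.CHUNK_RELEASED, Status.DOWNLOAD_FAILED));
        // prints "false"
        System.out.println(isValid(withProposedEdge(), Status.CHUNK_RELEASED, Status.DOWNLOAD_FAILED));
        // prints "true"
    }
}
```

One thing to weigh with this variant: once the edge exists, a released chunk can end up in DOWNLOAD_FAILED, so any code that treats CHUNK_RELEASED as final would need to be checked. The guard in the finally block leaves the table untouched and may be the smaller change.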