[SEA] Fix CloudFetch path to wrap ARRAY and MAP columns as JSON strings (PECO-3016) #1440

Closed

eric-wang-1990 wants to merge 1 commit into main from worktree-agent-a355298fdfa02b222

Conversation

@eric-wang-1990 (Contributor)

Problem

On the CloudFetch path (SEA mode), ARRAY and MAP columns were neither wrapped as JSON strings (when complex-datatype support is disabled) nor returned as DatabricksArray/DatabricksMap objects (when it is enabled).

Root Cause

The getObjectWithComplexTypeHandling() method in ArrowStreamResult determines how to handle a column by inspecting requiredType (sourced from the column metadata in the SEA manifest). On the CloudFetch path, the SEA manifest reports ARRAY/MAP/STRUCT columns with a STRING wire type — because CloudFetch transmits these as Arrow UTF-8 strings in the IPC file. As a result, isComplexType(requiredType) returns false and the complex-type handling branch is never entered.

However, the Arrow IPC file embedded in each CloudFetch chunk carries richer metadata: the "Spark:DataType:SqlName" field metadata key (ARROW_METADATA_KEY) is set to the true SQL type, e.g. "ARRAY<INT>" or "MAP<STRING,INT>". This arrowMetadata string was already being extracted and passed into the method, but was only used after the complex-type check — too late.
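
For reference, this per-field metadata is exposed by the Arrow Java API through Field.getMetadata(). A minimal sketch of reading it (the metadata key is the one named above; the helper class is hypothetical):

  import java.util.Map;
  import org.apache.arrow.vector.types.pojo.Field;

  final class ArrowSqlTypeProbe {
    // Field-level metadata key carrying the original Spark SQL type name.
    static final String ARROW_METADATA_KEY = "Spark:DataType:SqlName";

    // Returns the declared SQL type of a column, e.g. "ARRAY<INT>", or null if absent.
    static String sqlTypeOf(Field field) {
      Map<String, String> metadata = field.getMetadata();
      return metadata == null ? null : metadata.get(ARROW_METADATA_KEY);
    }
  }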

Fix

Before the complex-type branch, derive an effectiveType from arrowMetadata using DatabricksTypeUtil.isComplexType(String). When requiredType is not already a complex type but arrowMetadata identifies the column as ARRAY/MAP/STRUCT, override effectiveType accordingly. All subsequent branching uses effectiveType instead of requiredType, so both:

  • the JSON-string path (EnableComplexDatatypeSupport=false) and
  • the DatabricksArray/DatabricksMap path (EnableComplexDatatypeSupport=true)

work correctly for CloudFetch ARRAY/MAP/STRUCT columns; a sketch of the derivation follows.
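
A minimal, self-contained sketch of that derivation (the enum and the string parsing are illustrative simplifications; only the isComplexType check and the metadata semantics come from this PR):

  // Sketch only, not the driver's actual types: requiredType models the SEA
  // manifest type; arrowMetadata is the "Spark:DataType:SqlName" string.
  enum SqlTypeName { STRING, ARRAY, MAP, STRUCT }

  final class EffectiveTypeSketch {
    static boolean isComplexType(SqlTypeName t) {
      return t == SqlTypeName.ARRAY || t == SqlTypeName.MAP || t == SqlTypeName.STRUCT;
    }

    static SqlTypeName deriveEffectiveType(SqlTypeName requiredType, String arrowMetadata) {
      if (isComplexType(requiredType) || arrowMetadata == null) {
        return requiredType; // manifest already says complex, or nothing to override with
      }
      String sql = arrowMetadata.trim().toUpperCase();
      if (sql.startsWith("ARRAY")) return SqlTypeName.ARRAY;
      if (sql.startsWith("MAP")) return SqlTypeName.MAP;
      if (sql.startsWith("STRUCT")) return SqlTypeName.STRUCT;
      return requiredType; // e.g. a genuine STRING column
    }
  }

With this shape, deriveEffectiveType(STRING, "ARRAY<INT>") yields ARRAY, so the complex-type branches run even though the manifest reported STRING.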

Changes

  • ArrowStreamResult.java: Added effectiveType derivation logic in getObjectWithComplexTypeHandling(); replaced all uses of requiredType with effectiveType in the complex-type handling branches. Added import for DatabricksTypeUtil.
  • ArrowStreamResultTest.java: Added 4 unit tests covering (a runnable sketch of these scenarios follows this list):
    • ARRAY column (arrowMetadata="ARRAY<INT>"), complex support disabled → returns JSON String
    • ARRAY column (arrowMetadata="ARRAY<STRING>"), complex support enabled → returns DatabricksArray
    • MAP column (arrowMetadata="MAP<STRING,INT>"), complex support disabled → returns JSON String
    • ARRAY column via requiredType=ARRAY (existing path), complex support disabled → returns JSON String
  • NEXT_CHANGELOG.md: Added entry under ### Fixed.
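
The four test scenarios map directly onto the derivation sketched under Fix; a runnable illustration of three of them (again a sketch against the hypothetical EffectiveTypeSketch above, not the actual ArrowStreamResultTest code):

  import static org.junit.jupiter.api.Assertions.assertEquals;
  import org.junit.jupiter.api.Test;

  class EffectiveTypeSketchTest {
    @Test
    void arrayColumnDetectedFromArrowMetadata() {
      // CloudFetch case: manifest says STRING, Arrow metadata says ARRAY<INT>.
      assertEquals(SqlTypeName.ARRAY,
          EffectiveTypeSketch.deriveEffectiveType(SqlTypeName.STRING, "ARRAY<INT>"));
    }

    @Test
    void mapColumnDetectedFromArrowMetadata() {
      assertEquals(SqlTypeName.MAP,
          EffectiveTypeSketch.deriveEffectiveType(SqlTypeName.STRING, "MAP<STRING,INT>"));
    }

    @Test
    void manifestComplexTypeIsKept() {
      // Existing path: requiredType is already ARRAY; no Arrow metadata needed.
      assertEquals(SqlTypeName.ARRAY,
          EffectiveTypeSketch.deriveEffectiveType(SqlTypeName.ARRAY, null));
    }
  }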

Test Plan

  • New unit tests in ArrowStreamResultTest pass
  • Existing ArrowStreamResultTest tests pass
  • Manual verification: connect to a Databricks workspace via SEA with CloudFetch enabled, query a table with ARRAY/MAP columns, and verify they are returned as JSON strings (complex support disabled) or DatabricksArray/DatabricksMap (complex support enabled); a hedged snippet of such a check follows
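
Such a manual check might look like the sketch below; the URL parameters (host, httpPath, and passing EnableComplexDatatypeSupport on the URL) are assumptions about connection syntax, not verified settings:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class ComplexTypeManualCheck {
    public static void main(String[] args) throws Exception {
      // Illustrative URL only: host, httpPath, and auth depend on the workspace,
      // and the EnableComplexDatatypeSupport parameter placement is an assumption.
      String url = "jdbc:databricks://<host>:443/default;httpPath=<http-path>"
          + ";EnableComplexDatatypeSupport=1"; // set to 0 to expect JSON strings
      try (Connection conn = DriverManager.getConnection(url, "token", "<personal-access-token>");
          Statement stmt = conn.createStatement();
          ResultSet rs = stmt.executeQuery("SELECT array(1, 2, 3) AS a, map('k', 1) AS m")) {
        while (rs.next()) {
          // Expect DatabricksArray / DatabricksMap here (or JSON strings when disabled).
          System.out.println(rs.getObject("a") + " | " + rs.getObject("m"));
        }
      }
    }
  }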

Fixes: PECO-3016

[SEA] Fix CloudFetch path to wrap ARRAY and MAP columns as JSON strings (PECO-3016)

On the CloudFetch path, the SEA manifest reports ARRAY/MAP/STRUCT columns
with a STRING wire type. The driver's getObjectWithComplexTypeHandling()
only checked requiredType (from the manifest), so it never triggered
complex-type handling for those columns.

Fix: before the complex-type branch, derive an effectiveType from the
Arrow schema metadata (ARROW_METADATA_KEY = "Spark:DataType:SqlName")
embedded in the CloudFetch IPC file. When requiredType is not already a
complex type but arrowMetadata identifies the column as ARRAY/MAP/STRUCT,
use the Arrow metadata to set effectiveType appropriately. The rest of the
method then uses effectiveType so both the JSON-string path
(isComplexDatatypeSupportEnabled=false) and the DatabricksArray/DatabricksMap
path (isComplexDatatypeSupportEnabled=true) work correctly on CloudFetch.

Added unit tests covering:
  - ARRAY column (arrowMetadata="ARRAY<INT>"), complex support disabled → JSON String
  - ARRAY column (arrowMetadata="ARRAY<STRING>"), complex support enabled → DatabricksArray
  - MAP column (arrowMetadata="MAP<STRING,INT>"), complex support disabled → JSON String
  - ARRAY column via requiredType=ARRAY (existing path), complex support disabled → JSON String

Signed-off-by: Eric Wang <e.wang@databricks.com>
Copilot AI review requested due to automatic review settings May 5, 2026 20:09

Copilot AI left a comment

Pull request overview

This PR fixes complex-type handling on the SEA CloudFetch path by using the Arrow IPC schema metadata (e.g., Spark:DataType:SqlName) to detect ARRAY/MAP/STRUCT types when the SEA manifest reports a STRING wire type, ensuring values are returned as JSON strings when complex support is disabled and as DatabricksArray/DatabricksMap when enabled.

Changes:

  • Derives an effectiveType from arrowMetadata in ArrowStreamResult.getObjectWithComplexTypeHandling() and uses it for subsequent branching.
  • Adds unit tests covering CloudFetch complex-type wrapping behavior for ARRAY/MAP, including both complex-support enabled/disabled scenarios.
  • Updates NEXT_CHANGELOG.md with a “Fixed” entry describing the CloudFetch complex-type handling fix.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResult.java: Uses Arrow schema metadata to correctly identify complex types on CloudFetch and apply the right conversion path.
  • src/test/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResultTest.java: Adds targeted unit tests to validate CloudFetch complex-type wrapping behavior.
  • NEXT_CHANGELOG.md: Documents the fix in the upcoming changelog.


On the mock setup in ArrowStreamResultTest.java:

  // When complex type support is enabled, the converter should get the raw value as ARRAY
  when(mockIterator.getColumnObjectAtCurrentRow(
          eq(0), eq(ColumnInfoTypeName.ARRAY), eq(arrowMetadata), eq(columnInfo)))
      .thenReturn(new DatabricksArray(java.util.Arrays.asList("a", "b")));

On lines +776 to +777:

  // Should return a formatted string representation, not a DatabricksMap
  assertInstanceOf(String.class, result);
@eric-wang-1990 (Contributor, Author)

Closing — this was created in the wrong repo (JDBC instead of ADBC). The fix belongs in the ADBC C# driver.
