[SEA] Fix CloudFetch path to wrap ARRAY and MAP columns as JSON strings (PECO-3016) #1440
Closed
eric-wang-1990 wants to merge 1 commit into
Conversation
…gs (PECO-3016)

On the CloudFetch path, the SEA manifest reports ARRAY/MAP/STRUCT columns with a STRING wire type. The driver's getObjectWithComplexTypeHandling() only checked requiredType (from the manifest), so it never triggered complex-type handling for those columns.

Fix: before the complex-type branch, derive an effectiveType from the Arrow schema metadata (ARROW_METADATA_KEY = "Spark:DataType:SqlName") embedded in the CloudFetch IPC file. When requiredType is not already a complex type but arrowMetadata identifies the column as ARRAY/MAP/STRUCT, use the Arrow metadata to set effectiveType appropriately. The rest of the method then uses effectiveType, so both the JSON-string path (isComplexDatatypeSupportEnabled=false) and the DatabricksArray/DatabricksMap path (isComplexDatatypeSupportEnabled=true) work correctly on CloudFetch.

Added unit tests covering:
- ARRAY column (arrowMetadata="ARRAY<INT>"), complex support disabled → JSON String
- ARRAY column (arrowMetadata="ARRAY<STRING>"), complex support enabled → DatabricksArray
- MAP column (arrowMetadata="MAP<STRING,INT>"), complex support disabled → JSON String
- ARRAY column via requiredType=ARRAY (existing path), complex support disabled → JSON String

Signed-off-by: Eric Wang <e.wang@databricks.com>
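The two output paths named in the commit message can be sketched as follows. This is a hedged illustration, not the driver's real code: the class name `ComplexValueDispatchSketch`, the `dispatch` method, and the `List` return value are hypothetical stand-ins; the real driver constructs DatabricksArray/DatabricksMap objects on the enabled path.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two CloudFetch output paths described above.
// Names are illustrative; the real driver builds DatabricksArray/DatabricksMap
// objects rather than a plain List.
public class ComplexValueDispatchSketch {
    // rawJson is the Arrow UTF-8 string CloudFetch delivers for a complex column.
    static Object dispatch(String rawJson, boolean complexSupportEnabled) {
        if (!complexSupportEnabled) {
            // isComplexDatatypeSupportEnabled=false: hand back the value as a JSON string.
            return rawJson;
        }
        // isComplexDatatypeSupportEnabled=true: the real driver parses the value
        // into a DatabricksArray/DatabricksMap; a List stands in for that here.
        return Arrays.asList(rawJson);
    }

    public static void main(String[] args) {
        System.out.println(dispatch("[1,2,3]", false) instanceof String); // prints true
        System.out.println(dispatch("[1,2,3]", true) instanceof List);    // prints true
    }
}
```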
Contributor
Pull request overview
This PR fixes complex-type handling on the SEA CloudFetch path by using the Arrow IPC schema metadata (e.g., Spark:DataType:SqlName) to detect ARRAY/MAP/STRUCT types when the SEA manifest reports a STRING wire type, ensuring values are returned as JSON strings when complex support is disabled and as DatabricksArray/DatabricksMap when enabled.
Changes:
- Derives an effectiveType from arrowMetadata in ArrowStreamResult.getObjectWithComplexTypeHandling() and uses it for subsequent branching.
- Adds unit tests covering CloudFetch complex-type wrapping behavior for ARRAY/MAP, including both complex-support enabled/disabled scenarios.
- Updates NEXT_CHANGELOG.md with a "Fixed" entry describing the CloudFetch complex-type handling fix.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResult.java | Uses Arrow schema metadata to correctly identify complex types on CloudFetch and apply the right conversion path. |
| src/test/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResultTest.java | Adds targeted unit tests to validate CloudFetch complex-type wrapping behavior. |
| NEXT_CHANGELOG.md | Documents the fix in the upcoming changelog. |
// When complex type support is enabled, the converter should get the raw value as ARRAY
when(mockIterator.getColumnObjectAtCurrentRow(
        eq(0), eq(ColumnInfoTypeName.ARRAY), eq(arrowMetadata), eq(columnInfo)))
    .thenReturn(new DatabricksArray(java.util.Arrays.asList("a", "b")));
Comment on lines +776 to +777
// Should return a formatted string representation, not a DatabricksMap
assertInstanceOf(String.class, result);
Contributor
Author
Closing — this was created in the wrong repo (JDBC instead of ADBC). The fix belongs in the ADBC C# driver.
Problem
On the CloudFetch path (SEA mode), ARRAY and MAP columns were not being wrapped as JSON strings and not being returned as DatabricksArray/DatabricksMap objects.

Root Cause
The getObjectWithComplexTypeHandling() method in ArrowStreamResult determines how to handle a column by inspecting requiredType (sourced from the column metadata in the SEA manifest). On the CloudFetch path, the SEA manifest reports ARRAY/MAP/STRUCT columns with a STRING wire type, because CloudFetch transmits these as Arrow UTF-8 strings in the IPC file. As a result, isComplexType(requiredType) returns false and the complex-type handling branch is never entered.

However, the Arrow IPC file embedded in each CloudFetch chunk carries richer metadata: the "Spark:DataType:SqlName" field metadata key (ARROW_METADATA_KEY) is set to the true SQL type, e.g. "ARRAY<INT>" or "MAP<STRING,INT>". This arrowMetadata string was already being extracted and passed into the method, but was only used after the complex-type check — too late.

Fix
Before the complex-type branch, derive an effectiveType from arrowMetadata using DatabricksTypeUtil.isComplexType(String). When requiredType is not already a complex type but arrowMetadata identifies the column as ARRAY/MAP/STRUCT, override effectiveType accordingly. All subsequent branching uses effectiveType instead of requiredType, so both the JSON-string path (EnableComplexDatatypeSupport=false) and the DatabricksArray/DatabricksMap path (EnableComplexDatatypeSupport=true) work correctly for CloudFetch ARRAY/MAP/STRUCT columns.
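A minimal sketch of the effectiveType derivation described above, under stated assumptions: the enum is a simplified stand-in for ColumnInfoTypeName, and the prefix matching approximates what DatabricksTypeUtil.isComplexType(String) does with the SQL type name — the real driver's classification logic may differ.

```java
// Simplified, hypothetical sketch of the effectiveType derivation; the enum
// stands in for ColumnInfoTypeName and the prefix checks approximate
// DatabricksTypeUtil.isComplexType(String).
public class EffectiveTypeSketch {
    enum TypeName { STRING, ARRAY, MAP, STRUCT }

    // True when the manifest-reported type is already a complex type.
    static boolean isComplexType(TypeName t) {
        return t == TypeName.ARRAY || t == TypeName.MAP || t == TypeName.STRUCT;
    }

    // requiredType comes from the SEA manifest; arrowMetadata is the
    // "Spark:DataType:SqlName" value from the Arrow IPC schema (may be null).
    static TypeName deriveEffectiveType(TypeName requiredType, String arrowMetadata) {
        if (!isComplexType(requiredType) && arrowMetadata != null) {
            String sql = arrowMetadata.trim().toUpperCase();
            if (sql.startsWith("ARRAY")) return TypeName.ARRAY;
            if (sql.startsWith("MAP")) return TypeName.MAP;
            if (sql.startsWith("STRUCT")) return TypeName.STRUCT;
        }
        return requiredType;
    }

    public static void main(String[] args) {
        // Manifest says STRING, Arrow metadata says ARRAY<INT>: the override kicks in.
        System.out.println(deriveEffectiveType(TypeName.STRING, "ARRAY<INT>")); // prints ARRAY
        // No metadata: fall back to the manifest type.
        System.out.println(deriveEffectiveType(TypeName.STRING, null)); // prints STRING
    }
}
```

All later branching keys off the returned value, so a column whose manifest type is STRING but whose Arrow metadata reads "MAP<STRING,INT>" takes the MAP path.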
Changes
- ArrowStreamResult.java: Added effectiveType derivation logic in getObjectWithComplexTypeHandling(); replaced all uses of requiredType with effectiveType in the complex-type handling branches. Added import for DatabricksTypeUtil.
- ArrowStreamResultTest.java: Added 4 unit tests covering:
  - ARRAY column (arrowMetadata="ARRAY<INT>"), complex support disabled → returns JSON String
  - ARRAY column (arrowMetadata="ARRAY<STRING>"), complex support enabled → returns DatabricksArray
  - MAP column (arrowMetadata="MAP<STRING,INT>"), complex support disabled → returns JSON String
  - ARRAY column via requiredType=ARRAY (existing path), complex support disabled → returns JSON String
- NEXT_CHANGELOG.md: Added entry under "### Fixed".

Test Plan
- New unit tests in ArrowStreamResultTest pass
- Existing ArrowStreamResultTest tests pass
- DatabricksArray/DatabricksMap (complex support enabled)

Fixes: PECO-3016