@shaoting-huang
Contributor

Add comprehensive support for all Milvus data types with full Arrow conversion:

Scalar Types:

  • Add Int8/ByteType support with TinyIntVector handler
  • Add Timestamptz mapping to LongType
  • Add Text type mapping to StringType

Vector Types:

  • Implement IEEE 754 Float16/BFloat16 to Float32 conversion
  • Add Int8Vector byte[] to short[] conversion
  • Fix BinaryVector type mismatch (ArrayType → BinaryType)
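
The half-precision and Int8 conversions listed above can be illustrated with a minimal, dependency-free sketch. It is not the code from this PR: the object name `HalfFloatSketch`, the helper names, and the assumption that each vector arrives as a little-endian byte array with two bytes per element are all hypothetical.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper object; names and input layout are assumptions, not the PR's API.
object HalfFloatSketch {

  /** IEEE 754 binary16 bit pattern (1 sign, 5 exponent, 10 mantissa bits) to Float. */
  def float16ToFloat32(bits: Short): Float = {
    val h    = bits & 0xFFFF
    val sign = (h & 0x8000) << 16
    val exp  = (h >>> 10) & 0x1F
    val mant = h & 0x03FF
    val f32 =
      if (exp == 0x1F) sign | 0x7F800000 | (mant << 13)            // Inf or NaN
      else if (exp != 0) sign | ((exp + 112) << 23) | (mant << 13) // normal: rebias 15 -> 127
      else if (mant == 0) sign                                      // signed zero
      else {
        // Subnormal half: shift the mantissa up until the implicit leading bit appears.
        var e = 113
        var m = mant
        while ((m & 0x0400) == 0) { m <<= 1; e -= 1 }
        sign | (e << 23) | ((m & 0x03FF) << 13)
      }
    java.lang.Float.intBitsToFloat(f32)
  }

  /** bfloat16 is the upper 16 bits of a binary32 value, so widening is a shift. */
  def bfloat16ToFloat32(bits: Short): Float =
    java.lang.Float.intBitsToFloat((bits & 0xFFFF) << 16)

  // Assumption: two bytes per element, little-endian.
  private def decode(bytes: Array[Byte], convert: Short => Float): Array[Float] = {
    require(bytes.length % 2 == 0, "expected two bytes per element")
    val shorts = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer()
    Array.tabulate(bytes.length / 2)(i => convert(shorts.get(i)))
  }

  def decodeFloat16Vector(bytes: Array[Byte]): Array[Float]  = decode(bytes, float16ToFloat32)
  def decodeBFloat16Vector(bytes: Array[Byte]): Array[Float] = decode(bytes, bfloat16ToFloat32)

  /** Sign-preserving widening of an Int8 vector from byte[] to short[]. */
  def widenInt8Vector(bytes: Array[Byte]): Array[Short] = bytes.map(_.toShort)
}
```

Widening to Float32 is lossless for both 16-bit formats, so a round trip through this sketch does not change the stored values.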

Complex Types:

  • Add JSON Binary to String conversion support
  • Fix Geometry type mapping to BinaryType
  • Implement ArrayOfVector schema mapping and conversion

Testing:

  • Add ArrowConverterTest with 15 comprehensive test cases
  • All scalar, vector, and complex type conversions tested
  • Tests verify null handling and edge cases

Signed-off-by: shaoting-huang <[email protected]>
@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shaoting-huang
To complete the pull request process, please assign liliu-z after the PR has been reviewed.
You can assign the PR to them by writing /assign @liliu-z in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

## Changes

### 1. Schema-based Type Detection (ArrowConverter.scala)
- Add createFieldTypeMap() to build field name → Milvus DataType lookup
- Update arrowToInternalRow() to accept Option[CollectionSchema]
- Update arrowValueToSparkValue() and sparkValueToArrowValue() to use schema lookup
- Replace heuristic Float16/BFloat16/FloatVector detection with schema-based logic
- Maintain backward compatibility: the schema parameter is optional, with fallback logic when no schema is supplied
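
A rough sketch of why schema lookup is more reliable than width-based guessing, under stated assumptions: `MilvusType`, `FieldSchema`, and `CollectionSchema` below are simplified stand-ins for the real schema classes, and `guessFromWidth` and `resolveVectorType` are hypothetical helpers rather than functions from this PR. Only `createFieldTypeMap` borrows its name from the list above, and its real signature may differ.

```scala
// Simplified stand-ins for the real schema classes (assumptions for illustration only).
object SchemaLookupSketch {
  sealed trait MilvusType
  case object FloatVector    extends MilvusType
  case object Float16Vector  extends MilvusType
  case object BFloat16Vector extends MilvusType

  final case class FieldSchema(name: String, dataType: MilvusType)
  final case class CollectionSchema(fields: Seq[FieldSchema])

  /** Build the field name -> data type lookup. */
  def createFieldTypeMap(schema: CollectionSchema): Map[String, MilvusType] =
    schema.fields.map(f => f.name -> f.dataType).toMap

  /** Hypothetical fallback: infer the type from the payload width alone. */
  def guessFromWidth(byteLength: Int, dim: Int): Option[MilvusType] =
    if (dim <= 0) None
    else if (byteLength == dim * 4) Some(FloatVector)
    else if (byteLength == dim * 2) Some(Float16Vector) // could equally be BFloat16Vector
    else None

  /** Prefer the declared type; fall back to the width guess when no schema is given. */
  def resolveVectorType(
      field: String,
      byteLength: Int,
      dim: Int,
      schema: Option[CollectionSchema]
  ): Option[MilvusType] =
    schema
      .flatMap(s => createFieldTypeMap(s).get(field))
      .orElse(guessFromWidth(byteLength, dim))
}
```

Float16 and BFloat16 payloads have the same width, so the fallback in this sketch can only guess between them; passing `Some(schema)` removes that ambiguity.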

### 2. Fixed Broken Types
- Int8/ByteType: Add TinyIntVector read/write support in ArrowConverter
- BinaryVector: Fix mapping from ArrayType(BinaryType) to ArrayType(ByteType)
- Add FixedSizeBinary special handling for BinaryVector in ArrowConverter

### 3. Added Missing Type Mappings (DataTypeUtil.scala)
- Timestamptz → LongType
- Text → StringType
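
The scalar mappings named in this PR could be expressed as a single pattern match. The sketch below is self-contained and therefore uses hypothetical stand-ins: `MilvusScalar` and its cases are not the real Milvus enum, and Spark types are represented by their names as strings rather than by Spark's `DataType` objects.

```scala
// Hypothetical stand-in types; the real code maps Milvus DataType values to Spark DataType objects.
object TypeMappingSketch {
  sealed trait MilvusScalar
  case object Int8        extends MilvusScalar
  case object Timestamptz extends MilvusScalar
  case object Text        extends MilvusScalar
  case object Geometry    extends MilvusScalar
  final case class Other(name: String) extends MilvusScalar // anything not covered here

  /** Name of the Spark SQL type for each mapping stated in this PR; None if not covered. */
  def sparkTypeNameFor(t: MilvusScalar): Option[String] = t match {
    case Int8        => Some("ByteType")
    case Timestamptz => Some("LongType")
    case Text        => Some("StringType")
    case Geometry    => Some("BinaryType")
    case Other(_)    => None
  }
}
```

Returning `None` for unmapped types, instead of throwing, is a design choice of this sketch that lets the caller decide how to report unsupported fields.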

### 4. Updated Call Sites (MilvusLoonPartitionReader.scala)
- Pass Some(milvusSchema) to all ArrowConverter calls (3 locations)

## Type Coverage
- Complete: 16/18 user-facing types
- Fixed critical bugs: Int8, BinaryVector
- Accurate Float16/BFloat16/FloatVector detection via schema

🤖 Generated with Claude Code