Confusing interface for the write support for Delta IO #13425

@jihoonson

Some APIs in the write support for Delta IO are confusing and potentially bug-prone. Delta33xProvider.convertToGpu() for AppendDataExecV1 is one example. It takes cpuExec.table and cpuExec.write as parameters, which are typed as SupportsWrite and V1Write, respectively. Because SupportsWrite and V1Write are generic traits extended by both the CPU and GPU versions, the linked code does not make it clear whether the values are GPU versions or not. If anything, they look like CPU versions because they are retrieved from cpuExec. However, they were GPU versions in every case I tested. This is confusing and, more importantly, bug-prone: the interface would not catch the mistake if we accidentally passed in a CPU version.

One simple way to fix this is to introduce GPU-specific versions of the traits in question. This is already done for Databricks Delta with GpuDeltaSupportsWrite and GpuDeltaV1Write. We can then change the confusing APIs to take the GPU-specific traits as parameters instead of the generic ones, so that passing a CPU version fails at compile time.
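A rough sketch of the idea, not the actual spark-rapids code. The SupportsWrite and V1Write traits below are simplified stand-ins for the Spark traits, CpuAppendExec and GpuAppendExec are hypothetical placeholders for AppendDataExecV1 and its GPU counterpart, and the GpuDelta* traits only borrow their names from the Databricks Delta ones. The point is that the generic types are narrowed once, at the conversion boundary, and everything downstream accepts only the GPU types.

```scala
// Simplified stand-ins for Spark's SupportsWrite and V1Write.
// The real traits have other members, which are omitted here.
trait SupportsWrite { def name: String }
trait V1Write { def description: String }

// GPU-specific traits, named after the existing Databricks Delta ones.
trait GpuDeltaSupportsWrite extends SupportsWrite
trait GpuDeltaV1Write extends V1Write

// Hypothetical stand-in for AppendDataExecV1: it exposes only the generic types.
final case class CpuAppendExec(table: SupportsWrite, write: V1Write)

// Hypothetical GPU exec: its constructor accepts only the GPU versions,
// so passing a CPU table or write is a compile-time error.
final case class GpuAppendExec(table: GpuDeltaSupportsWrite, write: GpuDeltaV1Write)

object ConvertSketch {
  // Narrow the generic fields in one place and report a clear error
  // if either of them is not a GPU version.
  def convertToGpu(cpuExec: CpuAppendExec): Either[String, GpuAppendExec] =
    (cpuExec.table, cpuExec.write) match {
      case (t: GpuDeltaSupportsWrite, w: GpuDeltaV1Write) =>
        Right(GpuAppendExec(t, w))
      case (t, w) =>
        Left(s"expected GPU table and write, got '${t.name}' and '${w.description}'")
    }
}
```

With this shape, `GpuAppendExec(cpuExec.table, cpuExec.write)` no longer compiles, and the only runtime check left is the single pattern match in `convertToGpu`.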

Labels: task (work required that improves the product but is not user facing)