What is the problem the feature request solves?
With the experimental native scans built on DataFusion's ParquetExec and our update to DataFusion 45, we have the opportunity to start adding support for StringView. I have started scoping out this work and would like to start aggregating findings here.
Describe the potential solution
Project-level:
Bump arrow-java version. We're currently on 16.0.0. I believe the view types were added in 17.0.0. I tested bumping to 18.2.0 and so far it doesn't seem too painful.
Java-side:
Add support for decoding Utf8View and BinaryView to CometVector. I prototyped this here and here for Utf8View and BinaryView, respectively.
Native-side:
I'm sure there's more than this, and will continue adding as I find stuff broken in my proof-of-concept branch.
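For anyone following along, here is a minimal sketch of what decoding a view involves, assuming the layout described in the Arrow columnar format: each view is 16 little-endian bytes, values of up to 12 bytes are stored inline after the 4-byte length, and longer values store a 4-byte prefix followed by a data buffer index and an offset. The class `ViewDecodeSketch` and its methods are hypothetical names for illustration only; this is not the arrow-java API and not necessarily how the prototype handles it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Comet or arrow-java. */
final class ViewDecodeSketch {
    static final int VIEW_WIDTH = 16;   // every view occupies 16 bytes
    static final int INLINE_LIMIT = 12; // values up to 12 bytes live inside the view

    /** Decodes the UTF-8 value at `index` from a views buffer and its data buffers. */
    static String decode(ByteBuffer views, ByteBuffer[] dataBuffers, int index) {
        ByteBuffer v = views.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int base = index * VIEW_WIDTH;
        int length = v.getInt(base);
        byte[] out = new byte[length];
        if (length <= INLINE_LIMIT) {
            // Short value: the bytes follow the length directly inside the view.
            for (int i = 0; i < length; i++) {
                out[i] = v.get(base + 4 + i);
            }
        } else {
            // Long value: skip the 4-byte prefix, then read buffer index and offset.
            int bufferIndex = v.getInt(base + 8);
            int offset = v.getInt(base + 12);
            ByteBuffer data = dataBuffers[bufferIndex];
            for (int i = 0; i < length; i++) {
                out[i] = data.get(offset + i);
            }
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    /** Writes one view at `index`; long values are also copied into `data` at `offset`. */
    static void encode(ByteBuffer views, int index, byte[] value,
                       ByteBuffer data, int bufferIndex, int offset) {
        ByteBuffer v = views.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int base = index * VIEW_WIDTH;
        v.putInt(base, value.length);
        if (value.length <= INLINE_LIMIT) {
            for (int i = 0; i < value.length; i++) {
                v.put(base + 4 + i, value[i]);
            }
        } else {
            for (int i = 0; i < 4; i++) {
                v.put(base + 4 + i, value[i]); // prefix used for fast comparisons
            }
            v.putInt(base + 8, bufferIndex);
            v.putInt(base + 12, offset);
            for (int i = 0; i < value.length; i++) {
                data.put(offset + i, value[i]);
            }
        }
    }
}
```

The `encode` method exists only so the sketch can be exercised without Arrow on the classpath. A real CometVector implementation would presumably read the views and data buffers from the arrow-java vector rather than from plain `ByteBuffer`s.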
Additional context
Related DataFusion blogs:
https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/
https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-2/