-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: velox-sdk part of planbuilder #11479
base: main
Are you sure you want to change the base?
Conversation
hn5092
commented
Nov 8, 2024
- add c++ module sdk
- c++ and java code impl for sdk
1. add c++ module sdk 2. c++ and java code impl for sdk
✅ Deploy Preview for meta-velox canceled.
|
Thanks! Just out of my curiosity, If you are using Gluten, could we maintain this API in Gluten project during the period of the early development? I saw some code in the PR is very similar with Gluten CPP's code, I believe we need to redesign some parts of them for being extracted out as common APIs for Velox. And this will require for a comparatively agile development model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments
endif () | ||
|
||
IMPALA_ADD_THIRDPARTY_LIB(java_jvm "${JNI_INCLUDE_DIRS}" "" ${JAVA_JVM_LIBRARY}) | ||
include_directories(${JNI_INCLUDE_DIRS}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use targets
@zhztheplayer |
use velox_link_libraries and velox_add_library
@hn5092 agreed that Velox JNI bindings and Gluten have a bit of an overlap, but are not necessarily the same thing. Velox JNI should follow Velox's external API very closely without any external dependency on projects like Substrait or Arrow, for example. Perhaps in the future Gluten should use part of these bindings underneath? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few initial comments. My initial feedback is that this feels a bit too verbose and hard to reason about. It would be nice to have a bit more of a description in the PR summary so we can better understand what is going on.
It would be great if there was a way to divide-and-conquer, like first adding some of the necessary JNI hookups, and only implement a small simple API. Then go from there.
@@ -116,6 +116,7 @@ option(VELOX_ENABLE_PARQUET "Enable Parquet support" OFF) | |||
option(VELOX_ENABLE_ARROW "Enable Arrow support" OFF) | |||
option(VELOX_ENABLE_REMOTE_FUNCTIONS "Enable remote function support" OFF) | |||
option(VELOX_ENABLE_CCACHE "Use ccache if installed." ON) | |||
option(VELOX_ENABLE_SDK "Enable SDK support." ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not immediately clear what we mean by "SDK" here. Maybe "VELOX_ENABLE_JNI_BINDINGS" could be more explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could use velox/jni
as the base dir
|
||
if(${VELOX_ENABLE_SDK}) | ||
add_subdirectory(sdk) | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing new line
@@ -0,0 +1,283 @@ | |||
// Copyright 2008 Google Inc. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file seems to be copyrighted by google. Why do we need this?
#define EXPRESSIONUTILS_HPP | ||
|
||
#include <boost/type_index.hpp> | ||
#include <exec/WindowFunction.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these velox files but without using the full path? Please always use the full path
} | ||
auto type = field["type"].asString(); | ||
|
||
// 这个例子没有处理nullable字段,实际使用时你可能需要考虑这个字段 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add comments in English
} | ||
|
||
// 将Spark的StructType转换为Velox的Type | ||
static TypePtr convertSparkStructToVelox(const folly::dynamic& sparkStruct) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was expecting this binding layer to closely follow velox's API. Why are there Spark-specific conversion code? Can we get rid of it?
@@ -0,0 +1,61 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use the correct license header. In general, please also review our coding style document and ensure your code closely follow it:
https://github.com/facebookincubator/velox/blob/main/CODING_STYLE.md
In future Gluten could have chance to migrate to Velox's own plan protocol from Substrait. We have already started working on various relevant refactors in Gluten. The most tricky part is to enhance our query optimizer's relevant capability and we had already done the majority of the work.
If things go well, Gluten may completely rely on a Velox Java API without Substrait. And it's also no reason to keep the API in Gluten's code base but in the early stage of the development I hope to adopt a co-dev mode on Gluten + Velox Java API, which is likely the most feasible way according to the complexity of Velox + Java usages in Gluten project. We can also explore whether our work could benefit each other. This is just my 2 cents and based on my assumption that Gluten may be a big potential user for the API (thought Prestissimo doesn't need this?). |