Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: velox-sdk part of planbuilder #11479

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Conversation

hn5092
Copy link
Contributor

@hn5092 hn5092 commented Nov 8, 2024

  1. add c++ module sdk
  2. c++ and java code impl for sdk

1. add c++ module sdk
2. c++ and java code impl for sdk
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 8, 2024
Copy link

netlify bot commented Nov 8, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 2def1db
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/672ee07629aec40008766704

@hn5092 hn5092 changed the title velox-sdk part 1 WIP: velox-sdk part 1 Nov 8, 2024
@zhztheplayer
Copy link
Contributor

zhztheplayer commented Nov 8, 2024

Thanks!

Just out of my curiosity, If you are using Gluten, could we maintain this API in Gluten project during the period of the early development? I saw some code in the PR is very similar with Gluten CPP's code, I believe we need to redesign some parts of them for being extracted out as common APIs for Velox. And this will require for a comparatively agile development model.

Copy link
Collaborator

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments

CMake/FindJNI.cmake Outdated Show resolved Hide resolved
CMake/kudu_cmake_fns.txt Outdated Show resolved Hide resolved
velox/sdk/CMakeLists.txt Outdated Show resolved Hide resolved
velox/sdk/CMakeLists.txt Outdated Show resolved Hide resolved
velox/sdk/CMakeLists.txt Outdated Show resolved Hide resolved
endif ()

IMPALA_ADD_THIRDPARTY_LIB(java_jvm "${JNI_INCLUDE_DIRS}" "" ${JAVA_JVM_LIBRARY})
include_directories(${JNI_INCLUDE_DIRS})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use targets

velox/sdk/CMakeLists.txt Outdated Show resolved Hide resolved
velox/sdk/CMakeLists.txt Outdated Show resolved Hide resolved
velox/sdk/CMakeLists.txt Outdated Show resolved Hide resolved
velox/sdk/CMakeLists.txt Outdated Show resolved Hide resolved
@assignUser assignUser marked this pull request as draft November 8, 2024 15:16
@hn5092
Copy link
Contributor Author

hn5092 commented Nov 9, 2024

Thanks!

Just out of my curiosity, If you are using Gluten, could we maintain this API in Gluten project during the period of the early development? I saw some code in the PR is very similar with Gluten CPP's code, I believe we need to redesign some parts of them for being extracted out as common APIs for Velox. And this will require for a comparatively agile development model.

@zhztheplayer
We originally used Gluten, but due to the fast pace of iterations, we decided to fully commit to Velox, hence we stopped using Gluten. Our JniCommon and a small part of the Java code (5% or less) did indeed reference Gluten, but the rest of the design is supposed to be different from Gluten, which has an intermediate subtrait for conversion. Our current SDK shouldn't need this conversion, so incorporating it into Gluten seems somewhat inconvenient. Is there any alternative approach for integration? For more details, you can refer to our projects at https://github.com/proxverse/StarryDB/tree/main and https://github.com/proxverse/velox/tree/72a9a6a3329e84773a8bb7555b06bad8e34a6146.

use velox_link_libraries and velox_add_library
@hn5092 hn5092 changed the title WIP: velox-sdk part 1 WIP: velox-sdk part of planbuilder Nov 9, 2024
@Yuhta Yuhta requested a review from pedroerp November 12, 2024 17:16
@pedroerp
Copy link
Contributor

@hn5092 agreed that Velox JNI bindings and Gluten have a bit of an overlap, but are not necessarily the same thing. Velox JNI should follow Velox's external API very closely without any external dependency on projects like Substrait or Arrow, for example.

Perhaps in the future Gluten should use part of these bindings underneath?

Copy link
Contributor

@pedroerp pedroerp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few initial comments. My initial feedback is that this feels a bit too verbose and hard to reason about. It would be nice to have a bit more of a description in the PR summary so we can better understand what is going on.

It would be great if there was a way to divide-and-conquer, like first adding some of the necessary JNI hookups, and only implement a small simple API. Then go from there.

@@ -116,6 +116,7 @@ option(VELOX_ENABLE_PARQUET "Enable Parquet support" OFF)
option(VELOX_ENABLE_ARROW "Enable Arrow support" OFF)
option(VELOX_ENABLE_REMOTE_FUNCTIONS "Enable remote function support" OFF)
option(VELOX_ENABLE_CCACHE "Use ccache if installed." ON)
option(VELOX_ENABLE_SDK "Enable SDK support." ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediately clear what we mean by "SDK" here. Maybe "VELOX_ENABLE_JNI_BINDINGS" could be more explicit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use velox/jni as the base dir


if(${VELOX_ENABLE_SDK})
add_subdirectory(sdk)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing new line

@@ -0,0 +1,283 @@
// Copyright 2008 Google Inc. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file seems to be copyrighted by google. Why do we need this?

#define EXPRESSIONUTILS_HPP

#include <boost/type_index.hpp>
#include <exec/WindowFunction.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these velox files but without using the full path? Please always use the full path

}
auto type = field["type"].asString();

// 这个例子没有处理nullable字段,实际使用时你可能需要考虑这个字段
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add comments in English

}

// 将Spark的StructType转换为Velox的Type
static TypePtr convertSparkStructToVelox(const folly::dynamic& sparkStruct) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting this binding layer to closely follow velox's API. Why are there Spark-specific conversion code? Can we get rid of it?

@@ -0,0 +1,61 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the correct license header. In general, please also review our coding style document and ensure your code closely follow it:

https://github.com/facebookincubator/velox/blob/main/CODING_STYLE.md

@zhztheplayer
Copy link
Contributor

zhztheplayer commented Nov 13, 2024

@pedroerp @hn5092

In future Gluten could have chance to migrate to Velox's own plan protocol from Substrait. We have already started working on various relevant refactors in Gluten. The most tricky part is to enhance our query optimizer's relevant capability and we had already done the majority of the work.

Perhaps in the future Gluten should use part of these bindings underneath?

If things go well, Gluten may completely rely on a Velox Java API without Substrait. And it's also no reason to keep the API in Gluten's code base but in the early stage of the development I hope to adopt a co-dev mode on Gluten + Velox Java API, which is likely the most feasible way according to the complexity of Velox + Java usages in Gluten project. We can also explore whether our work could benefit each other.

This is just my 2 cents and based on my assumption that Gluten may be a big potential user for the API (thought Prestissimo doesn't need this?).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants