Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make use of extendedPlugins provide runtime dependencies for arrow-me… #17738

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

rishabhmaurya
Copy link
Contributor

…mrory-core

Description

Currently Arrow Flight Stream based SPIs introduced in #16691 make use of generics. Since plugins have a separate classloaders, so its not possible to share BufferAllocators and VectorSchemaRoots from arrow-memory to be shared between plugins thus making SPIs not usable.

Related Issues

Resolves #17671

Check List

  • Functionality includes testing.
    - [ ] API changes companion pull request created, if applicable.
    - [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@rishabhmaurya
Copy link
Contributor Author

@reta please take a look at this attempt.
I'm still running into one security permission related issue for which i have to add permission java.net.SocketPermission "*", "accept,connect,listen,resolve"; to security.policy as I'm not able to get away with it by adding it into plugin's security.policy.

@rishabhmaurya rishabhmaurya force-pushed the arrow-extended-plugin branch 3 times, most recently from 80e7b52 to 97fd86b Compare March 31, 2025 19:07
Copy link
Contributor

❌ Gradle check result for 97fd86b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@rishabhmaurya rishabhmaurya force-pushed the arrow-extended-plugin branch from 97fd86b to e20d0a1 Compare March 31, 2025 20:00
Copy link
Contributor

✅ Gradle check result for e20d0a1: SUCCESS

Copy link

codecov bot commented Mar 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.41%. Comparing base (7b6108b) to head (ba3bc5d).
Report is 10 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17738      +/-   ##
============================================
- Coverage     72.44%   72.41%   -0.04%     
- Complexity    66483    66484       +1     
============================================
  Files          5409     5409              
  Lines        308282   308258      -24     
  Branches      44759    44749      -10     
============================================
- Hits         223344   223218     -126     
- Misses        66608    66741     +133     
+ Partials      18330    18299      -31     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@reta
Copy link
Collaborator

reta commented Mar 31, 2025

I'm still running into one security permission related issue for which i have to add permission java.net.SocketPermission "*", "accept,connect,listen,resolve"; to security.policy as I'm not able to get away with it by adding it into plugin's security.policy.

@rishabhmaurya I will surely take a look, thanks @rishabhmaurya

@reta
Copy link
Collaborator

reta commented Mar 31, 2025

@rishabhmaurya I think this is certainly the work in right direction, but we need the following pieces:

  • the arrow plugin should have an SPI module (aka client)
  • the pugin's SPI module will be imported by other plugins
  • along with extendedPlugins = ['opensearch-arrow-flight'] plugin manifest entry

Please correct if I am wrong here. Examples to look at: https://github.com/opensearch-project/geospatial/ , thank you

@rishabhmaurya
Copy link
Contributor Author

rishabhmaurya commented Mar 31, 2025

@reta that's a valid point.
Currently, all the SPIs we have defined are present in https://github.com/opensearch-project/OpenSearch/tree/main/libs/arrow-spi/src/main/java/org/opensearch/arrow/spi
We share the StreamManager instance from server to other plugins using StreamManagerPlugin.
Using StreamManager, one can add a StreamProducer and consume it using StreamReader, so we don't have any requirements of implementing any SPIs from opensearch-arrow-core plugin, but use it only to get runtime dependencies.

I think going forward, when we will plan to start supporting Enable transport request response to be returned in Arrow format
, then we might need SPIs.

The new SPI then could allow one to implement action against a TransportRequest with response in ArrowFormat. I will add more details on it this week.

I'm trying to decouple the arrow-vector use and arrow flight transport use cases.
arrow-vector use cases doesn't need any SPI to be implemented and that's the purpose of plugin opensearch-arrow-core.
whereas, opensearch-arrow-flight is responsible for all transport related stuff.
For example, if i just want to use arrow vectors then i can just take a dependency on opensearch-arrow-core.

Let me know what do you think? thank you

@reta
Copy link
Collaborator

reta commented Mar 31, 2025

Thanks @rishabhmaurya

Currently, all the SPIs we have defined are present in https://github.com/opensearch-project/OpenSearch/tree/main/libs/arrow-spi/src/main/java/org/opensearch/arrow/spi

I think this library should be gone, all moved to the server module where StreamManagerPlugin plugin lives.

For example, if i just want to use arrow vectors then i can just take a dependency on opensearch-arrow-core.

It think this module should not be there, this is by and large what arrow's plugin SPI should be about: provide the dependency to Arrow Flight related implementation of the generic StreamManager (which as of today, it not at all Arrow specific). The SPI should also be helpful to discover if the arrow plugin is available and could be used.

@rishabhmaurya
Copy link
Contributor Author

rishabhmaurya commented Mar 31, 2025

@reta both of your points makes total sense to me. I will try to implement it in next iteration. Thank you

On a different note, I wanted your opinion on how we should go about supporting any Transport API to use flight transport.

Let's say, we want to integrate ShardSearchRequest API to flight transport.

The way we add a new API in flight RPC is to define logic in getStream() in FlightProducer -

public void getStream(CallContext context, Ticket ticket, ServerStreamListener listener) {

You can see its associated with a Ticket, so any TransportRequest needs to be represented in byte form as a ticket, which getStream() method can decode and return a FlightStream corresponding to it.
OSAC client will ensure to encode it in byte form to request a response stream for it.

We want some sort of registry here in getStream(), which can map a TransportRequest with its action, which is nothing but an implementation of StreamProducer.

Suppose plugin arrow-search has implemented this StreamProducer which can return stream for ShardSearchRequest by implementing StreamProducer for it.
We want ShardSearchRequest TransportRequest to its action (i.e. StreamProducer) mapping to be visible in arrow-flight-rpc plugin, where FlightProducer's getStream() will use it.
I think we can use SPI model here, to read these registries from different plugins and this will the new way of defining new APIs to use this framework. And offcourse, arrow-search will use arrow-flight-rpc plugin as an extension plugin.

Please let me know your thoughts, thank you.

@rishabhmaurya
Copy link
Contributor Author

rishabhmaurya commented Apr 1, 2025

@reta

I think this library should be gone, all moved to the server module where StreamManagerPlugin plugin lives.

I'm always hesitant in adding stuff to server module if it can be a separate module for better modularity and that was the reason why we kept it in a separate module.

It think this module should not be there, this is by and large what arrow's plugin SPI should be about: provide the dependency to Arrow Flight related implementation of the generic StreamManager (which as of today, it not at all Arrow specific). The SPI should also be helpful to discover if the arrow plugin is available and could be used.

I have merged the module and the plugin.

Copy link
Contributor

github-actions bot commented Apr 1, 2025

❕ Gradle check result for 4e30c7e: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@reta
Copy link
Collaborator

reta commented Apr 1, 2025

Thanks @rishabhmaurya

I'm always hesitant in adding stuff to server module if it can be a separate module for better modularity and that was the reason why we kept it in a separate module.

The StreamManagerPlugin and SPI module are coupled together and because we cannot use SPI outside of the server or plugins (where server module is always present), it is probably not needed.

On a different note, I wanted your opinion on how we should go about supporting any Transport API to use flight transport.

Sorry, this one needs a bit of time to think about, will get to you certainly, sorry about that

@rishabhmaurya rishabhmaurya force-pushed the arrow-extended-plugin branch from 93ac90f to d389379 Compare April 2, 2025 00:36
@rishabhmaurya
Copy link
Contributor Author

@reta moved arrow-spi to server.

Sorry, this one needs a bit of time to think about, will get to you certainly, sorry about that

appreciate your help here. Thank you

Copy link
Contributor

github-actions bot commented Apr 2, 2025

❕ Gradle check result for d389379: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@rishabhmaurya rishabhmaurya force-pushed the arrow-extended-plugin branch from d389379 to ba3bc5d Compare April 9, 2025 21:18
Copy link
Contributor

github-actions bot commented Apr 9, 2025

✅ Gradle check result for ba3bc5d: SUCCESS

@@ -43,9 +43,17 @@ testClusters {
installedPlugins = Eval.me(installedPlugins)
for (String p : installedPlugins) {
plugin('plugins:'.concat(p))
if (p.equals("arrow-flight-rpc")) {
Copy link
Collaborator

@reta reta Apr 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rishabhmaurya I think we discussed this in the past: these system properties will have an impact on other plugins and modules (notably, netty based transports). Plus, they are in conflict of how we launch the OpenSearch from the standard distribution - see please https://github.com/opensearch-project/OpenSearch/blob/main/distribution/tools/launchers/src/main/java/org/opensearch/tools/launchers/SystemJvmOptions.java#L76

I would suggest to remove these changes and tackle them separately since we would need to look for possible conflicting settings and side effects.

Copy link
Contributor Author

@rishabhmaurya rishabhmaurya Apr 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta sure, I will remove it. I kept them here since they would only be applied when flight transport is installed and will not impact any defaults.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Enable Apache Arrow classes to be shared among plugins
2 participants