Skip to content

Introduce lang-python module to enable Python as a scripting language in OpenSearch #18095

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

yuancu
Copy link

@yuancu yuancu commented Apr 28, 2025

Description

This draft PR introduces the lang-python module to OpenSearch.

It is submitted for early code review and feedback on the implementation approach for enabling Python scripting support in OpenSearch.

Related Issues

Resolves #17432

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the enhancement Enhancement or improvement to existing feature or request label Apr 28, 2025
@yuancu yuancu force-pushed the lang-python branch 3 times, most recently from 44668f9 to 952f477 Compare April 28, 2025 07:37
Copy link
Contributor

❌ Gradle check result for 952f477: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 248c987: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for f20a480: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented May 6, 2025

❌ Gradle check result for 70e4f2d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@gaobinlong
Copy link
Contributor

One thing I can think of is that infinite loop such as while(true) is not allowed and for loop has limited times in painless script, so you may check the implementation and add similar strategy into python script.

@yuancu
Copy link
Author

yuancu commented Jun 4, 2025

One thing I can think of is that infinite loop such as while(true) is not allowed and for loop has limited times in painless script, so you may check the implementation and add similar strategy into python script.

Thanks for reviewing! I am going to create a test for such cases. I'll let you know once I'm done.

Copy link
Contributor

@gaobinlong gaobinlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just take a first glance at the code, I think you can add more test cases and make sure all tests pass, and also make the code clean, good work! @yuancu

buildscript {
ext {
opensearch_group = "org.opensearch"
opensearch_version = System.getProperty("opensearch.version", "3.0.0-SNAPSHOT")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3.0.0 version is released, so snapshot is not needed for now.

#!/bin/sh

#
# Copyright © 2015-2021 the original authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the latest copyright header for new added files.


private static FieldScript.LeafFactory newFieldScript(
String code, Map<String, Object> params, SearchLookup lookup) {
logger.info("Executing python code: {}", code);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not combining these 3 lines log into one line? And I think it should be debug level?


// TODO: check whether code has a field `result`
context.eval("python", code);
logger.info("Eval succeeded");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a large scale cluster with high search load with python script is used, these kind of log would be enormous, we should downgrade the log level, I think.

// }

Set<String> accessedDocFields =
PythonScriptUtility.extractAccessedDocFields(code);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the metadata fields like _id, _index, _routing can be accessed? If so do we have some use case? If we don't have some use cases then I think we should prevent the metadata fields from being accessed in the first version.

@@ -0,0 +1,154 @@
package org.opensearch.python.antlr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Header is missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] Support Python in OS Scripting Service
2 participants