[Enhancement] Enhance validation for create connector API #3579

Open
wants to merge 4 commits into
base: main

akolarkunnu
Contributor

Description

This change addresses the second part of the validation: pre- and post-processing function validation.
It also moves the method getRemoteServerFromURL() from ConnectorUtils.java to ConnectorAction.java to avoid a cyclic dependency.

Related Issues

Partially resolves #2993

Check List

  • New functionality includes testing.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@akolarkunnu
Contributor Author

Can anyone re-trigger the failed test suite? The failure is unrelated to these changes.

This change will address the second part of validation "pre and post processing function validation".

Partially resolves opensearch-project#2993

Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>
@akolarkunnu
Contributor Author

@zane-neo Can you please look into these two failures? My changes are around the pre/post process functions.
116 tests completed, 3 failed, 11 skipped

  • org.opensearch.ml.rest.RestBedRockInferenceIT.test_bedrock_embedding_v2_model_with_postProcessFunction
  • org.opensearch.ml.rest.RestCohereInferenceIT.test_cohereInference_withDifferent_postProcessFunction

I am not able to reproduce and debug these failures locally because of the dependencies below:

The AWS credentials are not set. Skipping test.

COHERE_KEY is null, skipping the test!

@zane-neo
Collaborator

Please rebase onto the latest code on the main branch; this has been fixed there.

@akolarkunnu
Contributor Author

linux (23) - Known flaky org.opensearch.ml.rest.RestMLRemoteInferenceIT.testPredictWithAutoDeployAndTTL_RemoteModel #3544

linux (21) - There are no real failures

Windows (21) - org.opensearch.ml.rest.RestMLRAGSearchProcessorIT.testBM25WithBedrockConverse - not related to the code changes

@akolarkunnu
Contributor Author

akolarkunnu commented Mar 14, 2025

Hi Maintainers, please help move this task forward.
There are 2 skipped tasks, 1 cancelled task, and 3 failed tasks; the failures are not related to these changes.

@pyek-bot
Contributor

Hi @akolarkunnu, let me review this PR and test out some of the changes. In the meantime, the maintainers can help re-trigger the CI so we can check whether it goes through. Thanks for the fix and for sharing the test details!

@akolarkunnu
Contributor Author

akolarkunnu commented Mar 18, 2025

Please approve for CI - "Waiting for review: ml-commons-cicd-env-require-approval needs approval to start deploying changes."

@akolarkunnu
Contributor Author

Hi Maintainers, @pyek-bot, please approve the CI run.

    }
    switch (remoteServer) {
        case OPENAI:
            if (!preProcessFunction.contains(OPENAI)) {
Contributor

I'm a bit confused by the latest change. So now the only validation is whether the preProcessFunction contains "openai", as opposed to "connector.post_process.default.embedding"? So this is, in theory, a more lenient validation?

Contributor Author

Yes, I made that change based on your previous comment: "thinking if for these constants we can create the string using the INBUILT_FUNC_PREFIX? and maybe even have preprocessprefix and postprocessprefix? this can avoid any accidental changes"
As far as I can see, the service name (e.g. openai, bedrock, cohere) is the only unique text in the function names that we can use to differentiate between LLM services.
Or did you mean something different? Creating separate constant arrays for each LLM service's pre- and post-process functions? In that case we would have 8 arrays: 4 (openai, bedrock, cohere, sagemaker) for pre-process functions and 4 for post-process functions.

Contributor

I believe there has been some confusion about this comment: #3579 (comment)

In this comment, I meant that we should create the constants as such:
MLPostProcessFunction.OPENAI_EMBEDDING = INBUILT_FUNC_PREFIX + "openai.embedding" + ACTION_POST_PROCESS_FUNCTION*

*Just an example; it may not be fully accurate code.

This way if either of these constants change, we don't have to change the code. With respect to whether the check should be lenient or not we can wait on more comments.
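The suggestion above can be sketched roughly as follows. This is a minimal illustration only: the class name, the prefix values, and the helper methods are assumptions made for the example, not the project's actual constants or API. It also contrasts the lenient `contains` check discussed earlier with an exact match against the composed names.

```java
import java.util.Set;

public final class ProcessFunctionNames {
    // All values below are illustrative assumptions, not the real ml-commons constants.
    public static final String INBUILT_FUNC_PREFIX = "connector.";
    public static final String PRE_PROCESS_PREFIX = INBUILT_FUNC_PREFIX + "pre_process.";
    public static final String POST_PROCESS_PREFIX = INBUILT_FUNC_PREFIX + "post_process.";

    public static final String OPENAI = "openai";

    // Names are composed from the prefixes, so changing a prefix updates every constant.
    public static final String OPENAI_EMBEDDING_PRE = PRE_PROCESS_PREFIX + OPENAI + ".embedding";
    public static final String OPENAI_EMBEDDING_POST = POST_PROCESS_PREFIX + OPENAI + ".embedding";

    // Hypothetical allow-list of built-in pre-process functions for one service.
    public static final Set<String> OPENAI_PRE_FUNCTIONS = Set.of(OPENAI_EMBEDDING_PRE);

    private ProcessFunctionNames() {}

    // Lenient check: any function name that mentions the service passes.
    public static boolean lenientMatch(String functionName) {
        return functionName.contains(OPENAI);
    }

    // Strict check: the function name must be one of the known built-in names.
    public static boolean strictMatch(String functionName) {
        return OPENAI_PRE_FUNCTIONS.contains(functionName);
    }
}
```

With these assumed values, a misspelled name such as "connector.pre_process.openai.typo" passes the lenient check but fails the strict one, which is the trade-off the thread leaves open.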

@@ -185,6 +198,90 @@ public static ConnectorAction parse(XContentParser parser) throws IOException {
            .build();
    }

    public void validatePrePostProcessFunctions(Map<String, String> parameters) {
Collaborator

Can we please add Javadoc with a more detailed explanation?

        validatePostProcessFunctions(remoteServer);
    }

    private void validatePreProcessFunctions(String remoteServer) {
Collaborator

Let's organize the private methods in the order they are invoked: getRemoteServerFromURL and then validatePreProcessFunctions. That makes the code easier for reviewers to read.

StringSubstitutor substitutor = new StringSubstitutor(parameters, "${parameters.", "}");
String endPoint = substitutor.replace(url);
String remoteServer = getRemoteServerFromURL(endPoint);
validatePreProcessFunctions(remoteServer);
Collaborator

Should we invoke validatePreProcessFunctions only if isInBuiltProcessFunction(preProcessFunction) is true?
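A rough sketch of that guard, with the helpers declared in the order they are invoked. Everything here is a stand-in written for illustration: the class name, the prefix value, the URL matching, and the bodies of `isInBuiltProcessFunction` and `getRemoteServerFromURL` are assumptions, not the PR's actual implementation.

```java
import java.util.Locale;

public final class PreProcessGuardSketch {
    // Assumed prefix that marks a built-in function name.
    public static final String INBUILT_FUNC_PREFIX = "connector.";

    private PreProcessGuardSketch() {}

    // Entry point: only built-in functions are checked against the remote server.
    public static void validatePreProcessFunction(String url, String preProcessFunction) {
        if (!isInBuiltProcessFunction(preProcessFunction)) {
            return; // null or a custom script: nothing to validate here
        }
        String remoteServer = getRemoteServerFromURL(url);
        if (remoteServer.isEmpty()) {
            return; // unrecognized server: this sketch skips validation
        }
        if (!preProcessFunction.contains(remoteServer)) {
            throw new IllegalArgumentException(
                "Pre-process function " + preProcessFunction + " does not match remote server " + remoteServer);
        }
    }

    // Hypothetical stand-in: built-in functions start with the assumed prefix.
    static boolean isInBuiltProcessFunction(String functionName) {
        return functionName != null && functionName.startsWith(INBUILT_FUNC_PREFIX);
    }

    // Hypothetical stand-in: derive the service name from the endpoint URL.
    static String getRemoteServerFromURL(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        if (lower.contains("openai")) {
            return "openai";
        }
        if (lower.contains("cohere")) {
            return "cohere";
        }
        return "";
    }
}
```

Under this arrangement a connector that supplies its own script as the pre-process function is never rejected by the server-name check; whether that is the desired behavior is for the reviewers to decide.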

@@ -62,6 +85,257 @@ public void constructor_NullMethod() {
        assertEquals("method can't be null", exception.getMessage());
    }

    @Test
    public void connectorWithNullPreProcessFunction() {
Collaborator

Let's rename the unit tests. The name doesn't quite reflect the intent of the test. Same applies for other methods as well.

Collaborator

To be clearer, your test name should answer three things:

  • What is being tested? (method or scenario)
  • Under what conditions? (input or setup)
  • What is the expected outcome?
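One possible naming pattern that answers those three questions is method_condition_expectedOutcome. The sketch below uses plain Java methods so it stands alone; in the project these would be JUnit methods annotated with @Test. The `validate` helper and the function names passed to it are hypothetical stand-ins, not the code under review.

```java
public final class ConnectorActionNamingSketch {
    private ConnectorActionNamingSketch() {}

    // Hypothetical stand-in for the validation under test.
    static void validate(String remoteServer, String preProcessFunction) {
        if (preProcessFunction != null && !preProcessFunction.contains(remoteServer)) {
            throw new IllegalArgumentException("pre-process function does not match " + remoteServer);
        }
    }

    // What: validatePreProcessFunctions. Condition: null function. Outcome: no exception.
    public static void validatePreProcessFunctions_nullPreProcessFunction_doesNotThrow() {
        validate("openai", null);
    }

    // What: validatePreProcessFunctions. Condition: Cohere function on an OpenAI server.
    // Outcome: IllegalArgumentException.
    public static void validatePreProcessFunctions_cohereFunctionOnOpenAiServer_throwsIllegalArgumentException() {
        try {
            validate("openai", "connector.pre_process.cohere.embedding");
        } catch (IllegalArgumentException expected) {
            return; // the expected outcome
        }
        throw new AssertionError("expected IllegalArgumentException");
    }
}
```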
