
Conversation

@ajleong623 (Contributor) commented Aug 12, 2025

Description

To request a regularly scheduled search evaluation, the user can add a cron parameter denoting the cron schedule on which the search evaluation runs.

There are now three new APIs for interacting with scheduled experiments. The endpoints are experiment/<job_id>/schedule, which supports the GET and DELETE methods, and experiment/schedule, which supports the GET and POST methods.
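As an illustration of creating a schedule (a sketch only: the host, port, path prefix, and body field names are assumptions, not verified plugin code; only the experiment/schedule endpoint name comes from this PR):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ScheduleExperimentExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical body: an existing experiment ID plus the cron schedule
        String body = "{\"experimentId\": \"my-experiment-id\", \"cron\": \"0 0 * * *\"}";
        HttpRequest request = HttpRequest.newBuilder()
            // Path prefix is an assumption; only experiment/schedule is from the PR
            .uri(URI.create("http://localhost:9200/_plugins/_search_relevance/experiment/schedule"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}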

There are 2 new indices, .scheduled-jobs and search-relevance-scheduled-experiment-history. The .scheduled-jobs index stores the currently running experiment schedules. The search-relevance-scheduled-experiment-history index stores the timestamped historical experiment results produced by the scheduled job runner.

Unit and integration tests are provided. Additions such as workload management, integration with alerting, and resource monitoring are not included in this pull request; I would like to add those in a future pull request.

Please let me know if there are any questions or concerns.

Issues Resolved

#213 #226

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anthony Leong <[email protected]>
@ajleong623 ajleong623 marked this pull request as draft August 12, 2025 23:22
@epugh (Collaborator) commented Aug 13, 2025

Post discussion with @wrigleyDan and @epugh, we are going to change direction a bit and make the API take in an ALREADY EXISTING Experiment ID, and use that (and its associated settings) to run the experiment every iteration.

Let's move to a cron pattern versus an interval.

We need to think about whether we need a limit on how many experiments can be run...

@epugh epugh linked an issue Aug 13, 2025 that may be closed by this pull request
@epugh epugh added the v3.3.0 label Aug 13, 2025
@epugh (Collaborator) previously requested changes Aug 20, 2025 and left a comment:


Progress! We are now on the cron pattern. Now to think about nesting the API under the /experiment/{experiment_id}/schedule namespace.

ajleong623 and others added 5 commits August 20, 2025 16:43, including a revert of commit 7f6352d.
@ajleong623 (Contributor, Author) commented

I believe I have addressed the comments. For one of them, I added a TODO comment so that it can be addressed in the future; right now, refactoring the logic of running experiments is a bit involved.

@ajleong623 ajleong623 marked this pull request as ready for review September 1, 2025 06:49
@epugh (Collaborator) commented Sep 2, 2025

You now just need to add something to highlight this new feature in the changelog!

https://github.com/opensearch-project/search-relevance/blob/main/CHANGELOG.md#features

@epugh epugh added v3.4.0 and removed v3.3.0 labels Sep 19, 2025
import lombok.extern.log4j.Log4j2;

/**
* ExperimentRunningManager helps isolate the logic for running the logic in
A Collaborator commented:

Slightly awkward phrasing.

@ajleong623 (Contributor, Author) commented

@martin-gaievski @fen-qin I think I am ready for the next round of code reviews, as I believe I have addressed the comments mentioned previously. Please let me know about any other suggestions or concerns.

ajleong623 added a commit updating …Dao.java (co-authored by Eric Pugh).
@epugh (Collaborator) commented Oct 14, 2025

Bit of pairing today with @ajleong623 and we have a dashboard! I am going to review the dashboard with @smacrakis to get some feedback. This is the final piece of the puzzle, and we are ready to get this merged.
[Screenshot: the dashboard]

@ajleong623 (Contributor, Author) commented

@fen-qin Would it be possible to have another review for this PR?

@martin-gaievski (Member) left a comment:


Added some suggestions; a few of them I would say need to be addressed, the rest are up to you and can be done in a follow-up PR:

  • concurrency and thread safety of futures in the concurrent map in ExperimentRunningManager
  • resource leakage in SearchRelevanceJobRunner and memory in ExperimentRunningManager

) {
    List<Future<?>> futures = new ArrayList<>();
    if (request.getScheduledExperimentResultId() != null) {
        runningFutures.put(request.getScheduledExperimentResultId(), futures);

With such logic for adding items into the map, the values are not thread-safe; multiple threads can mutate the same list concurrently.
You can do something like this when adding to the map:

runningFutures.compute(scheduledExperimentResultId, (key, existingList) -> {
    List<Future<?>> list = existingList != null ? existingList : Collections.synchronizedList(new ArrayList<>());
    list.add(future);
    return list;
});
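(On a ConcurrentHashMap, compute runs atomically per key, so the read-modify-write above cannot interleave with another thread updating the same entry; the synchronizedList then guards subsequent mutations of the list itself.)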

log.error("Timeout for scheduled experiment has occured!");
} catch (CompletionException e) {
log.error("Scheduled experiment has timed out. Moving onto cleanup");
} finally {

I think we need to include the latch decrement in finally; otherwise there is no guarantee that the latch is always decremented, which could lead to thread leaks.

while (actuallyFinished.getCount() > 0) {
    actuallyFinished.countDown();
}
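Combining the two quoted hunks, the placement might look like this compilable sketch (the method shape and all variable names other than actuallyFinished are assumptions, and the catch clauses are adapted to Future.get's checked exceptions):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import lombok.extern.log4j.Log4j2;

@Log4j2
class LatchCleanupSketch {
    void awaitScheduledExperiment(CompletableFuture<Void> experimentFuture, CountDownLatch actuallyFinished) {
        try {
            // 65_000 mirrors the CRON_JOB_COMPLETION_MS constant quoted below
            experimentFuture.get(65_000, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            log.error("Timeout for scheduled experiment has occurred!");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            log.error("Scheduled experiment was interrupted. Moving on to cleanup");
        } catch (ExecutionException e) {
            log.error("Scheduled experiment failed. Moving on to cleanup");
        } finally {
            // Drain the latch here so waiting threads are released on every exit path
            while (actuallyFinished.getCount() > 0) {
                actuallyFinished.countDown();
            }
        }
    }
}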

request,
searchConfigurations,
queryTextWithReferences,
finalResults,

Looks like we're simply accumulating results into this list of maps, finalResults. This could consume significant memory. Let's at least signal to the logs when we reach a certain high threshold; the Runtime class can probably give some helpful info: Runtime.getRuntime().totalMemory() or Runtime.getRuntime().freeMemory().
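A sketch of that guard (the class and method names and the 0.85 ratio are illustrative, not from the PR):

import lombok.extern.log4j.Log4j2;

@Log4j2
class MemoryPressureSketch {
    private static final double MEMORY_WARN_RATIO = 0.85;

    void warnIfMemoryPressureHigh(int accumulatedResults) {
        Runtime runtime = Runtime.getRuntime();
        long used = runtime.totalMemory() - runtime.freeMemory();
        double ratio = (double) used / runtime.maxMemory();
        if (ratio > MEMORY_WARN_RATIO) {
            // Signal before the accumulating finalResults list exhausts the heap
            log.warn("Heap usage at {}% while finalResults holds {} entries", Math.round(ratio * 100), accumulatedResults);
        }
    }
}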

String judgmentId;
String experimentId;

public static final int CRON_JOB_COMPLETION_MS = 65000;

Curious why this number: is there analytical reasoning behind it, or is it purely empirical?

return;
}
if (checkIfCancelled(cancellationToken)) {
log.info("Experiment has been timed out while executing experiments for each queryText");

Can you add more details, e.g. include experimentId, elapsedTime, queryText, completed, and total in the log message?
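For instance (the format string here is illustrative; only the argument names come from the suggestion):

log.info(
    "Experiment [{}] cancelled after {} ms on queryText [{}] ({} of {} completed)",
    experimentId,
    elapsedTime,
    queryText,
    completed,
    total
);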

);

// Wait until all asynchronous operations or timeout complete before cleanup
searchEvaluationTask.join();

You should be able to replace the blocking operations with async composition: instead of join(), use thenCompose/thenAccept:

searchEvaluationTask
    .thenAccept(result -> handleSuccess(result))
    .exceptionally(error -> handleError(error));
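One caveat on this shape: thenAccept yields a CompletableFuture<Void>, so the lambda passed to exceptionally must return null (a sketch, reusing the reviewer's placeholder handler names):

searchEvaluationTask
    .thenAccept(result -> handleSuccess(result))
    .exceptionally(error -> {
        handleError(error);
        return null; // Void-typed stage: no fallback value to supply
    });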

return;
}

if (request.getType() == ExperimentType.PAIRWISE_COMPARISON) {

Looking at these if/else branches, I'm thinking we need a separate interface that abstracts the runner from the experiment type, something like ExperimentRunner with a single functional method runExperiment. Each type, like Pairwise or Hybrid, can then have its own implementation.

public interface ExperimentRunner {
    CompletableFuture<ExperimentResult> runExperiment(
        String experimentId, 
        PutExperimentRequest request,
        ExperimentCancellationToken token
    );
    
    ExperimentType getSupportedType();
}

We can have factory like construct that creates specific implementation based on the experiment type:

public class ExperimentRunnerFactory {
    private final Map<ExperimentType, ExperimentRunner> runners;
    
    public ExperimentRunner getRunner(ExperimentType type) {
        return runners.get(type);
    }
}

This eliminates the conditional logic and makes adding new experiment types easier through the Open/Closed principle.
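A possible wiring of that factory (a sketch: the runner class names, the factory constructor, and the HYBRID_OPTIMIZER constant are assumptions; only ExperimentType.PAIRWISE_COMPARISON appears in the quoted code):

Map<ExperimentType, ExperimentRunner> runners = Map.of(
    ExperimentType.PAIRWISE_COMPARISON, new PairwiseComparisonRunner(),
    ExperimentType.HYBRID_OPTIMIZER, new HybridOptimizerRunner()
);
ExperimentRunnerFactory factory = new ExperimentRunnerFactory(runners);

// The if/else chain then collapses to a single dispatch:
factory.getRunner(request.getType())
    .runExperiment(experimentId, request, cancellationToken);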


Development

Successfully merging this pull request may close these issues:

[FEATURE] Scheduling for running evaluations regularly