
Conversation

@ajleong623 (Contributor) commented Aug 12, 2025

Description

To request a regularly scheduled search evaluation, the user can add a cron parameter denoting the cron schedule on which the search evaluation runs.

There are now three new APIs for interacting with scheduled experiments. The endpoints are experiment/<job_id>/schedule, which supports the GET and DELETE methods, and experiment/schedule, which supports the GET and POST methods.
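As an illustration of creating a schedule (a sketch only: the host, port, path prefix, and body field names are assumptions, not verified plugin code; only the experiment/schedule endpoint name comes from this PR):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ScheduleExperimentExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical body: an existing experiment ID plus the cron schedule
        String body = "{\"experimentId\": \"my-experiment-id\", \"cron\": \"0 0 * * *\"}";
        HttpRequest request = HttpRequest.newBuilder()
            // Path prefix is an assumption; only experiment/schedule is from the PR
            .uri(URI.create("http://localhost:9200/_plugins/_search_relevance/experiment/schedule"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}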

There are 2 new indices, .scheduled-jobs and search-relevance-scheduled-experiment-history. The .scheduled-jobs index stores the currently running experiment schedules. The search-relevance-scheduled-experiment-history index stores the timestamped historical experiment results produced by the scheduled job runner.

Unit and integration tests are provided. Additions such as workload management, integration with alerting, and resource monitoring are not included in this pull request; I would like to add those in a future pull request.

Please let me know if there are any questions or concerns.

Issues Resolved

#213 #226

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anthony Leong <[email protected]>
@ajleong623 ajleong623 marked this pull request as draft August 12, 2025 23:22
@epugh (Collaborator) commented Aug 13, 2025

Post discussion with @wrigleyDan and @epugh, we are going to change direction a bit and make the API take in an ALREADY EXISTING Experiment ID, and use that (and its associated settings) to run the experiment every iteration.

Let's move to a cron pattern versus an interval.

We need to think about whether we need a limit on how many experiments can be run...

@epugh epugh linked an issue Aug 13, 2025 that may be closed by this pull request
@epugh epugh added the v3.3.0 label Aug 13, 2025
@epugh (Collaborator) previously requested changes Aug 20, 2025 and left a comment:


Progress! We are now on the cron pattern. Now to think about nesting the API under the /experiment/{experiment_id}/schedule namespace.

ajleong623 and others added 5 commits August 20, 2025 16:43, including a revert of commit 7f6352d.
@ajleong623 (Contributor, Author) commented

I believe I have addressed the comments. For one of them, I added a TODO comment so that it can be addressed in the future; right now, refactoring the logic of running experiments is a bit involved.

@ajleong623 ajleong623 marked this pull request as ready for review September 1, 2025 06:49
@epugh (Collaborator) commented Sep 2, 2025

You now just need to add something to highlight this new feature in the changelog!

https://github.com/opensearch-project/search-relevance/blob/main/CHANGELOG.md#features

@epugh epugh added v3.4.0 and removed v3.3.0 labels Sep 19, 2025
import lombok.extern.log4j.Log4j2;

/**
* ExperimentRunningManager helps isolate the logic for running the logic in
A Collaborator commented:

Slightly awkward phrasing.

@ajleong623 (Contributor, Author) commented

@martin-gaievski @fen-qin I think I am ready for the next round of code reviews, as I believe I have addressed the comments mentioned previously. Please let me know about any other suggestions or concerns.

ajleong623 added a commit updating …Dao.java (co-authored by Eric Pugh).
@epugh (Collaborator) commented Oct 14, 2025

Bit of pairing today with @ajleong623 and we have a dashboard! I am going to review the dashboard with @smacrakis to get some feedback. This is the final piece of the puzzle, and we are ready to get this merged.
[Screenshot: the dashboard]

@ajleong623 (Contributor, Author) commented

@fen-qin Would it be possible to have another review for this PR?

@martin-gaievski (Member) left a comment:


Added some suggestions; a few of them I would say need to be addressed, the rest are up to you and can be done in a follow-up PR:

  • concurrency and thread safety of futures in the concurrent map in ExperimentRunningManager
  • resource leakage in SearchRelevanceJobRunner and memory in ExperimentRunningManager

) {
    List<Future<?>> futures = new ArrayList<>();
    if (request.getScheduledExperimentResultId() != null) {
        runningFutures.put(request.getScheduledExperimentResultId(), futures);

With such logic for adding items into the map, the values are not thread-safe; multiple threads can mutate the same list concurrently.
You can do something like this when adding to the map:

runningFutures.compute(scheduledExperimentResultId, (key, existingList) -> {
    List<Future<?>> list = existingList != null ? existingList : Collections.synchronizedList(new ArrayList<>());
    list.add(future);
    return list;
});
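(On a ConcurrentHashMap, compute runs atomically per key, so the read-modify-write above cannot interleave with another thread updating the same entry; the synchronizedList then guards subsequent mutations of the list itself.)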

log.error("Timeout for scheduled experiment has occured!");
} catch (CompletionException e) {
log.error("Scheduled experiment has timed out. Moving onto cleanup");
} finally {

I think we need to include the latch decrement in finally; otherwise there is no guarantee that the latch is always decremented, which could lead to thread leaks.

while (actuallyFinished.getCount() > 0) {
    actuallyFinished.countDown();
}
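Combining the two quoted hunks, the placement might look like this compilable sketch (the method shape and all variable names other than actuallyFinished are assumptions, and the catch clauses are adapted to Future.get's checked exceptions):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import lombok.extern.log4j.Log4j2;

@Log4j2
class LatchCleanupSketch {
    void awaitScheduledExperiment(CompletableFuture<Void> experimentFuture, CountDownLatch actuallyFinished) {
        try {
            // 65_000 mirrors the CRON_JOB_COMPLETION_MS constant quoted below
            experimentFuture.get(65_000, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            log.error("Timeout for scheduled experiment has occurred!");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            log.error("Scheduled experiment was interrupted. Moving on to cleanup");
        } catch (ExecutionException e) {
            log.error("Scheduled experiment failed. Moving on to cleanup");
        } finally {
            // Drain the latch here so waiting threads are released on every exit path
            while (actuallyFinished.getCount() > 0) {
                actuallyFinished.countDown();
            }
        }
    }
}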

request,
searchConfigurations,
queryTextWithReferences,
finalResults,

Looks like we're simply accumulating results into this list of maps, finalResults. This could consume significant memory. Let's at least signal to the logs when we reach a certain high threshold; the Runtime class can probably give some helpful info: Runtime.getRuntime().totalMemory() or Runtime.getRuntime().freeMemory().
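A sketch of that guard (the class and method names and the 0.85 ratio are illustrative, not from the PR):

import lombok.extern.log4j.Log4j2;

@Log4j2
class MemoryPressureSketch {
    private static final double MEMORY_WARN_RATIO = 0.85;

    void warnIfMemoryPressureHigh(int accumulatedResults) {
        Runtime runtime = Runtime.getRuntime();
        long used = runtime.totalMemory() - runtime.freeMemory();
        double ratio = (double) used / runtime.maxMemory();
        if (ratio > MEMORY_WARN_RATIO) {
            // Signal before the accumulating finalResults list exhausts the heap
            log.warn("Heap usage at {}% while finalResults holds {} entries", Math.round(ratio * 100), accumulatedResults);
        }
    }
}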

String judgmentId;
String experimentId;

public static final int CRON_JOB_COMPLETION_MS = 65000;

Curious why this number: is there analytical reasoning behind it, or is it purely empirical?

return;
}
if (checkIfCancelled(cancellationToken)) {
log.info("Experiment has been timed out while executing experiments for each queryText");

Can you add more details, e.g. include experimentId, elapsedTime, queryText, completed, and total in the log message?
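For instance (the format string here is illustrative; only the argument names come from the suggestion):

log.info(
    "Experiment [{}] cancelled after {} ms on queryText [{}] ({} of {} completed)",
    experimentId,
    elapsedTime,
    queryText,
    completed,
    total
);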

);

// Wait until all asynchronous operations or timeout complete before cleanup
searchEvaluationTask.join();

You should be able to replace the blocking operations with async composition: instead of join(), use thenCompose/thenAccept:

searchEvaluationTask
    .thenAccept(result -> handleSuccess(result))
    .exceptionally(error -> handleError(error));
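One caveat on this shape: thenAccept yields a CompletableFuture<Void>, so the lambda passed to exceptionally must return null (a sketch, reusing the reviewer's placeholder handler names):

searchEvaluationTask
    .thenAccept(result -> handleSuccess(result))
    .exceptionally(error -> {
        handleError(error);
        return null; // Void-typed stage: no fallback value to supply
    });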

return;
}

if (request.getType() == ExperimentType.PAIRWISE_COMPARISON) {

Looking at these if/else branches, I'm thinking we need a separate interface that abstracts the runner from the experiment type, something like ExperimentRunner with a single functional method runExperiment. Each type, like Pairwise or Hybrid, can then have its own implementation.

public interface ExperimentRunner {
    CompletableFuture<ExperimentResult> runExperiment(
        String experimentId, 
        PutExperimentRequest request,
        ExperimentCancellationToken token
    );
    
    ExperimentType getSupportedType();
}

We can have factory like construct that creates specific implementation based on the experiment type:

public class ExperimentRunnerFactory {
    private final Map<ExperimentType, ExperimentRunner> runners;
    
    public ExperimentRunner getRunner(ExperimentType type) {
        return runners.get(type);
    }
}

This eliminates the conditional logic and makes adding new experiment types easier through the Open/Closed principle.
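A possible wiring of that factory (a sketch: the runner class names, the factory constructor, and the HYBRID_OPTIMIZER constant are assumptions; only ExperimentType.PAIRWISE_COMPARISON appears in the quoted code):

Map<ExperimentType, ExperimentRunner> runners = Map.of(
    ExperimentType.PAIRWISE_COMPARISON, new PairwiseComparisonRunner(),
    ExperimentType.HYBRID_OPTIMIZER, new HybridOptimizerRunner()
);
ExperimentRunnerFactory factory = new ExperimentRunnerFactory(runners);

// The if/else chain then collapses to a single dispatch:
factory.getRunner(request.getType())
    .runExperiment(experimentId, request, cancellationToken);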


Development

Successfully merging this pull request may close these issues:

[FEATURE] Scheduling for running evaluations regularly