Scheduler for regular search evaluation runs #220
base: main
Signed-off-by: Anthony Leong <[email protected]>
After a discussion with @wrigleyDan and @epugh, we are going to change direction a bit and make the API take an ALREADY EXISTING Experiment ID and use it (and its associated settings) to run the experiment on every iteration. Let's move to a cron pattern instead of an interval. We also need to think about whether we need a limit on how many experiments can be run...
…jobs index Signed-off-by: Anthony Leong <[email protected]>
Progress! We are now on the cron pattern. Next, we need to think about nesting the API under the /experiment/{experiment_id}/schedule namespace.
src/main/java/org/opensearch/searchrelevance/common/PluginConstants.java
Outdated
src/main/java/org/opensearch/searchrelevance/common/PluginConstants.java
Outdated
src/main/java/org/opensearch/searchrelevance/common/PluginConstants.java
Outdated
src/main/java/org/opensearch/searchrelevance/common/PluginConstants.java
Outdated
src/main/java/org/opensearch/searchrelevance/rest/RestPostScheduledExperimentAction.java
Outdated
src/main/java/org/opensearch/searchrelevance/rest/RestDeleteScheduledExperimentAction.java
Outdated
src/main/java/org/opensearch/searchrelevance/rest/RestPutExperimentAction.java
Outdated
src/main/java/org/opensearch/searchrelevance/scheduler/SearchRelevanceJobRunner.java
Outdated
src/main/java/org/opensearch/searchrelevance/scheduler/SearchRelevanceJobRunner.java
Outdated
Co-authored-by: Eric Pugh <[email protected]> Signed-off-by: Anthony Leong <[email protected]>
This reverts commit 7f6352d. Signed-off-by: Anthony Leong <[email protected]>
96e8016 to fd62b14
I believe I have addressed the comments. For one of them, I added a TODO comment so that it can be addressed in the future, because refactoring the experiment-running logic is fairly involved right now.
You now just need to add something to highlight this new feature in the changelog! https://github.com/opensearch-project/search-relevance/blob/main/CHANGELOG.md#features
src/main/java/org/opensearch/searchrelevance/scheduler/SearchRelevanceJobParameters.java
Outdated
…ents on underlying experiment deletion Signed-off-by: Anthony Leong <[email protected]>
…relevance into job-scheduler
src/main/java/org/opensearch/searchrelevance/dao/ScheduledJobsDao.java
Outdated
import lombok.extern.log4j.Log4j2;

/**
 * ExperimentRunningManager helps isolate the logic for running the logic in
Slightly awkward phrasing.
@martin-gaievski @fen-qin I think I am ready for the next round of code review, as I believe I have addressed the earlier comments. Please let me know about any other suggestions or concerns.
…Dao.java Co-authored-by: Eric Pugh <[email protected]> Signed-off-by: Anthony Leong <[email protected]>
Did a bit of pairing today with @ajleong623, and we have a dashboard! I am going to review the dashboard with @smacrakis to get some feedback. This is the final piece of the puzzle, and we are ready to get this merged.
@fen-qin Would it be possible to have another review for this PR?
I added some suggestions. A few of them, I would say, need to be addressed; the rest are up to you and can be done in a follow-up PR:
- concurrency and thread safety of futures in the concurrent map in ExperimentRunningManager
- resource leakage in SearchRelevanceJobRunner and memory consumption in ExperimentRunningManager
) {
List<Future<?>> futures = new ArrayList<>();
if (request.getScheduledExperimentResultId() != null) {
runningFutures.put(request.getScheduledExperimentResultId(), futures);
With this logic for adding items to the map, the values are not thread-safe: multiple threads can mutate the same list concurrently.
You can do something like this when adding to the map:
runningFutures.compute(scheduledExperimentResultId, (key, existingList) -> {
    List<Future<?>> list = existingList != null ? existingList : Collections.synchronizedList(new ArrayList<>());
    list.add(future);
    return list;
});
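A minimal, self-contained sketch of the compute-based suggestion, assuming the futures live in a small registry class of their own; RunningFuturesRegistry, register, size and cancelAll are hypothetical names, not the PR's API. It shows the two halves that matter for thread safety: adding through ConcurrentHashMap.compute, and holding the synchronized list's monitor while iterating it to cancel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Future;

// Hypothetical registry modeled on the runningFutures map discussed above;
// the class and method names are illustrative, not the PR's actual API.
class RunningFuturesRegistry {
    private final Map<String, List<Future<?>>> runningFutures = new ConcurrentHashMap<>();

    // ConcurrentHashMap.compute runs atomically per key, so concurrent callers
    // for the same id share one list instead of overwriting each other's list.
    void register(String scheduledExperimentResultId, Future<?> future) {
        runningFutures.compute(scheduledExperimentResultId, (key, existingList) -> {
            List<Future<?>> list = existingList != null
                ? existingList
                : Collections.synchronizedList(new ArrayList<>());
            list.add(future);
            return list;
        });
    }

    // Number of futures currently tracked for an id (0 if none).
    int size(String scheduledExperimentResultId) {
        List<Future<?>> list = runningFutures.get(scheduledExperimentResultId);
        return list == null ? 0 : list.size();
    }

    // Atomically detaches the list for an id, then cancels every future in it.
    int cancelAll(String scheduledExperimentResultId) {
        List<Future<?>> list = runningFutures.remove(scheduledExperimentResultId);
        if (list == null) {
            return 0;
        }
        int cancelled = 0;
        // Iterating a synchronizedList must be done while holding its monitor.
        synchronized (list) {
            for (Future<?> f : list) {
                if (f.cancel(true)) {
                    cancelled++;
                }
            }
        }
        return cancelled;
    }
}
```

Detaching the list with remove() before cancelling means that a concurrent register() for the same id simply starts a fresh list instead of racing with the iteration.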
log.error("Timeout for scheduled experiment has occured!");
} catch (CompletionException e) {
log.error("Scheduled experiment has timed out. Moving onto cleanup");
} finally {
I think we need to include the latch decrement in the finally block. Otherwise, there is no guarantee that the latch is always decremented, which could lead to thread leaks.
while (actuallyFinished.getCount() > 0) {
    actuallyFinished.countDown();
}
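The effect of decrementing in finally can be sketched in isolation. LatchedRun and runAndAwait are hypothetical names; the only piece taken from the suggestion is the drain loop over actuallyFinished, placed in the worker's finally block so it runs whether the work returns or throws.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: the worker releases the latch in a finally block, so the
// waiting thread is not left blocked when the work throws.
class LatchedRun {
    // Returns true if the work signalled completion before timeoutMs elapsed.
    static boolean runAndAwait(Executor executor, Runnable work, long timeoutMs) {
        CountDownLatch actuallyFinished = new CountDownLatch(1);
        executor.execute(() -> {
            try {
                work.run();
            } finally {
                // Runs on normal return and on exceptions alike.
                while (actuallyFinished.getCount() > 0) {
                    actuallyFinished.countDown();
                }
            }
        });
        try {
            return actuallyFinished.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller and report "not finished".
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The finally block guards against exceptions, not against work that never returns; the timeout on await() still has to cover that case.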
request,
searchConfigurations,
queryTextWithReferences,
finalResults,
It looks like we're simply accumulating results into this list of maps, finalResults. This could consume significant memory. Let's at least log a warning if we reach a certain high threshold; the Runtime class can probably give some helpful info: Runtime.getRuntime().totalMemory() or Runtime.getRuntime().freeMemory().
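One way to act on this, sketched under assumptions: MemoryGuard and its warning ratio are hypothetical. Used heap is estimated as totalMemory() - freeMemory() and compared against maxMemory(), the heap ceiling, since totalMemory() alone only reflects what the JVM has reserved so far. The comparison is split into a pure method so it can be checked without depending on the live JVM.

```java
// Hypothetical helper: reports when estimated heap usage crosses a fraction of the max heap.
class MemoryGuard {
    private final double warnRatio;

    MemoryGuard(double warnRatio) {
        if (warnRatio <= 0 || warnRatio > 1) {
            throw new IllegalArgumentException("warnRatio must be in (0, 1]");
        }
        this.warnRatio = warnRatio;
    }

    // Pure comparison: used = reserved heap minus the free part of it.
    boolean exceeds(long totalMemory, long freeMemory, long maxMemory) {
        long used = totalMemory - freeMemory;
        return used >= warnRatio * maxMemory;
    }

    // Live check against the current JVM. If maxMemory() reports Long.MAX_VALUE
    // (no inherent limit), this effectively never fires.
    boolean exceedsNow() {
        Runtime rt = Runtime.getRuntime();
        return exceeds(rt.totalMemory(), rt.freeMemory(), rt.maxMemory());
    }
}
```

Inside the loop that appends to finalResults, a check such as if (guard.exceedsNow()) followed by a log.warn that includes finalResults.size() would surface the condition; how often to check and which ratio to use are open choices.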
String judgmentId;
String experimentId;

public static final int CRON_JOB_COMPLETION_MS = 65000;
Curious why this number: is there analytical reasoning behind it, or is it purely empirical?
return;
}
if (checkIfCancelled(cancellationToken)) {
log.info("Experiment has been timed out while executing experiments for each queryText");
Can you add more details, e.g.:
experimentId, elapsedTime, queryText, completed, total);
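A sketch of a richer message carrying the fields listed above. The class and method names are hypothetical, and String.format is used only to keep the example dependency-free; with the Log4j2 logger the PR already uses, the same values would more naturally be passed as {} parameters to log.info, so the message is not built when the level is disabled.

```java
import java.util.Locale;

// Hypothetical helper; the fields follow the reviewer's list.
class TimeoutLogMessage {
    static String format(String experimentId, long elapsedTimeMs, String queryText, int completed, int total) {
        // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
        return String.format(
            Locale.ROOT,
            "Experiment %s timed out after %d ms while evaluating queryText '%s' (%d of %d query texts completed)",
            experimentId, elapsedTimeMs, queryText, completed, total);
    }
}
```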
);

// Wait until all asynchronous operations or timeout complete before cleanup
searchEvaluationTask.join();
You should be able to replace blocking operations with async composition: instead of join(), use thenCompose/thenAccept:
searchEvaluationTask
    .thenAccept(result -> handleSuccess(result))
    .exceptionally(error -> handleError(error));
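A self-contained sketch of the non-blocking variant, assuming the success, error and cleanup handlers can be passed in as callbacks (NonBlockingEvaluation and attachHandlers are hypothetical names). It makes two details of the suggestion explicit: exceptionally() on a CompletableFuture<Void> must return null, and the Throwable it receives is usually a CompletionException wrapping the real cause.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Consumer;

// Hypothetical sketch: callbacks replace searchEvaluationTask.join(), so the calling
// thread returns immediately and cleanup runs once the task settles.
class NonBlockingEvaluation {
    static <T> CompletableFuture<Void> attachHandlers(
            CompletableFuture<T> searchEvaluationTask,
            Consumer<? super T> handleSuccess,
            Consumer<Throwable> handleError,
            Runnable cleanup) {
        return searchEvaluationTask
            .thenAccept(handleSuccess)
            .exceptionally(error -> {
                // Failures from upstream stages arrive wrapped in CompletionException.
                Throwable cause = error instanceof CompletionException && error.getCause() != null
                    ? error.getCause()
                    : error;
                handleError.accept(cause);
                return null; // exceptionally on a CompletableFuture<Void> must yield a Void
            })
            // Runs after either branch, taking the place of cleanup code after join().
            .whenComplete((ignored, error) -> cleanup.run());
    }
}
```

Whoever triggered the run can keep the returned future if it needs to observe completion, but no thread has to block on it.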
return;
}

if (request.getType() == ExperimentType.PAIRWISE_COMPARISON) {
Looking at these if/else branches, I'm thinking we need a separate interface that abstracts the runner from the experiment type, something like ExperimentRunner with a single functional method, runExperiment. Each type, such as Pairwise or Hybrid, can then have its own implementation.
public interface ExperimentRunner {
    CompletableFuture<ExperimentResult> runExperiment(
        String experimentId,
        PutExperimentRequest request,
        ExperimentCancellationToken token
    );
    ExperimentType getSupportedType();
}
We can have a factory-like construct that returns the specific implementation based on the experiment type:
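public class ExperimentRunnerFactory {
    private final Map<ExperimentType, ExperimentRunner> runners;
    public ExperimentRunnerFactory(Map<ExperimentType, ExperimentRunner> runners) {
        // Keep an immutable copy of the type-to-runner mapping
        this.runners = Map.copyOf(runners);
    }
    public ExperimentRunner getRunner(ExperimentType type) {
        return runners.get(type);
    }
}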
This eliminates the conditional logic and makes adding new experiment types easier, following the Open/Closed principle.
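A runnable sketch of the proposed strategy-plus-factory shape. To stay self-contained it swaps the plugin's types for simplified stand-ins (SketchExperimentType, SketchExperimentRunner, and Strings in place of PutExperimentRequest, ExperimentResult and the cancellation token); the HYBRID constant and all class names are hypothetical.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Simplified, hypothetical stand-ins for the plugin's experiment types.
enum SketchExperimentType { PAIRWISE_COMPARISON, HYBRID }

interface SketchExperimentRunner {
    CompletableFuture<String> runExperiment(String experimentId);
    SketchExperimentType getSupportedType();
}

class PairwiseSketchRunner implements SketchExperimentRunner {
    public CompletableFuture<String> runExperiment(String experimentId) {
        return CompletableFuture.completedFuture("pairwise:" + experimentId);
    }

    public SketchExperimentType getSupportedType() {
        return SketchExperimentType.PAIRWISE_COMPARISON;
    }
}

class HybridSketchRunner implements SketchExperimentRunner {
    public CompletableFuture<String> runExperiment(String experimentId) {
        return CompletableFuture.completedFuture("hybrid:" + experimentId);
    }

    public SketchExperimentType getSupportedType() {
        return SketchExperimentType.HYBRID;
    }
}

// Registry keyed by the type each runner reports; replaces the if/else on request type.
class SketchExperimentRunnerFactory {
    private final Map<SketchExperimentType, SketchExperimentRunner> runners =
        new EnumMap<>(SketchExperimentType.class);

    SketchExperimentRunnerFactory(List<SketchExperimentRunner> available) {
        for (SketchExperimentRunner runner : available) {
            // Fail fast if two runners claim the same experiment type.
            if (runners.putIfAbsent(runner.getSupportedType(), runner) != null) {
                throw new IllegalArgumentException("Duplicate runner for " + runner.getSupportedType());
            }
        }
    }

    SketchExperimentRunner getRunner(SketchExperimentType type) {
        SketchExperimentRunner runner = runners.get(type);
        if (runner == null) {
            throw new IllegalArgumentException("No runner registered for " + type);
        }
        return runner;
    }
}
```

The call site then collapses to factory.getRunner(type).runExperiment(experimentId), and supporting a new experiment type means registering one more runner rather than adding another branch.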
Description
To request a regularly scheduled search evaluation, the user can add a cron parameter that specifies the cron schedule for running the search evaluation.

This pull request adds 3 new APIs for interacting with scheduled experiments. The endpoints are experiment/<job_id>/schedule, which is applied to the GET and DELETE methods, and experiment/schedule, which is applied to the GET and POST methods.

There are 2 new indices, .scheduled-jobs and search-relevance-scheduled-experiment-history. The .scheduled-jobs index stores the currently running experiment schedules. The search-relevance-scheduled-experiment-history index stores the timestamped historical experiment results produced by the scheduled job runner.

Unit and integration tests are provided. However, additions such as workload management, integration with alerting, and resource monitoring are not included in this pull request; I would like to add those in a future pull request.
Please let me know if there are any questions or concerns.
Issues Resolved
#213 #226
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.