@jackiehanyang
Collaborator

Description

  • Introducing a new Insights API
    • POST /_plugins/_anomaly_detection/insights/_start - Start insights job
    • GET /_plugins/_anomaly_detection/insights/_status - Get insights job status
    • GET /_plugins/_anomaly_detection/insights/_results - Get latest insights results
    • POST /_plugins/_anomaly_detection/insights/_stop - Stop insights job
  • Introducing a runtime dependency on the ml-commons metrics correlation algorithm
    • Send anomaly results to the ml-commons metrics correlation algorithm for analysis
    • Write the analysis results into the insights-results index
    • The frontend reads from this index to display insights on the dashboard
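A minimal client-side sketch of the four Insights endpoints listed above, using only the JDK's java.net.http package. It builds the requests without sending them. The host (localhost:9200) and the InsightsRequests helper are assumptions for illustration; request bodies and parameters are not described here, so every request is built without a body, and security-plugin authentication headers are omitted.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

// Hypothetical helper: builds (but does not send) requests for the Insights endpoints.
class InsightsRequests {
    // Assumption: a local cluster listening on the default REST port.
    static final String HOST = "http://localhost:9200";
    // Path prefix taken from the endpoint list in the PR description.
    static final String BASE = "/_plugins/_anomaly_detection/insights";

    // POST with an empty body, since no request body is specified above.
    static HttpRequest post(String action) {
        return HttpRequest.newBuilder(URI.create(HOST + BASE + "/" + action))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    static HttpRequest get(String action) {
        return HttpRequest.newBuilder(URI.create(HOST + BASE + "/" + action))
            .GET()
            .build();
    }

    // Start, status, results, stop: in the order the description lists them.
    static List<HttpRequest> all() {
        return List.of(post("_start"), get("_status"), get("_results"), post("_stop"));
    }
}
```

To actually call an endpoint, pass a request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.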

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kaituo
Collaborator

kaituo commented Nov 13, 2025

CI failed due to the JaCoCo changes in build.gradle. I'm not sure how to fix it. One naive option is to add the correlation request, response, and action classes to AD to avoid the ml-commons dependency.

* What went wrong:
Execution failed for task ':jacocoTestCoverageVerification'.
> A failure occurred while executing org.gradle.internal.jacoco.JacocoCoverageAction
   > Rule violated for class org.opensearch.ad.AnomalyDetectorRunner: branches covered ratio is 0.35, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.AnomalyDetectorRunner: lines covered ratio is 0.47, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.util.ModelUtil: branches covered ratio is 0.32, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.util.ModelUtil: lines covered ratio is 0.48, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.util.DataUtil: lines covered ratio is 0.72, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.feature.AbstractRetriever: branches covered ratio is 0.55, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.feature.AbstractRetriever: lines covered ratio is 0.63, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.feature.SearchFeatureDao: branches covered ratio is 0.28, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.feature.SearchFeatureDao: lines covered ratio is 0.59, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.ModelValidationActionHandler: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.ModelValidationActionHandler: lines covered ratio is 0.00, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.ConfigUpdateConfirmer: branches covered ratio is 0.06, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.ConfigUpdateConfirmer: lines covered ratio is 0.19, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.AggregationPrep: branches covered ratio is 0.36, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.AggregationPrep: lines covered ratio is 0.40, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation: branches covered ratio is 0.06, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation: lines covered ratio is 0.18, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.LatestTimeRetriever: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.LatestTimeRetriever: lines covered ratio is 0.00, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation.IntervalRecommendationListener: branches covered ratio is 0.37, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation.IntervalRecommendationListener: lines covered ratio is 0.55, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.ratelimit.ColdStartWorker: branches covered ratio is 0.45, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.ratelimit.ColdStartWorker: lines covered ratio is 0.73, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.transport.SuggestAnomalyDetectorParamTransportAction: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.transport.SuggestAnomalyDetectorParamTransportAction: lines covered ratio is 0.11, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.transport.ADSuggestName: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.transport.ADSuggestName: lines covered ratio is 0.57, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.transport.ADResultProcessor: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.transport.ADResultProcessor: lines covered ratio is 0.61, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.model.InitProgressProfile: branches covered ratio is 0.50, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.model.IntervalTimeConfiguration: branches covered ratio is 0.50, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.ml.MLCommonsClient: lines covered ratio is 0.62, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.RestValidateAction: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.RestValidateAction: lines covered ratio is 0.26, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.RestJobAction: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.RestJobAction: lines covered ratio is 0.25, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.AbstractSearchAction: lines covered ratio is 0.60, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamResponse: branches covered ratio is 0.59, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamRequest: branches covered ratio is 0.50, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamRequest: lines covered ratio is 0.68, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamResponse.Builder: lines covered ratio is 0.57, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.transport.handler.IndexMemoryPressureAwareResultHandler: branches covered ratio is 0.54, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.transport.handler.IndexMemoryPressureAwareResultHandler: lines covered ratio is 0.68, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.ratelimit.ADSaveResultStrategy: branches covered ratio is 0.43, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.ratelimit.ADSaveResultStrategy: lines covered ratio is 0.53, but expected minimum is 0.75

Collaborator

@kaituo kaituo left a comment

partial review


InjectSecurity injectSecurity = new InjectSecurity(jobParameter.getName(), settings, localClient.threadPool().getThreadContext());
try {
injectSecurity.inject(user, roles);
Collaborator

A normal user cannot query a system index. Please add security tests.

Collaborator Author

updated in the new revision

// Insights job
// ======================================
// The Insights job name
public static final String INSIGHTS_JOB_NAME = "insights_job";
Collaborator

How about renaming it to ad_insights_job in case we need a forecasting job later?

Collaborator Author

updated in the new revision

Collaborator

@kaituo kaituo left a comment

partial review

return ImmutableList
.of(
// Start insights job
new ReplacedRoute(
Collaborator

You don't need ReplacedRoute, as this is a new API; TimeSeriesAnalyticsPlugin.AD_BASE_URI alone is enough.

Collaborator Author

updated in the new revision

builder.startObject();

// Task metadata
builder.field("task_id", "task_" + ADCommonName.INSIGHTS_JOB_NAME + "_" + UUID.randomUUID().toString());
Collaborator

Why do you need a task ID? The AD task ID is the doc ID of the state index.


if (parts.length > 1) {
String seriesKey = parts[1];
seriesKeys.add(seriesKey);
Collaborator

Is the entities set redundant with the seriesKeys set?

Collaborator Author

We don't necessarily need it. I just followed the current practice of having a logical run identifier for insights generation. It may be useful in the future when integrating with Investigation, so we can refer to a specific insights run by this ID.
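If the entities set and the seriesKeys set end up holding the same values, one set is enough. A minimal sketch of collecting series keys into a single LinkedHashSet, mirroring the `parts.length > 1` / `parts[1]` logic in the excerpt above. The identifier layout (`<prefix><delimiter><seriesKey>`), the delimiter, and the SeriesKeyExtractor name are hypothetical, not taken from the PR.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: one set of distinct series keys, no parallel "entities" set.
class SeriesKeyExtractor {
    // Assumes identifiers shaped like "<prefix><delimiter><seriesKey>".
    static Set<String> extractSeriesKeys(Collection<String> ids, String delimiter) {
        // LinkedHashSet drops duplicates and keeps first-seen order.
        Set<String> seriesKeys = new LinkedHashSet<>();
        for (String id : ids) {
            // Quote the delimiter so characters like "|" are matched literally;
            // limit 2 keeps a key intact even if it contains the delimiter.
            String[] parts = id.split(Pattern.quote(delimiter), 2);
            if (parts.length > 1) {
                seriesKeys.add(parts[1]);
            }
        }
        return seriesKeys;
    }
}
```

Splitting with a limit of 2 is a choice of this sketch; an unlimited split, as `parts[1]` alone suggests, would keep only the second segment.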

Comment on lines +44 to +48
"fields": {
"raw": {
"type": "keyword",
"ignore_above": 32766
}
Collaborator

Do you need the keyword field?

Collaborator Author

Not strictly; I was following other AD index mappings to see how they store text fields. Removed in the new revision.

Collaborator

@kaituo kaituo left a comment

partial review

Comment on lines +71 to +75
handleStartOperation(request, listener);
} else if (request.isStatusOperation()) {
handleStatusOperation(request, listener);
} else if (request.isStopOperation()) {
handleStopOperation(request, listener);
Collaborator

Do you need to stash the context before accessing the job index (a system index)? Please add security tests.

Collaborator Author

Yes, I'm using a stashed context when accessing the job index. I will add security tests.

.sort("generated_at", SortOrder.DESC)
);

client.search(searchRequest, ActionListener.wrap(searchResponse -> {
Collaborator

Do you need to add backend role filtering before the search? Please add security tests with backend role filtering enabled.

Collaborator Author

We need a tenant-isolated search here, but not necessarily backend role filtering. Insights generation is a background job, so I followed the existing pattern for background work: impersonate the stored user via InjectSecurity, then execute the search directly. User-facing search APIs, such as the search anomaly result transport action, instead read the current user from the thread context and then add backend role filtering.

Adding security tests in the next revision


private static final Logger log = LogManager.getLogger(InsightsJobProcessor.class);

private static InsightsJobProcessor INSTANCE;
Collaborator

Collaborator Author

Good catch. We should do the same for ADJobProcessor and ForecastJobProcessor too.

try {
injectSecurity.inject(user, roles);

localClient
Collaborator

We should verify whether the customer has changed the mapping before writing. If so, report an error or stop the job, and stop writing.

Collaborator Author

good catch, updated in the new revision
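A minimal sketch of the pre-write check described above, assuming the expected and live mappings have already been flattened into field-name-to-type maps (fetching the live mapping from the cluster is out of scope). MappingGuard, the flattening, and the example field types are hypothetical. In this sketch, extra fields a customer adds do not block writes; only missing fields or changed types do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical guard: compare the expected mapping with the live one before writing.
class MappingGuard {
    // Describes every expected field that is missing or has a different type in the live mapping.
    static List<String> findMismatches(Map<String, String> expected, Map<String, String> actual) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String actualType = actual.get(e.getKey());
            if (actualType == null) {
                problems.add(e.getKey() + ": missing");
            } else if (!actualType.equals(e.getValue())) {
                problems.add(e.getKey() + ": expected " + e.getValue() + " but found " + actualType);
            }
        }
        return problems;
    }

    // Write gate: a caller would report an error or stop the job when this returns false.
    static boolean safeToWrite(Map<String, String> expected, Map<String, String> actual) {
        return findMismatches(expected, actual).isEmpty();
    }
}
```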

Instant.now(),
lockDurationSeconds,
user,
ADCommonName.INSIGHTS_RESULT_INDEX_ALIAS,
Collaborator

Can you add this index to ADIndex? That would be consistent with the other indexes.

Collaborator Author

Already added; see src/main/java/org/opensearch/ad/indices/ADIndex.java in the diff.

Collaborator

@kaituo kaituo left a comment

partial review

private void createNewJob(String frequency, User user, ActionListener<InsightsJobResponse> listener) {
try {
IntervalSchedule schedule = createSchedule(frequency);
long lockDurationSeconds = java.time.Duration.of(schedule.getInterval(), schedule.getUnit()).getSeconds() * 2;
Collaborator

lockDurationSeconds should be less than or equal to the frequency, right?

Collaborator Author

Yes, good catch. Updated it to:

            long lockDurationSeconds = java.time.Duration.of(schedule.getInterval(), schedule.getUnit()).getSeconds();

Instant.now(),
null,
Instant.now(),
existingJob.getLockDurationSeconds(),
Collaborator

When you update the frequency, you need to update the lock duration as well.

Collaborator Author

good catch, updated in the new revision

null,
Instant.now(),
existingJob.getLockDurationSeconds(),
user != null ? user : existingJob.getUser(),
Collaborator

If we allow overwriting the user, a malicious user could update a job with a new user and thus cause the original user to lose access.

Collaborator Author

Good catch. Updated it to preserve the original job user and fall back to the current request user only if the job has no user.
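The two fixes discussed in these threads (recompute the lock duration when the frequency changes, and never overwrite an existing owner) can be sketched as one immutable update. InsightsJob is a hypothetical, simplified record with the user as a plain String, not the plugin's job class; the one-interval lock mirrors the revised `Duration.of(schedule.getInterval(), schedule.getUnit()).getSeconds()` line.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical, simplified stand-in for the insights job document.
record InsightsJob(String name, long interval, ChronoUnit unit, long lockDurationSeconds,
                   String user, Instant lastUpdateTime) {

    // Lock duration equals one schedule interval, so it never exceeds the frequency.
    static long lockFor(long interval, ChronoUnit unit) {
        return Duration.of(interval, unit).getSeconds();
    }

    // Returns an updated copy with a new frequency. The lock duration is recomputed from
    // the new schedule; the existing owner is kept, and the requesting user is used only
    // when the job has no owner recorded.
    InsightsJob withFrequency(long newInterval, ChronoUnit newUnit, String requestUser, Instant now) {
        String owner = user != null ? user : requestUser;
        return new InsightsJob(name, newInterval, newUnit, lockFor(newInterval, newUnit), owner, now);
    }
}
```

`Duration.of` accepts units with an exact duration, plus `DAYS` (treated as 24 hours); a unit such as `ChronoUnit.MONTHS` throws an exception.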


log.info("Running Insights job for time window: {} to {}", executionStartTime, executionEndTime);

querySystemResultIndex(jobParameter, lockService, lock, executionStartTime, executionEndTime);
Collaborator

Only querying the system result index would hinder your ability to go GA on its own. You have to tie insights to Auto AD creation. One route to GA is to add a text box on the AD overview page that shows a summary on top of the existing detectors' results.
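The job excerpt in this thread logs a time window ("{} to {}") for each run. A minimal sketch of one plausible way to derive that window, assuming it covers the interval that just elapsed and ends at the execution time; TimeWindow and endingAt are hypothetical names, and the PR's actual window logic may differ (for example, by aligning to interval boundaries).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical window for one insights run: [end - interval, end).
record TimeWindow(Instant start, Instant end) {
    static TimeWindow endingAt(Instant executionEndTime, Duration interval) {
        // A zero or negative interval would produce an empty or inverted window.
        if (interval.isNegative() || interval.isZero()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return new TimeWindow(executionEndTime.minus(interval), executionEndTime);
    }
}
```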

Labels

backport 2.x infra Changes to infrastructure, testing, CI/CD, pipelines, etc.

2 participants