Add documentation for rule-based anomaly detection and imputation #8202
@@ -10,30 +10,36 @@ redirect_from:
# Anomaly detection

- An anomaly in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you uncover early signs of a system failure.
+ An _anomaly_ in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you identify early signs of a system failure.

- It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and isn't adaptive to data that exhibits organic growth or seasonal behavior.
+ It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and is not adaptive to data that exhibits organic growth or seasonal behavior.

Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9).

You can pair the Anomaly Detection plugin with the [Alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected.

- To get started, choose **Anomaly Detection** in OpenSearch Dashboards.
- To first test with sample streaming data, you can try out one of the preconfigured detectors with one of the sample datasets.
+ ## Using OpenSearch Dashboards anomaly detection
+
+ To get started, go to **OpenSearch Dashboards** > **OpenSearch Plugins** > **Anomaly Detection**. OpenSearch Dashboards contains sample datasets. You can use these datasets with their preconfigured detectors to try out the feature.

The following tutorial guides you through using anomaly detection with your OpenSearch data.
## Step 1: Define a detector

- A detector is an individual anomaly detection task. You can define multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources.
+ A _detector_ is an individual anomaly detection task. You can define multiple detectors. All the detectors can run simultaneously, with each analyzing data from different sources.
1. Choose **Create detector**.
- 1. Add in the detector details.
-    - Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector.
- 1. Specify the data source.
+ 1. Add the detector details.
+    - Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the detector's purpose.
1. Specify the data source.
   - For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indexes.
   - (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported for query domain-specific language (DSL).

- #### Example filter using query DSL
- The query is designed to retrieve documents in which the `urlPath.keyword` field matches one of the following specified values:
+ ---
+
+ #### Example: Filter using query DSL
+
+ The following example query retrieves documents where the `urlPath.keyword` field matches any of the specified values:
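The sample query body itself is not included in this excerpt of the diff. A representative Boolean filter of the kind the sentence describes, using hypothetical `urlPath.keyword` values, might look like the following sketch:

```json
{
  "bool": {
    "should": [
      {
        "term": {
          "urlPath.keyword": "/domain/{id}/short"
        }
      },
      {
        "term": {
          "urlPath.keyword": "/sub_dir/{id}/short"
        }
      },
      {
        "term": {
          "urlPath.keyword": "/abcd/123/{id}/xyz"
        }
      }
    ]
  }
}
```

A document matches the filter if any one of the `term` clauses in `should` matches, which is the "matches any of the specified values" behavior described above.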
Suggested change:
- The following example query retrieves documents where the `urlPath.keyword` field matches any of the specified values:
+ The following example query retrieves documents in which the `urlPath.keyword` field matches any of the specified values:
Suggested change:
- To avoid missing any data, set the **Window delay** to the upper limit of the expected ingestion delay. This ensures the detector captures all data during its interval, reducing the risk of missing relevant information. While a longer window delay helps capture all data, setting it too high can hinder real-time anomaly detection, as the detector will look further back in time. Find a balance to maintain both data accuracy and timely detection.
+ To avoid missing any data, set the **Window delay** to the upper limit of the expected ingestion delay. This ensures that the detector captures all data during its interval, reducing the risk of missing relevant information. While a longer window delay helps capture all data, too long of a window delay can hinder real-time anomaly detection because the detector will look further back in time. Find a balance to maintain both data accuracy and timely detection.
Suggested change:
- A _feature_ is the field in your index that you want to analyze for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly.
+ A _feature_ is any field in your index that you want to analyze for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly.
Suggested change:
- For acceptable JSON query syntax, see [OpenSearch Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/)
+ For acceptable JSON query syntax, see [OpenSearch Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/).
Suggested change:
- To suppress anomalies for deviations smaller than 30% from the expected value, you can set the following rules:
+ To suppress anomalies for deviations of less than 30% from the expected value, you can set the following rules:
@@ -9,9 +9,9 @@ redirect_from:
# Anomaly result mapping

- If you enabled custom result index, the anomaly detection plugin stores the results in your own index.
+ If you enabled custom result index, the Anomaly Detection plugin stores the results in your own index.
- If the anomaly detector doesn’t detect an anomaly, the result has the following format:
+ If the anomaly detector does not detect an anomaly, the result has the following format:

Reviewer comment: Above: "results index" (both instances)?
@@ -80,6 +80,81 @@ Field | Description
`model_id` | A unique ID that identifies a model. If a detector is a single-stream detector (with no category field), it has only one model. If a detector is a high-cardinality detector (with one or more category fields), it might have multiple models, one for each entity.
`threshold` | One of the criteria for a detector to classify a data point as an anomaly is that its `anomaly_score` must surpass a dynamic threshold. This field records the current threshold.

+ When the imputation option is enabled, the anomaly result output includes a `feature_imputed` array, showing which features have been imputed. This information helps you identify which features were modified during the anomaly detection process due to missing data. If no features were imputed, then the `feature_imputed` array is excluded from the results.

Reviewer comment: Above: "result includes" => "results include"?

+ In this example, the feature `processing_bytes_max` was imputed, as indicated by the `imputed: true` status:
```json
{
  "detector_id": "kzcZ43wBgEQAbjDnhzGF",
  "schema_version": 5,
  "data_start_time": 1635898161367,
  "data_end_time": 1635898221367,
  "feature_data": [
    {
      "feature_id": "processing_bytes_max",
      "feature_name": "processing bytes max",
      "data": 2322
    },
    {
      "feature_id": "processing_bytes_avg",
      "feature_name": "processing bytes avg",
      "data": 1718.6666666666667
    },
    {
      "feature_id": "processing_bytes_min",
      "feature_name": "processing bytes min",
      "data": 1375
    },
    {
      "feature_id": "processing_bytes_sum",
      "feature_name": "processing bytes sum",
      "data": 5156
    },
    {
      "feature_id": "processing_time_max",
      "feature_name": "processing time max",
      "data": 31198
    }
  ],
  "execution_start_time": 1635898231577,
  "execution_end_time": 1635898231622,
  "anomaly_score": 1.8124904404395776,
  "anomaly_grade": 0,
  "confidence": 0.9802940756605277,
  "entity": [
    {
      "name": "process_name",
      "value": "process_3"
    }
  ],
  "model_id": "kzcZ43wBgEQAbjDnhzGF_entity_process_3",
  "threshold": 1.2368549346675202,
  "feature_imputed": [
    {
      "feature_id": "processing_bytes_max",
      "imputed": true
    },
    {
      "feature_id": "processing_bytes_avg",
      "imputed": false
    },
    {
      "feature_id": "processing_bytes_min",
      "imputed": false
    },
    {
      "feature_id": "processing_bytes_sum",
      "imputed": false
    },
    {
      "feature_id": "processing_time_max",
      "imputed": false
    }
  ]
}
```
If an anomaly detector detects an anomaly, the result has the following format: