Add documentation for rule-based anomaly detection and imputation #8202
@@ -10,30 +10,36 @@ redirect_from:
# Anomaly detection

- An anomaly in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you uncover early signs of a system failure.
+ An _anomaly_ in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you identify early signs of a system failure.

- It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and isn't adaptive to data that exhibits organic growth or seasonal behavior.
+ It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and is not adaptive to data that exhibits organic growth or seasonal behavior.

Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9).

You can pair the Anomaly Detection plugin with the [Alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected.

- To get started, choose **Anomaly Detection** in OpenSearch Dashboards.
- To first test with sample streaming data, you can try out one of the preconfigured detectors with one of the sample datasets.
+ ## Using OpenSearch Dashboards anomaly detection
+
+ To get started, go to **OpenSearch Dashboards** > **OpenSearch Plugins** > **Anomaly Detection**. OpenSearch Dashboards contains sample datasets. You can use these datasets with their preconfigured detectors to try out the feature.

The following tutorial guides you through using anomaly detection with your OpenSearch data.
## Step 1: Define a detector

- A detector is an individual anomaly detection task. You can define multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources.
+ A _detector_ is an individual anomaly detection task. You can define multiple detectors. All the detectors can run simultaneously, with each analyzing data from different sources.
1. Choose **Create detector**.
- 1. Add in the detector details.
-    - Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector.
- 1. Specify the data source.
+ 1. Add the detector details.
+    - Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the detector's purpose.
1. Specify the data source.
   - For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indexes.
   - (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported for query domain-specific language (DSL).

- #### Example filter using query DSL
- The query is designed to retrieve documents in which the `urlPath.keyword` field matches one of the following specified values:
+ ---
+
+ #### Example: Filter using query DSL
+
+ The following example query retrieves documents where the `urlPath.keyword` field matches any of the specified values:
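The sample query body itself is not included in this excerpt of the diff. A representative Boolean filter of the kind the sentence describes, using hypothetical `urlPath.keyword` values, might look like the following sketch:

```json
{
  "bool": {
    "should": [
      {
        "term": {
          "urlPath.keyword": "/domain/{id}/short"
        }
      },
      {
        "term": {
          "urlPath.keyword": "/sub_dir/{id}/short"
        }
      },
      {
        "term": {
          "urlPath.keyword": "/abcd/123/{id}/xyz"
        }
      }
    ]
  }
}
```

A document matches the filter if any one of the `term` clauses in `should` matches, which is the "matches any of the specified values" behavior described above.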
Suggested change:
- The following example query retrieves documents where the `urlPath.keyword` field matches any of the specified values:
+ The following example query retrieves documents in which the `urlPath.keyword` field matches any of the specified values:
Suggested change:
- To avoid missing any data, set the **Window delay** to the upper limit of the expected ingestion delay. This ensures the detector captures all data during its interval, reducing the risk of missing relevant information. While a longer window delay helps capture all data, setting it too high can hinder real-time anomaly detection, as the detector will look further back in time. Find a balance to maintain both data accuracy and timely detection.
+ To avoid missing any data, set the **Window delay** to the upper limit of the expected ingestion delay. This ensures that the detector captures all data during its interval, reducing the risk of missing relevant information. While a longer window delay helps capture all data, too long of a window delay can hinder real-time anomaly detection because the detector will look further back in time. Find a balance to maintain both data accuracy and timely detection.
Suggested change:
- A _feature_ is the field in your index that you want to analyze for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly.
+ A _feature_ is any field in your index that you want to analyze for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly.
Suggested change:
- For acceptable JSON query syntax, see [OpenSearch Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/)
+ For acceptable JSON query syntax, see [OpenSearch Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/).
Suggested change:
- To suppress anomalies for deviations smaller than 30% from the expected value, you can set the following rules:
+ To suppress anomalies for deviations of less than 30% from the expected value, you can set the following rules:
@@ -9,9 +9,9 @@ redirect_from:
# Anomaly result mapping

- If you enabled custom result index, the anomaly detection plugin stores the results in your own index.
+ If you enabled custom result index, the Anomaly Detection plugin stores the results in your own index.
- If the anomaly detector doesn’t detect an anomaly, the result has the following format:
+ If the anomaly detector does not detect an anomaly, the result has the following format:

Reviewer comment: Above: "results index" (both instances)?
@@ -80,6 +80,81 @@ Field | Description
`model_id` | A unique ID that identifies a model. If a detector is a single-stream detector (with no category field), it has only one model. If a detector is a high-cardinality detector (with one or more category fields), it might have multiple models, one for each entity.
`threshold` | One of the criteria for a detector to classify a data point as an anomaly is that its `anomaly_score` must surpass a dynamic threshold. This field records the current threshold.

+ When the imputation option is enabled, the anomaly result output includes a `feature_imputed` array, showing which features have been imputed. This information helps you identify which features were modified during the anomaly detection process due to missing data. If no features were imputed, then the `feature_imputed` array is excluded from the results.

Reviewer comment: Above: "result includes" => "results include"?

+ In this example, the feature `processing_bytes_max` was imputed, as indicated by the `imputed: true` status:
```json
{
  "detector_id": "kzcZ43wBgEQAbjDnhzGF",
  "schema_version": 5,
  "data_start_time": 1635898161367,
  "data_end_time": 1635898221367,
  "feature_data": [
    {
      "feature_id": "processing_bytes_max",
      "feature_name": "processing bytes max",
      "data": 2322
    },
    {
      "feature_id": "processing_bytes_avg",
      "feature_name": "processing bytes avg",
      "data": 1718.6666666666667
    },
    {
      "feature_id": "processing_bytes_min",
      "feature_name": "processing bytes min",
      "data": 1375
    },
    {
      "feature_id": "processing_bytes_sum",
      "feature_name": "processing bytes sum",
      "data": 5156
    },
    {
      "feature_id": "processing_time_max",
      "feature_name": "processing time max",
      "data": 31198
    }
  ],
  "execution_start_time": 1635898231577,
  "execution_end_time": 1635898231622,
  "anomaly_score": 1.8124904404395776,
  "anomaly_grade": 0,
  "confidence": 0.9802940756605277,
  "entity": [
    {
      "name": "process_name",
      "value": "process_3"
    }
  ],
  "model_id": "kzcZ43wBgEQAbjDnhzGF_entity_process_3",
  "threshold": 1.2368549346675202,
  "feature_imputed": [
    {
      "feature_id": "processing_bytes_max",
      "imputed": true
    },
    {
      "feature_id": "processing_bytes_avg",
      "imputed": false
    },
    {
      "feature_id": "processing_bytes_min",
      "imputed": false
    },
    {
      "feature_id": "processing_bytes_sum",
      "imputed": false
    },
    {
      "feature_id": "processing_time_max",
      "imputed": false
    }
  ]
}
```
If an anomaly detector detects an anomaly, the result has the following format: