
Commit bf8024b

lcawl and droberts195 authored
[DOCS] Edits machine learning settings (#69947) (#70174)
Co-authored-by: David Roberts <[email protected]>
1 parent 77406ac commit bf8024b

File tree

7 files changed: +110 −93 lines changed

docs/reference/ml/anomaly-detection/apis/put-job.asciidoc (+1 −1)

@@ -224,7 +224,7 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=categorization-examples-limit]

 `model_memory_limit`:::
 (long or string)
-include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-memory-limit]
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-memory-limit-ad]
 ====
 //End analysis_limits
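
For context: the `model_memory_limit` documented by this shared tag is set in the `analysis_limits` object of the create {anomaly-jobs} request. A minimal sketch, with a hypothetical job ID, detector, and value (none of this is part of the commit):

PUT _ml/anomaly_detectors/example-job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "count" } ]
  },
  "analysis_limits": {
    "model_memory_limit": "512mb"
  },
  "data_description": { "time_field": "timestamp" }
}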

docs/reference/ml/anomaly-detection/apis/update-job.asciidoc (+1 −1)

@@ -53,7 +53,7 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=analysis-limits]
 ====
 `model_memory_limit`:::
 (long or string)
-include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-memory-limit]
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-memory-limit-ad]
 +
 --
 NOTE: You can update the `analysis_limits` only while the job is closed. The
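
Per the NOTE above, the limit can be changed only while the job is closed. A minimal update sketch (job ID and value are hypothetical):

POST _ml/anomaly_detectors/example-job/_update
{
  "analysis_limits": {
    "model_memory_limit": "1024mb"
  }
}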

docs/reference/ml/df-analytics/apis/get-dfanalytics.asciidoc (+1 −1)

@@ -127,7 +127,7 @@ to `ml`.
 (string) The unique identifier of the {dfanalytics-job}.

 `model_memory_limit`:::
-(string) The `model_memory_limit` that has been set to the {dfanalytics-job}.
+(string) The `model_memory_limit` that has been set for the {dfanalytics-job}.

 `source`:::
 (object) The configuration of how the analysis data is sourced. It has an
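
The corrected field description applies to the body returned by the get API, for example (the ID is hypothetical):

GET _ml/data_frame/analytics/example-analytics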

docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc (+1 −6)

@@ -511,12 +511,7 @@ functionality other than the analysis itself.

 `model_memory_limit`::
 (Optional, string)
-The approximate maximum amount of memory resources that are permitted for
-analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If
-your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
-setting, an error occurs when you try to create {dfanalytics-jobs} that have
-`model_memory_limit` values greater than that setting. For more information, see
-<<ml-settings>>.
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-memory-limit-dfa]

 `source`::
 (object)
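
A minimal sketch of a create request that sets this property (the ID, indices, analysis type, and value are hypothetical):

PUT _ml/data_frame/analytics/example-analytics
{
  "source": { "index": "example-source" },
  "dest": { "index": "example-dest" },
  "analysis": { "outlier_detection": {} },
  "model_memory_limit": "2gb"
}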

docs/reference/ml/df-analytics/apis/update-dfanalytics.asciidoc (+1 −6)

@@ -78,12 +78,7 @@ functionality other than the analysis itself.

 `model_memory_limit`::
 (Optional, string)
-The approximate maximum amount of memory resources that are permitted for
-analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If
-your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
-setting, an error occurs when you try to create {dfanalytics-jobs} that have
-`model_memory_limit` values greater than that setting. For more information, see
-<<ml-settings>>.
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-memory-limit-dfa]

 [[ml-update-dfanalytics-example]]
 == {api-examples-title}
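
And the corresponding update sketch (ID and value are hypothetical):

POST _ml/data_frame/analytics/example-analytics/_update
{
  "model_memory_limit": "4gb"
}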

docs/reference/ml/ml-shared.asciidoc (+20 −11)

@@ -1153,15 +1153,16 @@ tag::model-id-or-alias[]
 The unique identifier of the trained model or a model alias.
 end::model-id-or-alias[]

-tag::model-memory-limit[]
+tag::model-memory-limit-ad[]
 The approximate maximum amount of memory resources that are required for
 analytical processing. Once this limit is approached, data pruning becomes
 more aggressive. Upon exceeding this limit, new entities are not modeled. The
-default value for jobs created in version 6.1 and later is `1024mb`.
-This value will need to be increased for jobs that are expected to analyze high
-cardinality fields, but the default is set to a relatively small size to ensure
-that high resource usage is a conscious decision. The default value for jobs
-created in versions earlier than 6.1 is `4096mb`.
+default value for jobs created in version 6.1 and later is `1024mb`. If the
+`xpack.ml.max_model_memory_limit` setting has a value greater than `0` and less
+than `1024mb`, however, that value is used instead. The default value is
+relatively small to ensure that high resource usage is a conscious decision. If
+you have jobs that are expected to analyze high cardinality fields, you will
+likely need to use a higher value.
 +
 If you specify a number instead of a string, the units are assumed to be MiB.
 Specifying a string is recommended for clarity. If you specify a byte size unit

@@ -1170,16 +1171,24 @@ it is rounded down to the closest MiB. The minimum valid value is 1 MiB. If you
 specify a value less than 1 MiB, an error occurs. For more information about
 supported byte size units, see <<byte-units>>.
 +
-If your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
-setting, an error occurs when you try to create jobs that have
-`model_memory_limit` values greater than that setting. For more information,
-see <<ml-settings>>.
-end::model-memory-limit[]
+If you specify a value for the `xpack.ml.max_model_memory_limit` setting, an
+error occurs when you try to create jobs that have `model_memory_limit` values
+greater than that setting value. For more information, see <<ml-settings>>.
+end::model-memory-limit-ad[]

 tag::model-memory-limit-anomaly-jobs[]
 The upper limit for model memory usage, checked on increasing values.
 end::model-memory-limit-anomaly-jobs[]

+tag::model-memory-limit-dfa[]
+The approximate maximum amount of memory resources that are permitted for
+analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If
+you specify a value for the `xpack.ml.max_model_memory_limit` setting, an error
+occurs when you try to create jobs that have `model_memory_limit` values greater
+than that setting value. For more information, see
+<<ml-settings>>.
+end::model-memory-limit-dfa[]
+
 tag::model-memory-status[]
 The status of the mathematical models, which can have one of the following
 values:
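
Both new tags point readers at `xpack.ml.max_model_memory_limit`, which is a dynamic cluster setting; a sketch of capping job memory cluster-wide (the `2gb` value is illustrative):

PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.max_model_memory_limit": "2gb"
  }
}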

docs/reference/settings/ml-settings.asciidoc (+85 −67)
@@ -56,18 +56,20 @@ coordinating nodes.
 (<<static-cluster-setting,Static>>) The maximum inference cache size allowed.
 The inference cache exists in the JVM heap on each ingest node. The cache
 affords faster processing times for the `inference` processor. The value can be
-a static byte sized value (i.e. "2gb") or a percentage of total allocated heap.
-The default is "40%". See also <<model-inference-circuit-breaker>>.
+a static byte sized value (such as `2gb`) or a percentage of total allocated
+heap. Defaults to `40%`. See also <<model-inference-circuit-breaker>>.

 [[xpack-interference-model-ttl]]
 // tag::interference-model-ttl-tag[]
 `xpack.ml.inference_model.time_to_live` {ess-icon}::
-(<<static-cluster-setting,Static>>) The time to live (TTL) for models in the
-inference model cache. The TTL is calculated from last access. The `inference`
-processor attempts to load the model from cache. If the `inference` processor
-does not receive any documents for the duration of the TTL, the referenced model
-is flagged for eviction from the cache. If a document is processed later, the
-model is again loaded into the cache. Defaults to `5m`.
+(<<static-cluster-setting,Static>>) The time to live (TTL) for trained models in
+the inference model cache. The TTL is calculated from last access. Users of the
+cache (such as the inference processor or inference aggregator) cache a model on
+its first use and reset the TTL on every use. If a cached model is not accessed
+for the duration of the TTL, it is flagged for eviction from the cache. If a
+document is processed later, the model is again loaded into the cache. To update
+this setting in {ess}, see
+{cloud}/ec-add-user-settings.html[Add {es} user settings]. Defaults to `5m`.
 // end::interference-model-ttl-tag[]

 `xpack.ml.max_inference_processors`::
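
Both settings in this hunk are static, so they belong in `elasticsearch.yml` on each ingest node; a sketch with illustrative values, assuming the cache-size setting name is `xpack.ml.inference_model.cache_size` (the name line sits just above this hunk and is not shown in the diff):

xpack.ml.inference_model.cache_size: 2gb
xpack.ml.inference_model.time_to_live: 10m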
@@ -77,40 +79,54 @@ adding an `inference` processor to a pipeline is disallowed. Defaults to `50`.

 `xpack.ml.max_machine_memory_percent`::
 (<<cluster-update-settings,Dynamic>>) The maximum percentage of the machine's
-memory that {ml} may use for running analytics processes. (These processes are
-separate to the {es} JVM.) Defaults to `30` percent. The limit is based on the
-total memory of the machine, not current free memory. Jobs are not allocated to
-a node if doing so would cause the estimated memory use of {ml} jobs to exceed
-the limit. When the {operator-feature} is enabled, this setting can be updated
-only by operator users.
+memory that {ml} may use for running analytics processes. These processes are
+separate to the {es} JVM. The limit is based on the total memory of the machine,
+not current free memory. Jobs are not allocated to a node if doing so would
+cause the estimated memory use of {ml} jobs to exceed the limit. When the
+{operator-feature} is enabled, this setting can be updated only by operator
+users. The minimum value is `5`; the maximum value is `200`. Defaults to `30`.
++
+--
+TIP: Do not configure this setting to a value higher than the amount of memory
+left over after running the {es} JVM unless you have enough swap space to
+accommodate it and have determined this is an appropriate configuration for a
+specialist use case. The maximum setting value is for the special case where it
+has been determined that using swap space for {ml} jobs is acceptable. The
+general best practice is to not use swap on {es} nodes.
+
+--

 `xpack.ml.max_model_memory_limit`::
 (<<cluster-update-settings,Dynamic>>) The maximum `model_memory_limit` property
-value that can be set for any job on this node. If you try to create a job with
-a `model_memory_limit` property value that is greater than this setting value,
-an error occurs. Existing jobs are not affected when you update this setting.
-For more information about the `model_memory_limit` property, see
-<<put-analysislimits>>.
+value that can be set for any {ml} jobs in this cluster. If you try to create a
+job with a `model_memory_limit` property value that is greater than this setting
+value, an error occurs. Existing jobs are not affected when you update this
+setting. If this setting is `0` or unset, there is no maximum
+`model_memory_limit` value. If there are no nodes that meet the memory
+requirements for a job, this lack of a maximum memory limit means it's possible
+to create jobs that cannot be assigned to any available nodes. For more
+information about the `model_memory_limit` property, see
+<<ml-put-job,Create {anomaly-jobs}>> or <<put-dfanalytics>>. Defaults to `0`.

 [[xpack.ml.max_open_jobs]]
 `xpack.ml.max_open_jobs`::
 (<<cluster-update-settings,Dynamic>>) The maximum number of jobs that can run
-simultaneously on a node. Defaults to `20`. In this context, jobs include both
-{anomaly-jobs} and {dfanalytics-jobs}. The maximum number of jobs is also
-constrained by memory usage. Thus if the estimated memory usage of the jobs
-would be higher than allowed, fewer jobs will run on a node. Prior to version
-7.1, this setting was a per-node non-dynamic setting. It became a cluster-wide
-dynamic setting in version 7.1. As a result, changes to its value after node
-startup are used only after every node in the cluster is running version 7.1 or
-higher. The maximum permitted value is `512`.
+simultaneously on a node. In this context, jobs include both {anomaly-jobs} and
+{dfanalytics-jobs}. The maximum number of jobs is also constrained by memory
+usage. Thus if the estimated memory usage of the jobs would be higher than
+allowed, fewer jobs will run on a node. Prior to version 7.1, this setting was a
+per-node non-dynamic setting. It became a cluster-wide dynamic setting in
+version 7.1. As a result, changes to its value after node startup are used only
+after every node in the cluster is running version 7.1 or higher. The minimum
+value is `1`; the maximum value is `512`. Defaults to `20`.

 `xpack.ml.nightly_maintenance_requests_per_second`::
 (<<cluster-update-settings,Dynamic>>) The rate at which the nightly maintenance
 task deletes expired model snapshots and results. The setting is a proxy to the
-<<docs-delete-by-query-throttle,requests_per_second>> parameter used in the
+<<docs-delete-by-query-throttle,`requests_per_second`>> parameter used in the
 delete by query requests and controls throttling. When the {operator-feature} is
 enabled, this setting can be updated only by operator users. Valid values must
-be greater than `0.0` or equal to `-1.0` where `-1.0` means a default value is
+be greater than `0.0` or equal to `-1.0`, where `-1.0` means a default value is
 used. Defaults to `-1.0`

 `xpack.ml.node_concurrent_job_allocations`::
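
All four settings in this hunk are dynamic, so they can be adjusted at runtime; a sketch with illustrative values, not recommendations:

PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.max_machine_memory_percent": 40,
    "xpack.ml.max_model_memory_limit": "8gb",
    "xpack.ml.max_open_jobs": 32
  }
}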
@@ -134,19 +150,19 @@ enabled, this setting can be updated only by operator users.

 `xpack.ml.max_anomaly_records`::
 (<<cluster-update-settings,Dynamic>>) The maximum number of records that are
-output per bucket. The default value is `500`.
+output per bucket. Defaults to `500`.

 `xpack.ml.max_lazy_ml_nodes`::
 (<<cluster-update-settings,Dynamic>>) The number of lazily spun up {ml} nodes.
 Useful in situations where {ml} nodes are not desired until the first {ml} job
-opens. It defaults to `0` and has a maximum acceptable value of `3`. If the
-current number of {ml} nodes is greater than or equal to this setting, it is
-assumed that there are no more lazy nodes available as the desired number
-of nodes have already been provisioned. If a job is opened and this setting has
-a value greater than zero and there are no nodes that can accept the job, the
-job stays in the `OPENING` state until a new {ml} node is added to the cluster
-and the job is assigned to run on that node. When the {operator-feature} is
-enabled, this setting can be updated only by operator users.
+opens. If the current number of {ml} nodes is greater than or equal to this
+setting, it is assumed that there are no more lazy nodes available as the
+desired number of nodes have already been provisioned. If a job is opened and
+this setting has a value greater than zero and there are no nodes that can
+accept the job, the job stays in the `OPENING` state until a new {ml} node is
+added to the cluster and the job is assigned to run on that node. When the
+{operator-feature} is enabled, this setting can be updated only by operator
+users. Defaults to `0`.
 +
 IMPORTANT: This setting assumes some external process is capable of adding {ml}
 nodes to the cluster. This setting is only useful when used in conjunction with
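
`xpack.ml.max_lazy_ml_nodes` is likewise dynamic; a sketch for a cluster whose external process can add one {ml} node on demand, paired with `xpack.ml.max_ml_node_size` from the next hunk so that oversized jobs fail fast (both values are illustrative):

PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.max_lazy_ml_nodes": 1,
    "xpack.ml.max_ml_node_size": "16gb"
  }
}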
@@ -155,65 +171,67 @@ such an external process.
 `xpack.ml.max_ml_node_size`::
 (<<cluster-update-settings,Dynamic>>)
 The maximum node size for {ml} nodes in a deployment that supports automatic
-cluster scaling. Defaults to `0b`, which means this value is ignored. If you set
-it to the maximum possible size of future {ml} nodes, when a {ml} job is
-assigned to a lazy node it can check (and fail quickly) when scaling cannot
-support the size of the job. When the {operator-feature} is enabled, this
-setting can be updated only by operator users.
+cluster scaling. If you set it to the maximum possible size of future {ml} nodes,
+when a {ml} job is assigned to a lazy node it can check (and fail quickly) when
+scaling cannot support the size of the job. When the {operator-feature} is
+enabled, this setting can be updated only by operator users. Defaults to `0b`,
+which means it will be assumed that automatic cluster scaling can add arbitrarily large nodes to the cluster.

 `xpack.ml.persist_results_max_retries`::
 (<<cluster-update-settings,Dynamic>>) The maximum number of times to retry bulk
 indexing requests that fail while processing {ml} results. If the limit is
 reached, the {ml} job stops processing data and its status is `failed`. When the
 {operator-feature} is enabled, this setting can be updated only by operator
-users. Defaults to `20`. The maximum value for this setting is `50`.
+users. The minimum value is `0`; the maximum value is `50`. Defaults to `20`.

 `xpack.ml.process_connect_timeout`::
 (<<cluster-update-settings,Dynamic>>) The connection timeout for {ml} processes
-that run separately from the {es} JVM. Defaults to `10s`. Some {ml} processing
-is done by processes that run separately to the {es} JVM. When such processes
-are started they must connect to the {es} JVM. If such a process does not
-connect within the time period specified by this setting then the process is
-assumed to have failed. When the {operator-feature} is enabled, this setting can
-be updated only by operator users. Defaults to `10s`. The minimum value for this
-setting is `5s`.
+that run separately from the {es} JVM. When such processes are started they must
+connect to the {es} JVM. If the process does not connect within the time period
+specified by this setting then the process is assumed to have failed. When the
+{operator-feature} is enabled, this setting can be updated only by operator
+users. The minimum value is `5s`. Defaults to `10s`.

 xpack.ml.use_auto_machine_memory_percent::
 (<<cluster-update-settings,Dynamic>>) If this setting is `true`, the
 `xpack.ml.max_machine_memory_percent` setting is ignored. Instead, the maximum
 percentage of the machine's memory that can be used for running {ml} analytics
 processes is calculated automatically and takes into account the total node size
-and the size of the JVM on the node. The default value is `false`. If this
-setting differs between nodes, the value on the current master node is heeded.
-When the {operator-feature} is enabled, this setting can be updated only by
-operator users.
+and the size of the JVM on the node. If this setting differs between nodes, the
+value on the current master node is heeded. When the {operator-feature} is
+enabled, this setting can be updated only by operator users. The default value
+is `false`.
 +
-TIP: If you do not have dedicated {ml} nodes (that is to say, the node has
+--
+[IMPORTANT]
+====
+* If you do not have dedicated {ml} nodes (that is to say, the node has
 multiple roles), do not enable this setting. Its calculations assume that {ml}
 analytics are the main purpose of the node.
-+
-IMPORTANT: The calculation assumes that dedicated {ml} nodes have at least
+* The calculation assumes that dedicated {ml} nodes have at least
 `256MB` memory reserved outside of the JVM. If you have tiny {ml}
 nodes in your cluster, you shouldn't use this setting.
+====
+--

 [discrete]
 [[model-inference-circuit-breaker]]
 ==== {ml-cap} circuit breaker settings

 `breaker.model_inference.limit`::
-(<<cluster-update-settings,Dynamic>>) Limit for the model inference breaker,
-which defaults to 50% of the JVM heap. If the parent circuit breaker is less
-than 50% of the JVM heap, it is bound to that limit instead. See
-<<circuit-breaker>>.
+(<<cluster-update-settings,Dynamic>>) The limit for the trained model circuit
+breaker. This value is defined as a percentage of the JVM heap. Defaults to
+`50%`. If the <<parent-circuit-breaker,parent circuit breaker>> is set to a
+value less than `50%`, this setting uses that value as its default instead.

 `breaker.model_inference.overhead`::
-(<<cluster-update-settings,Dynamic>>) A constant that all accounting estimations
-are multiplied by to determine a final estimation. Defaults to 1. See
-<<circuit-breaker>>.
+(<<cluster-update-settings,Dynamic>>) A constant that all trained model
+estimations are multiplied by to determine a final estimation. See
+<<circuit-breaker>>. Defaults to `1`.

 `breaker.model_inference.type`::
 (<<static-cluster-setting,Static>>) The underlying type of the circuit breaker.
 There are two valid options: `noop` and `memory`. `noop` means the circuit
 breaker does nothing to prevent too much memory usage. `memory` means the
-circuit breaker tracks the memory used by inference models and can potentially
-break and prevent `OutOfMemory` errors. The default is `memory`.
+circuit breaker tracks the memory used by trained models and can potentially
+break and prevent `OutOfMemory` errors. The default value is `memory`.
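
The `limit` and `overhead` breaker settings are dynamic and can be tuned the same way (values illustrative); `breaker.model_inference.type` is static and would go in `elasticsearch.yml`:

PUT _cluster/settings
{
  "persistent": {
    "breaker.model_inference.limit": "40%",
    "breaker.model_inference.overhead": 1.2
  }
}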
