Commit 55e2ec6

[DOCS] Document delete/update by query for data streams (#58679)
1 parent 04a6781 commit 55e2ec6

5 files changed: 156 additions & 54 deletions

docs/reference/data-streams/data-streams-overview.asciidoc

Lines changed: 15 additions & 13 deletions
@@ -119,28 +119,30 @@ manually perform a rollover. See <<manually-roll-over-a-data-stream>>.
 === Append-only
 
 For most time-series use cases, existing data is rarely, if ever, updated.
-Because of this, data streams are designed to be append-only. This means you can
-send indexing requests for new documents directly to a data stream. However, you
-cannot send update or deletion requests for existing documents to a data stream.
+Because of this, data streams are designed to be append-only.
 
-To update or delete specific documents in a data stream, submit one of the
-following requests to the backing index containing the document:
+You can send <<add-documents-to-a-data-stream,indexing requests for new
+documents>> directly to a data stream. However, you cannot send the following
+requests for existing documents directly to a data stream:
 
 * An <<docs-index_,index API>> request with an
-<<docs-index-api-op_type,`op_type`>> of `index`.
-These requests must include valid <<optimistic-concurrency-control,`if_seq_no`
-and `if_primary_term`>> arguments.
+<<docs-index-api-op_type,`op_type`>> of `index`. The `op_type` parameter
+defaults to `index` for existing documents.
 
 * A <<docs-bulk,bulk API>> request using the `delete`, `index`, or `update`
-action. If the action type is `index`, the action must include valid
-<<bulk-optimistic-concurrency-control,`if_seq_no` and `if_primary_term`>>
-arguments.
+action.
 
 * A <<docs-delete,delete API>> request
 
-See <<update-delete-docs-in-a-data-stream>>.
+Instead, you can use the <<docs-update-by-query,update by query>> and
+<<docs-delete-by-query,delete by query>> APIs to update or delete existing
+documents in a data stream. See <<update-delete-docs-in-a-data-stream>>.
+
+Alternatively, you can update or delete a document by submitting requests to the
+backing index containing the document. See
+<<update-delete-docs-in-a-backing-index>>.
 
 TIP: If you frequently update or delete existing documents,
 we recommend using an <<indices-add-alias,index alias>> and
 <<indices-templates,index template>> instead of a data stream. You can still
-use <<index-lifecycle-management,{ilm-init}>> to manage indices for the alias.
+use <<index-lifecycle-management,{ilm-init}>> to manage indices for the alias.
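
As a rough illustration of the rule stated in the hunk above, the sketch below sorts bulk API action names by whether they can target a data stream directly. The helper `split_bulk_actions` and the constant `DATA_STREAM_BULK_ACTIONS` are hypothetical names made up for this note. The split itself follows the documentation text: `create` is accepted, while `delete`, `index`, and `update` are not. The code never contacts a cluster.

```python
# Bulk actions a data stream accepts directly, per the documentation text above.
# Hypothetical helper for illustration only; it does not talk to Elasticsearch.
DATA_STREAM_BULK_ACTIONS = {"create"}


def split_bulk_actions(actions):
    """Return (allowed, rejected) bulk action names for a data stream target."""
    allowed = [a for a in actions if a in DATA_STREAM_BULK_ACTIONS]
    rejected = [a for a in actions if a not in DATA_STREAM_BULK_ACTIONS]
    return allowed, rejected


allowed, rejected = split_bulk_actions(["create", "index", "update", "delete"])
print("send to data stream:", allowed)
print("use by-query APIs or a backing index:", rejected)
```

Rejected actions would need to be rewritten as update by query or delete by query requests, or pointed at the backing index that holds the document.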

docs/reference/data-streams/set-up-a-data-stream.asciidoc

Lines changed: 4 additions & 5 deletions
@@ -26,11 +26,10 @@ TIP: Data streams work well with most common log formats. While no schema is
 required to use data streams, we recommend the {ecs-ref}[Elastic Common Schema
 (ECS)].
 
-* Data streams are designed to be <<data-streams-append-only,append-only>>.
-While you can index new documents directly to a data stream, you cannot use a
-data stream to directly update or delete individual documents. To update or
-delete specific documents in a data stream, submit a <<docs-delete,delete>> or
-<<docs-update,update>> API request to the backing index containing the document.
+* Data streams are best suited for time-based,
+<<data-streams-append-only,append-only>> use cases. If you frequently need to
+update or delete existing documents, we recommend using an index alias and an
+index template instead.
 
 
 [discrete]

docs/reference/data-streams/use-a-data-stream.asciidoc

Lines changed: 99 additions & 8 deletions
@@ -10,6 +10,7 @@ the following:
 * <<open-closed-backing-indices>>
 * <<reindex-with-a-data-stream>>
 * <<update-delete-docs-in-a-data-stream>>
+* <<update-delete-docs-in-a-backing-index>>
 
 ////
 [source,console]
@@ -67,6 +68,10 @@ POST /logs/_doc/
 ----
 // TEST[continued]
 ====
+
+IMPORTANT: You cannot add new documents to a data stream using the index API's
+`PUT /<target>/_doc/<_id>` request format. Use the `PUT /<target>/_create/<_id>`
+format instead.
 --
 
 * A <<docs-bulk,bulk API>> request using the `create` action. Specify the data
@@ -426,12 +431,96 @@ POST /_reindex
 [[update-delete-docs-in-a-data-stream]]
 === Update or delete documents in a data stream
 
-Data streams are designed to be <<data-streams-append-only,append-only>>. This
-means you cannot send update or deletion requests for existing documents to a
-data stream. However, you can send update or deletion requests to the backing
-index containing the document.
+You can update or delete documents in a data stream using the following
+requests:
+
+* An <<docs-update-by-query,update by query API>> request
++
+.*Example*
+[%collapsible]
+====
+The following update by query API request updates documents in the `logs` data
+stream with a `user.id` of `i96BP1mA`. The request uses a
+<<modules-scripting-using,script>> to assign matching documents a new `user.id`
+value of `XgdX0NoX`.
+
+////
+[source,console]
+----
+PUT /logs/_create/2?refresh=wait_for
+{
+  "@timestamp": "2020-12-07T11:06:07.000Z",
+  "user": {
+    "id": "i96BP1mA"
+  }
+}
+----
+// TEST[continued]
+////
+
+[source,console]
+----
+POST /logs/_update_by_query
+{
+  "query": {
+    "match": {
+      "user.id": "i96BP1mA"
+    }
+  },
+  "script": {
+    "source": "ctx._source.user.id = params.new_id",
+    "params": {
+      "new_id": "XgdX0NoX"
+    }
+  }
+}
+----
+// TEST[continued]
+====
+
+* A <<docs-delete-by-query,delete by query API>> request
++
+.*Example*
+[%collapsible]
+====
+The following delete by query API request deletes documents in the `logs` data
+stream with a `user.id` of `zVZMamUM`.
+
+////
+[source,console]
+----
+PUT /logs/_create/1?refresh=wait_for
+{
+  "@timestamp": "2020-12-07T11:06:07.000Z",
+  "user": {
+    "id": "zVZMamUM"
+  }
+}
+----
+// TEST[continued]
+////
+
+[source,console]
+----
+POST /logs/_delete_by_query
+{
+  "query": {
+    "match": {
+      "user.id": "zVZMamUM"
+    }
+  }
+}
+----
+// TEST[continued]
+====
+
+[discrete]
+[[update-delete-docs-in-a-backing-index]]
+=== Update or delete documents in a backing index
 
-To delete or update a document in a data stream, you first need to get:
+Alternatively, you can update or delete documents in a data stream by sending
+the update or deletion request to the backing index containing the document. To
+do this, you first need to get:
 
 * The <<mapping-id-field,document ID>>
 * The name of the backing index that contains the document
@@ -506,7 +595,7 @@ information for any documents matching the search.
 {
 "_index": ".ds-logs-000003", <1>
 "_id": "bfspvnIBr7VVZlfp2lqX", <2>
-"_seq_no": 4, <3>
+"_seq_no": 8, <3>
 "_primary_term": 1, <4>
 "_score": 0.2876821,
 "_source": {
@@ -522,6 +611,8 @@ information for any documents matching the search.
 }
 ----
 // TESTRESPONSE[s/"took": 20/"took": $body.took/]
+// TESTRESPONSE[s/"max_score": 0.2876821/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"_score": 0.2876821/"_score": $body.hits.hits.0._score/]
 
 <1> Backing index containing the matching document
 <2> Document ID for the document
@@ -546,7 +637,7 @@ contains a new JSON source for the document.
 
 [source,console]
 ----
-PUT /.ds-logs-000003/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=4&if_primary_term=1
+PUT /.ds-logs-000003/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=8&if_primary_term=1
 {
   "@timestamp": "2020-12-07T11:06:07.000Z",
   "user": {
@@ -611,7 +702,7 @@ parameters.
 [source,console]
 ----
 PUT /_bulk?refresh
-{ "index": { "_index": ".ds-logs-000003", "_id": "bfspvnIBr7VVZlfp2lqX", "if_seq_no": 4, "if_primary_term": 1 } }
+{ "index": { "_index": ".ds-logs-000003", "_id": "bfspvnIBr7VVZlfp2lqX", "if_seq_no": 8, "if_primary_term": 1 } }
 { "@timestamp": "2020-12-07T11:06:07.000Z", "user": { "id": "8a4f500d" }, "message": "Login successful" }
 ----
 // TEST[continued]
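
The backing-index hunks above take `_index`, `_id`, `_seq_no`, and `_primary_term` from a search hit and feed them into a conditional index request. A minimal sketch of that step, assuming the hit is already parsed into a Python dict, might look like the following. The function name `occ_index_path` is hypothetical. Only the request path is built; nothing is sent to a cluster.

```python
from urllib.parse import quote, urlencode


def occ_index_path(hit):
    """Build an index API path that overwrites `hit` only if it is unchanged."""
    # The backing index name and document ID come straight from the search hit.
    path = "/{}/_doc/{}".format(
        quote(hit["_index"], safe=""), quote(hit["_id"], safe="")
    )
    # if_seq_no and if_primary_term make the write conditional on the
    # sequence number and primary term that the search returned.
    query = urlencode(
        {"if_seq_no": hit["_seq_no"], "if_primary_term": hit["_primary_term"]}
    )
    return path + "?" + query


hit = {
    "_index": ".ds-logs-000003",
    "_id": "bfspvnIBr7VVZlfp2lqX",
    "_seq_no": 8,
    "_primary_term": 1,
}
print(occ_index_path(hit))
```

With the hit values shown in the diff, this prints the same path as the updated `PUT` example, which is why the commit had to change `if_seq_no` from 4 to 8 once the test setup indexed additional documents.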

docs/reference/docs/delete-by-query.asciidoc

Lines changed: 18 additions & 13 deletions
@@ -47,15 +47,15 @@ POST /twitter/_delete_by_query
 [[docs-delete-by-query-api-request]]
 ==== {api-request-title}
 
-`POST /<index>/_delete_by_query`
+`POST /<target>/_delete_by_query`
 
 [[docs-delete-by-query-api-desc]]
 ==== {api-description-title}
 
 You can specify the query criteria in the request URI or the request body
 using the same syntax as the <<search-search,Search API>>.
 
-When you submit a delete by query request, {es} gets a snapshot of the index
+When you submit a delete by query request, {es} gets a snapshot of the data stream or index
 when it begins processing the request and deletes matching documents using
 `internal` versioning. If a document changes between the time that the
 snapshot is taken and the delete operation is processed, it results in a version
@@ -134,12 +134,12 @@ Delete by query supports <<sliced-scroll, sliced scroll>> to parallelize the
 delete process. This can improve efficiency and provide a
 convenient way to break the request down into smaller parts.
 
-Setting `slices` to `auto` chooses a reasonable number for most indices.
+Setting `slices` to `auto` chooses a reasonable number for most data streams and indices.
 If you're slicing manually or otherwise tuning automatic slicing, keep in mind
 that:
 
 * Query performance is most efficient when the number of `slices` is equal to
-the number of shards in the index. If that number is large (for example,
+the number of shards in the index or backing index. If that number is large (for example,
 500), choose a lower number as too many `slices` hurts performance. Setting
 `slices` higher than the number of shards generally does not improve efficiency
 and adds overhead.
@@ -153,9 +153,11 @@ documents being reindexed and cluster resources.
 [[docs-delete-by-query-api-path-params]]
 ==== {api-path-parms-title}
 
-`<index>`::
-(Optional, string) A comma-separated list of index names to search. Use `_all`
-or omit to search all indices.
+`<target>`::
+(Optional, string)
+A comma-separated list of data streams, indices, and index aliases to search.
+Wildcard (`*`) expressions are supported. To search all data streams or indices
+in a cluster, omit this parameter or use `_all` or `*`.
 
 [[docs-delete-by-query-api-query-params]]
 ==== {api-query-parms-title}
@@ -200,7 +202,10 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]
 
-include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll]
+`scroll`::
+(Optional, <<time-units,time value>>)
+Period to retain the <<scroll-search-context,search context>> for scrolling. See
+<<request-body-search-scroll>>.
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
 
@@ -343,7 +348,7 @@ version conflicts.
 [[docs-delete-by-query-api-example]]
 ==== {api-examples-title}
 
-Delete all tweets from the `twitter` index:
+Delete all tweets from the `twitter` data stream or index:
 
 [source,console]
 --------------------------------------------------
@@ -356,7 +361,7 @@ POST twitter/_delete_by_query?conflicts=proceed
 --------------------------------------------------
 // TEST[setup:twitter]
 
-Delete documents from multiple indices:
+Delete documents from multiple data streams or indices:
 
 [source,console]
 --------------------------------------------------
@@ -531,8 +536,8 @@ Which results in a sensible `total` like this one:
 
 Setting `slices` to `auto` will let {es} choose the number of slices
 to use. This setting will use one slice per shard, up to a certain limit. If
-there are multiple source indices, it will choose the number of slices based
-on the index with the smallest number of shards.
+there are multiple source data streams or indices, it will choose the number of slices based
+on the index or backing index with the smallest number of shards.
 
 Adding `slices` to `_delete_by_query` just automates the manual process used in
 the section above, creating sub-requests which means it has some quirks:
@@ -555,7 +560,7 @@ slices` are distributed proportionally to each sub-request. Combine that with
 the point above about distribution being uneven and you should conclude that
 using `max_docs` with `slices` might not result in exactly `max_docs` documents
 being deleted.
-* Each sub-request gets a slightly different snapshot of the source data stream or index
+* Each sub-request gets a slightly different snapshot of the source data stream or index
 though these are all taken at approximately the same time.
 
 [float]
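
The `slices=auto` rule described in the delete by query hunks (one slice per shard, based on the index or backing index with the fewest shards, up to a limit) can be sketched as below. This is an illustration of the documented rule, not Elasticsearch's implementation. The function name `auto_slice_count` is hypothetical, and because the text only says "a certain limit", the cap is passed in as a parameter instead of being hard-coded.

```python
def auto_slice_count(shard_counts, limit):
    """Pick a slice count the way the `slices=auto` text describes.

    shard_counts: primary shard count of every source index or backing index.
    limit: upper bound on slices (an assumption; the docs give no value here).
    """
    if not shard_counts:
        raise ValueError("at least one source index is required")
    # One slice per shard of the smallest source, capped at the limit.
    return min(min(shard_counts), limit)


# Three backing indices with 5, 3, and 8 shards: the smallest one decides.
print(auto_slice_count([5, 3, 8], limit=20))
```

For a data stream, every backing index matched by the request contributes one entry to `shard_counts`.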
