Small improvements to repo analysis docs #128490

Merged
docs/reference/snapshot-restore/apis/repo-analysis-api.asciidoc: 48 changes (33 additions, 15 deletions)
@@ -60,23 +60,41 @@ measure the performance characteristics of your storage system.
The default values for the parameters to this API are deliberately low to reduce
the impact of running an analysis inadvertently and to provide a sensible
starting point for your investigations. Run your first analysis with the default
-parameter values to check for simple problems. If successful, run a sequence of
-increasingly large analyses until you encounter a failure or you reach a
-`blob_count` of at least `2000`, a `max_blob_size` of at least `2gb`, a
-`max_total_data_size` of at least `1tb`, and a `register_operation_count` of at
-least `100`. Always specify a generous timeout, possibly `1h` or longer, to
-allow time for each analysis to run to completion. Perform the analyses using a
-multi-node cluster of a similar size to your production cluster so that it can
-detect any problems that only arise when the repository is accessed by many
-nodes at once.
+parameter values to check for simple problems. Some repositories may behave
+correctly when lightly loaded but incorrectly under production-like workloads.
+If the first analysis is successful, run a sequence of increasingly large
+analyses until you encounter a failure or you reach a `blob_count` of at least
+`2000`, a `max_blob_size` of at least `2gb`, a `max_total_data_size` of at least
+`1tb`, and a `register_operation_count` of at least `100`. Always specify a
+generous timeout, possibly `1h` or longer, to allow time for each analysis to
+run to completion. Some repositories may behave correctly when accessed by a
+small number of {es} nodes but incorrectly when accessed concurrently by a
+production-scale cluster. Perform the analyses using a multi-node cluster of a
+similar size to your production cluster so that it can detect any problems that
+only arise when the repository is accessed by many nodes at once.
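
A sketch of what this escalation might look like as repository analysis API
requests, assuming a repository registered under the hypothetical name
`my_repository`; the parameter values simply restate the targets above and are
illustrative rather than recommendations for any particular storage system:

[source,console]
----
# First analysis: default parameters, generous timeout
POST /_snapshot/my_repository/_analyze?timeout=1h

# A later, larger analysis in the escalating sequence
POST /_snapshot/my_repository/_analyze?blob_count=2000&max_blob_size=2gb&max_total_data_size=1tb&register_operation_count=100&timeout=4h
----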

-If the analysis fails then {es} detected that your repository behaved
-unexpectedly. This usually means you are using a third-party storage system
-with an incorrect or incompatible implementation of the API it claims to
-support. If so, this storage system is not suitable for use as a snapshot
-repository. You will need to work with the supplier of your storage system to
-address the incompatibilities that {es} detects. See
-<<self-managed-repo-types>> for more information.
+unexpectedly. This usually means you are using a third-party storage system with
+an incorrect or incompatible implementation of the API it claims to support. If
+so, this storage system is not suitable for use as a snapshot repository.
+Repository analysis triggers conditions that occur only rarely when taking
+snapshots in a production system. Snapshotting to unsuitable storage may appear
+to work correctly most of the time despite repository analysis failures. However,
+your snapshot data is at risk if you store it in a snapshot repository that does
+not reliably pass repository analysis. You can demonstrate that the analysis
+failure is due to an incompatible storage implementation by verifying that {es}
+does not detect the same problem when analyzing the reference implementation of
+the storage protocol you are using. For instance, if you are using storage that
+offers an API which the supplier claims is compatible with AWS S3, verify that
+repositories in AWS S3 do not fail repository analysis. This allows you to
+demonstrate to your storage supplier that a repository analysis failure can only
+be caused by an incompatibility with AWS S3 and cannot be attributed to a
+problem in {es}. Please do not report {es} issues involving third-party storage
+systems unless you can demonstrate that the same issue exists when analyzing a
+repository that uses the reference implementation of the same storage protocol.
+You will need to work with the supplier of your storage system to address the
+incompatibilities that {es} detects. See <<self-managed-repo-types>> for more
+information.
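
As a sketch of the comparison described above, assuming access to AWS S3 itself
and using hypothetical names (`my_s3_repository`, `my-reference-bucket`), you
might register a repository backed by AWS S3 and confirm that the same analysis
passes there:

[source,console]
----
# Register a reference repository backed by AWS S3
PUT /_snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-reference-bucket"
  }
}

# Re-run the analysis that failed against the third-party system; a clean pass
# here indicates the failure is specific to that system's S3 implementation
POST /_snapshot/my_s3_repository/_analyze?blob_count=2000&max_blob_size=2gb&max_total_data_size=1tb&register_operation_count=100&timeout=4h
----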

If the analysis is successful this API returns details of the testing process,
optionally including how long each operation took. You can use this information