[DOC] Take and restore snapshots - Request for HDFS info

**What do you want to do?**
 
- [x] Request a change to existing documentation
- [ ] Add new documentation
- [ ] Report a technical problem with the documentation
- [ ] Other 

**Tell us about your request.** Provide a summary of the request.

The page [Take and restore snapshots](https://docs.opensearch.org/latest/tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore/#register-repository) mentions:

<i>"A snapshot repository is just a storage location: a shared file system, Amazon Simple Storage Service (Amazon S3), Hadoop Distributed File System (HDFS), or Azure Storage."</i> 

The page explains how to register snapshot repositories for the other storage locations, but it does not explain how to use HDFS; it only states that HDFS is supported.

Example of what the page could include:

-----

1. To use an Apache HDFS cluster as a snapshot repository, install the `repository-hdfs` plugin on all nodes:
```bash
sudo ./bin/opensearch-plugin install repository-hdfs
```
2. Restart all OpenSearch nodes.
3. Execute the following OpenSearch API command with the desired values:
```json
PUT _snapshot/searchable_snapshots
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020/",
    "path": "opensearch/repositories/searchable_snapshots",
    "conf.dfs.client.read.shortcircuit": "true",
    "security.principal": "opensearch@YOURREALM"
  }
}
```

-----
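For anyone who would rather script step 3 than paste it into Dev Tools, here is a minimal Python sketch of the same request. The helper names (`build_hdfs_repository_body`, `build_register_request`) and the cluster address `http://localhost:9200` are assumptions for illustration only; the setting keys are the ones from the example above. The request is only prepared, not sent.

```python
import json
import urllib.request


def build_hdfs_repository_body(uri, path, hadoop_conf=None, principal=None):
    """Build the JSON body for registering an HDFS snapshot repository.

    Hypothetical helper: it only assembles the settings shown in the issue.
    """
    settings = {"uri": uri, "path": path}
    # Hadoop client options are passed through with a "conf." prefix.
    for key, value in (hadoop_conf or {}).items():
        settings[f"conf.{key}"] = value
    if principal is not None:
        settings["security.principal"] = principal
    return {"type": "hdfs", "settings": settings}


def build_register_request(base_url, repository, body):
    """Prepare (but do not send) the PUT _snapshot/<repository> request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_snapshot/{repository}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


body = build_hdfs_repository_body(
    uri="hdfs://namenode:8020/",
    path="opensearch/repositories/searchable_snapshots",
    hadoop_conf={"dfs.client.read.shortcircuit": "true"},
    principal="opensearch@YOURREALM",
)
# Assumed local, unsecured cluster address; adjust for a real deployment.
request = build_register_request("http://localhost:9200", "searchable_snapshots", body)

print(request.get_method(), request.full_url)
print(json.dumps(body, indent=2))
# To send it against a running cluster: urllib.request.urlopen(request)
```

Keeping the body construction separate from the HTTP call makes it easy to inspect the exact settings before anything reaches the cluster.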

The example above still needs clarification on which settings are supported, including the client `conf.*` keys. The key `conf.dfs.client.read.shortcircuit` was included in the example because short-circuit local reads provide <i>"[a substantial performance boost to many applications.](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html)"</i> Other settings, such as `security.principal` for connecting to a secure HDFS cluster, exist too.

For reference, it appears that the `repository-hdfs` plugin [loads](https://github.com/opensearch-project/OpenSearch/blob/7c1052f8b2c33f2d17fb88d205d938154be9cff4/plugins/repository-hdfs/src/main/java/org/opensearch/repositories/hdfs/HdfsRepository.java#L125) the default HDFS configuration and [disables file-system caching](https://github.com/opensearch-project/OpenSearch/blob/7c1052f8b2c33f2d17fb88d205d938154be9cff4/plugins/repository-hdfs/src/main/java/org/opensearch/repositories/hdfs/HdfsRepository.java#L136).
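To make that behavior concrete, below is a rough Python emulation of how I read the linked Java source. It is not the plugin's code. The assumptions are that repository settings starting with `conf.` are copied onto the Hadoop client configuration with the prefix stripped, and that the cache is turned off through the Hadoop key `fs.hdfs.impl.disable.cache`. The function name `effective_hadoop_overrides` is hypothetical.

```python
def effective_hadoop_overrides(repository_settings):
    """Emulate (as an assumption) how conf.* settings reach the Hadoop client.

    Returns only the overrides applied on top of the default HDFS
    configuration; the defaults themselves are not modeled here.
    """
    prefix = "conf."
    overrides = {
        key[len(prefix):]: value
        for key, value in repository_settings.items()
        if key.startswith(prefix)
    }
    # Assumed from the linked source: file-system caching is always disabled.
    overrides["fs.hdfs.impl.disable.cache"] = "true"
    return overrides


settings = {
    "uri": "hdfs://namenode:8020/",
    "path": "opensearch/repositories/searchable_snapshots",
    "conf.dfs.client.read.shortcircuit": "true",
    "security.principal": "opensearch@YOURREALM",
}

overrides = effective_hadoop_overrides(settings)
for key in sorted(overrides):
    print(f"{key} = {overrides[key]}")
```

If this reading is right, the documentation could state that any Hadoop client property can be set by prefixing it with `conf.`, which is exactly the kind of detail this request asks for.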

Note: 

**Version:** List the OpenSearch version to which this issue applies, e.g. 2.14, 2.12--2.14, or all.

all

**What other resources are available?** Provide links to related issues, POCs, steps for testing, etc.

https://github.com/opensearch-project/OpenSearch/blob/main/plugins/repository-hdfs/src/main/java/org/opensearch/repositories/hdfs/HdfsRepository.java

[DOC] Take and restore snapshots - Request for HDFS info #10898
