
[DOC] Take and restore snapshots - Request for HDFS info #10898

@etgraylog

Description

What do you want to do?

  • Request a change to existing documentation
  • Add new documentation
  • Report a technical problem with the documentation
  • Other

Tell us about your request. Provide a summary of the request.

The page Take and restore snapshots mentions:

"A snapshot repository is just a storage location: a shared file system, Amazon Simple Storage Service (Amazon S3), Hadoop Distributed File System (HDFS), or Azure Storage."

While it provides information on registering snapshot repositories using these various storage locations, it does not detail how to use HDFS (only that it is supported).

Example:

  1. To use an Apache HDFS cluster as a snapshot repository, install the repository-hdfs plugin on all nodes:
sudo ./bin/opensearch-plugin install repository-hdfs
  2. Restart all OpenSearch nodes.
  3. Execute the following OpenSearch API command with the desired values:
PUT _snapshot/searchable_snapshots
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020/",
    "path": "opensearch/repositories/searchable_snapshots",
    "conf.dfs.client.read.shortcircuit": "true",
    "security.principal": "opensearch@YOURREALM"
  }
}
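
For anyone scripting this instead of using the Dev Tools console, the registration request above could be built roughly as follows. This is a minimal sketch using only the Python standard library; the helper name `build_hdfs_repo_request`, the base URL `http://localhost:9200`, and the lack of authentication or TLS handling are assumptions for illustration and are not part of the plugin.

```python
import json
import urllib.request


def build_hdfs_repo_request(base_url, repo_name, uri, path, extra_settings=None):
    """Build (but do not send) a PUT _snapshot/<repo_name> request.

    base_url and repo_name are placeholders chosen by the caller;
    extra_settings carries optional keys such as "security.principal".
    """
    settings = {"uri": uri, "path": path}
    settings.update(extra_settings or {})
    body = json.dumps({"type": "hdfs", "settings": settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_snapshot/{repo_name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to the cluster; a secured cluster would additionally need credentials and certificate handling, which this sketch leaves out.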

The example above still needs clarification on which settings are supported, including the client conf.* keys. The key conf.dfs.client.read.shortcircuit was included in the example because it provides "a substantial performance boost to many applications." Other settings also exist, such as security.principal for connecting to a secure HDFS cluster.
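
To make the question about conf.* keys concrete, here is a small sketch of how the settings object might be assembled from plain Hadoop client keys. It assumes, based on a reading of the linked HdfsRepository source rather than on any published documentation, that a key written as `conf.<hadoop key>` is handed to the Hadoop client configuration under `<hadoop key>`. The helper name `hdfs_repo_settings` is hypothetical.

```python
def hdfs_repo_settings(uri, path, hadoop_conf=None, principal=None):
    """Assemble the "settings" object for an HDFS snapshot repository.

    hadoop_conf maps plain Hadoop client keys (without the "conf." prefix)
    to values; each key is written out with the "conf." prefix added.
    """
    settings = {"uri": uri, "path": path}
    for key, value in (hadoop_conf or {}).items():
        # Render Python booleans as lowercase strings, matching the example body.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        settings[f"conf.{key}"] = text
    if principal is not None:
        # Only needed when connecting to a secure HDFS cluster.
        settings["security.principal"] = principal
    return settings
```

Calling it with `{"dfs.client.read.shortcircuit": True}` and the principal from the example produces the same four settings as the request body in this issue.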

For reference, it appears that the repository-hdfs plugin loads the default HDFS configuration and disables file-system caching.

Note:

Version: List the OpenSearch version to which this issue applies, e.g. 2.14, 2.12--2.14, or all.

all

What other resources are available? Provide links to related issues, POCs, steps for testing, etc.

https://github.com/opensearch-project/OpenSearch/blob/main/plugins/repository-hdfs/src/main/java/org/opensearch/repositories/hdfs/HdfsRepository.java
