
Snakebite doesn't work with HDFS RPC encryption #8

Open
@elukey

Description


This is a tracking task listing the work needed to resolve one outstanding issue with snakebite. When RPC encryption is enabled for HDFS, the following happens:

  • snakebite contacts the HDFS namenode via Hadoop RPC, negotiating the encryption settings through SASL with GSS-API. It needs to retrieve the list of blocks to read or write and the datanodes that hold them. This part works fine.
  • snakebite then has to contact every HDFS datanode, using a separate RPC protocol that is not Hadoop RPC. Authentication is done through SASL with DIGEST-MD5, which also allows setting the encryption level if needed (and then negotiating AES encryption). This part currently doesn't work because the code it needs relies on SASL functionality that is not implemented in pure-sasl (namely DIGEST-MD5).
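
For reference, the core of the DIGEST-MD5 exchange mentioned above is the client response hash defined in RFC 2831. The following is a minimal sketch of that computation using only the standard library. It is not code from snakebite, pure-sasl, or Hadoop; the function name, parameter names, and sample values are hypothetical, and it assumes no authzid and UTF-8 encoded fields. It covers only the response value, not the integrity or confidentiality layers that `auth-int` and `auth-conf` require afterwards.

```python
import hashlib


def digest_md5_response(username, realm, password, nonce, cnonce,
                        digest_uri, qop="auth", nc=1):
    """Compute the DIGEST-MD5 client response value (RFC 2831, section 2.1.2.1).

    Hypothetical helper for illustration; assumes no authzid is sent.
    """
    # A1 = H(username:realm:password) ":" nonce ":" cnonce
    # Note that the inner hash is used as raw bytes, not hex.
    secret = hashlib.md5(f"{username}:{realm}:{password}".encode("utf-8")).digest()
    a1 = secret + f":{nonce}:{cnonce}".encode("utf-8")

    # A2 depends on the negotiated quality of protection.
    a2 = f"AUTHENTICATE:{digest_uri}"
    if qop in ("auth-int", "auth-conf"):
        a2 += ":" + "0" * 32

    ha1 = hashlib.md5(a1).hexdigest()
    ha2 = hashlib.md5(a2.encode("utf-8")).hexdigest()

    # response = HEX(H(HEX(H(A1)) ":" nonce ":" nc ":" cnonce ":" qop ":" HEX(H(A2))))
    kd = f"{ha1}:{nonce}:{nc:08x}:{cnonce}:{qop}:{ha2}"
    return hashlib.md5(kd.encode("utf-8")).hexdigest()


# Example call with made-up values; "auth-conf" is the qop that requests encryption.
example = digest_md5_response(
    username="example-user",
    realm="example-realm",
    password="example-secret",
    nonce="server-nonce",
    cnonce="client-nonce",
    digest_uri="hdfs/0",  # assumed service/host pair, for illustration only
    qop="auth-conf",
)
```

The `qop` argument is where the encryption level comes in: `auth` is authentication only, `auth-int` adds integrity protection, and `auth-conf` adds confidentiality. A library that supports only `auth` can compute this hash but cannot wrap or unwrap the traffic that follows.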

I opened an issue against pure-sasl (thobbs/pure-sasl#32), but some work would be needed to add the missing features.

The alternative would be to use sasl (https://github.com/cloudera/python-sasl), but unfortunately the library has not been maintained since 2016. There is a fork we could consider that should support DIGEST-MD5 + GSS-API: cloudera/python-sasl#15 (comment)
