Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] Make User object immutable #5168

Open
nibix opened this issue Mar 10, 2025 · 5 comments
Open

[Performance] Make User object immutable #5168

nibix opened this issue Mar 10, 2025 · 5 comments
Labels
triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.

Comments

@nibix
Copy link
Collaborator

nibix commented Mar 10, 2025

Background

This is a follow-up to the optimization efforts started in #3870

In #3870, #4380 we have made already good progress in shaving off the performance overhead which is added by the security plugin to the overall performance shown by an OpenSearch cluster. In the course of this, we performed further performance tests which indicate that an OpenSearch cluster with the security plugin still shows a throughput with is about 10% lower than an OpenSearch cluster without the security plugin: #4380 (comment)

Analysis

It is already well known that the serialization and de-serialization of the user object adds a significant performance penalty: #2780, #4760 . There were a couple of efforts in this area already: One approach was to change the serialization protocol: #2802 . However, unexpected side-effects were encountered, which caused the change to be rolled back: #3771, #3776, #4670 .

In the course of this issue, we want to approach this again. However, this time not by optimizing the serialization protocol, but just by reducing the amount of serializations.

Life cycle of the org.opensearch.security.user.User object

The major class of which objects are often serialized and deserialized in an OpenSearch cluster is org.opensearch.security.user.User. So, let's have a look at its life cycle:

  • Instances of org.opensearch.security.user.User are usually created when a REST requested is authenticated by the security plugin. This happens in the class BackendRegistry, which either directly instantiates User objects or delegates it to classes implementing the AuthenticationBackend interface. Additionally, BackendRegistry maintains a cache of user objects in the case the same user sends several requests in sequence.

  • In a second step, BackendRegistry uses classes implementing AuthorizationBackend to add roles to the User object.

  • The class PrivilegesEvaluator performs role mapping and adds a number of effective roles to the User object to make these available to dynamic attributes used in the role configuration.

  • If the current node needs to delegate work to other nodes, the User object is serialized, put into a transport header and then de-serialized again on the receiving node. In case the current node needs to send messages to n other nodes, the user object will be serialized and de-serialized n times.

  • On the receiving node, the de-serialized User object will be used. Actual authentication via BackendRegistry and AuthenticationBackend does not happen. That means, the User object stays constant.

An immutable User object

The life cycle shows that the User object mostly stays constant after the authentication phase was finished. An outlier is the modification performed in PrivilegesEvaluator. The purpose of this, attribute interpolation, can be also achieved by other context variables. We already did changes in #4380 with the goal to make this modification unnecessary; specifically, this is the introduction of the PrivilegeEvaluationContext class, which also holds the effective roles.

If we can achieve a User object which is guaranteed to stay constant after initialization, i.e., which is immutable, we gain capabilities which are very helpful for optimizing serialization performance:

  • An immutable object means that the binary data created by serialization does not change as well. That means, the serialized data can be computed once per user object and then re-used again and again. The existing authentication cache even allows us to re-use the serialized data across different requests by the same user.

  • Additionally, a cache similar to the cache maintained by BackendRegistry can be utilized to also cut down the number of de-serialization operations to a single one per user per node. Such a cache would map the serialized binary data to the actual user object.

  • Immutable objects are inherently thread-safe. That makes any synchronization or locking mechanisms on the user object unnecessary.

Necessary changes

Obviously, in order to make the User object immutable, all code which modifies the User object needs to be changed.

Besides from the creation pefrormed in BackendRegistry and the AuthenticationBackend implementations, this pertains especially to two further places.

AuthorizationBackend

The AuthorizationBackend interface looks currently like this:

public interface AuthorizationBackend {
/**
* The type (name) of the authorizer. Only for logging.
* @return the type
*/
String getType();
/**
* Populate a {@link User} with backend roles. This method will not be called for cached users.
* <p/>
* Add them by calling either {@code user.addRole()} or {@code user.addRoles()}
* </P>
* @param user The authenticated user to populate with backend roles, never null
* @param credentials Credentials to authenticate to the authorization backend, maybe null.
* <em>This parameter is for future usage, currently always empty credentials are passed!</em>
* @throws OpenSearchSecurityException in case when the authorization backend cannot be reached
* or the {@code credentials} are insufficient to authenticate to the authorization backend.
*/
void fillRoles(User user, AuthCredentials credentials) throws OpenSearchSecurityException;
}

Implementors of the AuthorizationBackend interface are supposed to call addRoles() on the supplied User object in order to provide the user roles they have found.

This will not be possible any more with immutable user objects. Thus, the AuthorizationBackend interface must be changed. There are two options to handle this:

  1. Pass a builder class for user objects to the AuthorizationBackend interface:
    void fillRoles(User.Builder userBuilder, AuthCredentials credentials) throws OpenSearchSecurityException;
  1. Keep the User parameter, but let the fillRoles() create a new, modified User object and return it from the method:
    User fillRoles(User user, AuthCredentials credentials) throws OpenSearchSecurityException;

PrivilegesEvaluator

As described above, the adding of effective roles needs to be removed from PrivilegesEvaluator:

// Add the security roles for this user so that they can be used for DLS parameter substitution.
user.addSecurityRoles(mappedRoles);

The same information is already made available via PrivilegeEvaluationContext; it is just necessary to change all usage of this information to the new class.

Additional Benefits

It is generally agreed that the use of immutable objects contributes to high-quality, robust software, especially in multi-threaded environments. The security plugin was struggling with concurrent modification issues on the User object in the past: #4642, #2263 . Thus, the change will contribute to the general robustness of the security plugin.

@github-actions github-actions bot added the untriaged Require the attention of the repository maintainers and may need to be prioritized label Mar 10, 2025
@nibix
Copy link
Collaborator Author

nibix commented Mar 10, 2025

If there are any questions or comments on this: Please let me know!

I would like to start working on this soon.

@derek-ho derek-ho added triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable. and removed untriaged Require the attention of the repository maintainers and may need to be prioritized labels Mar 10, 2025
@derek-ho
Copy link
Collaborator

@nibix thank you for filing this issue. This sounds like a good performance improvement and this issue has clear acceptance criteria and background for working on this feature - marking it as triaged.

@shikharj05
Copy link
Contributor

Thanks @nibix for this proposal! This sounds like the right next step. Few thoughts-

  1. I assume this will be backwards compatible. A cluster with current implementation will still serialize the current mutable object, nodes on a newer version will still be able to de-serialize the object and cache it for further use.
  2. Any change that requires to mutate User object (like in AuthorizationBackend) will require creation of a new instance of User class - I do not expect this to cause impact in terms of Memory consumption or GC as user updates are less frequent.
  3. This will require creation of new User objects to also be thread safe, right?

@nibix
Copy link
Collaborator Author

nibix commented Mar 17, 2025

@shikharj05

I assume this will be backwards compatible. A cluster with current implementation will still serialize the current mutable object, nodes on a newer version will still be able to de-serialize the object and cache it for further use.

Exactly.

Any change that requires to mutate User object (like in AuthorizationBackend) will require creation of a new instance of User class - I do not expect this to cause impact in terms of Memory consumption or GC as user updates are less frequent.

Yes, I do not expect any significant impact from this. The additional overhead will be tiny, in the area of an additionally introduced log line or similar.

This will require creation of new User objects to also be thread safe, right?

IMHO, the only thing that needs to be thread safe will be the cache. The actual construction of the user object always happens on a single thread. As soon as the object is constructed, an immutable user object will be thread safe. It will be then made available to other threads via the cache.

@shikharj05
Copy link
Contributor

Thanks @nibix !

IMHO, the only thing that needs to be thread safe will be the cache. The actual construction of the user object always happens on a single thread. As soon as the object is constructed, an immutable user object will be thread safe. It will be then made available to other threads via the cache.

Makes sense, we can discuss this further on the PR (if required).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.
Projects
None yet
Development

No branches or pull requests

3 participants