Skip to content

Conversation

@RyaliNvidia
Copy link
Contributor

@RyaliNvidia RyaliNvidia commented Dec 18, 2025

Description

Design doc for auth sidecar, idp integration, and resource action model

The numbers for the auth sidecar was from #126

Issue #148

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@RyaliNvidia RyaliNvidia requested a review from a team December 18, 2025 00:30

### Motivation

- **Performance**: Python middleware adds latency to every request; Golang gRPC with caching achieves <5ms authorization checks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isnt a strong argument for the golang sidecar. We could add caching to the python middleware too

Copy link
Contributor Author

@RyaliNvidia RyaliNvidia Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch this to reference my performance numbers that I gather at the bottom. Those tests are from where both the middleware and the go code had caching and were operating on cache hits.
The main premise of the test is to compare the go code with gRPC against the current python which is on even playing ground

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the biggest thing I need to figure out is how to connect the auth sidecar with the logging sidecar to send off logs.

But besides this, what are you concerns with using a sidecar in place of the middleware? From my perspective, the sidecar shouldn't actually add much overhead and actually provides more clarity to the service.

I do like the fact that we can just check the container to see the logs/health of the authorization logic. I didn't put this as a selling point because by default, we should add better logging to the logging sidecar for the authorization for no matter if we use the middleware or sidecar.

osmo user roles remove [email protected] --role osmo-ml-team

# List all users with a role
osmo role users list osmo-ml-team
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its kind of tricky that the first thing is singular and hte second thing is plural

osmo role users
osmo user roles

I can see myself getting confused and trying osmo roles ... and osmo users ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. This section is marked as WIP as I just wanted to put some stuff down as an example, but this is not the final look. I'm still considering whether we even need like osmo user or just osmo role

1. [Prerequisites](#prerequisites)
2. [Microsoft Entra ID (Azure AD)](#microsoft-entra-id-azure-ad)
3. [Google OAuth2](#google-oauth2)
4. [Amazon Cognito](#amazon-cognito)
Copy link
Contributor

@vvnpn-nv vvnpn-nv Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add integration for okta this is one of the most commonly used SSO providers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I'll need to look into okta as I'm not too familiar with their auth process

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of amazon cognito can we look into Identity Center (AWS SSO) instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another commonly used auth provider is auth0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I think I'll probably handle Azure AD, Google OAuth2, and AWS SSO for the first verification and then work on supporting the rest. I'll add a section detailing further auth providers we want to support

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the content

"effect": "Allow",
"actions": [
"workflow:*",
"task:Read", "task:Update", "task:Cancel", "task:Exec", "task:PortForward", "task:Rsync",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question, how do we define a policy for users only able to portforward tasks in their authorized pool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. So task also uses the pool resource to determine the authorized pool as per the section above. Let me try to fix some examples or update the wording

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated


**By removing Keycloak and connecting Envoy directly to the IDP**, we achieve:

- **Single source of truth** for role assignments (OSMO database only)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question, if an IDP also provides roles, do we want to support that? Or is everything done strictly thru OSMO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the IDP provides roles and we will use that. Ideally to me, we would fetch the IDP roles and then update our DB with the information, so our DB is entirely accurate. As a result, the DB is still the single source of truth for OSMO. However, this option has an issue where if the user has been removed from access in the IDP, OSMO doesn't have real-time trigger to be aware of this so the OSMO DB will only be updated when the user tries to login next.

The other option is from @ecolternv 's suggestion that we don't store the role provided by the IDP and when auth is evaluated, we used the combine IDP value + OSMO specified config to determine access. This removes OSMO's responsibility to be up to date, but I don't really like that we can't tell the user's current access ability. Either way is a simple change, I prefer the first option, but I'm open to either


**Implementation**: Thread-safe in-memory cache with LRU eviction

**Cache Key**: Sorted, comma-separated role names (e.g., `"osmo-default,osmo-user"`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you share what the cache value might look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPDX-License-Identifier: Apache-2.0
-->

# Resource-Action Permission Model Design
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we leverage existing libraries such as Casbin for RBAC control
https://casbin.org

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that I actually looked into this as @fernandol-nvidia referenced this to me a couple of weeks ago. The reason why I strayed away from it is because I thought it had a strict compile time definition of the role to policy mapping, but I think I misunderstood. I'll play around with Casbin and see if we can leverage this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants