-
Notifications
You must be signed in to change notification settings - Fork 6
#148 - Auth Project Design Documents #165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
||
| ### Motivation | ||
|
|
||
| - **Performance**: Python middleware adds latency to every request; Golang gRPC with caching achieves <5ms authorization checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isnt a strong argument for the golang sidecar. We could add caching to the python middleware too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll switch this to reference my performance numbers that I gather at the bottom. Those tests are from where both the middleware and the go code had caching and were operating on cache hits.
The main premise of the test is to compare the go code with gRPC against the current python which is on even playing ground
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the biggest thing I need to figure out is how to connect the auth sidecar with the logging sidecar to send off logs.
But besides this, what are you concerns with using a sidecar in place of the middleware? From my perspective, the sidecar shouldn't actually add much overhead and actually provides more clarity to the service.
I do like the fact that we can just check the container to see the logs/health of the authorization logic. I didn't put this as a selling point because by default, we should add better logging to the logging sidecar for the authorization for no matter if we use the middleware or sidecar.
| osmo user roles remove [email protected] --role osmo-ml-team | ||
|
|
||
| # List all users with a role | ||
| osmo role users list osmo-ml-team |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its kind of tricky that the first thing is singular and hte second thing is plural
osmo role users
osmo user roles
I can see myself getting confused and trying osmo roles ... and osmo users ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. This section is marked as WIP as I just wanted to put some stuff down as an example, but this is not the final look. I'm still considering whether we even need like osmo user or just osmo role
| 1. [Prerequisites](#prerequisites) | ||
| 2. [Microsoft Entra ID (Azure AD)](#microsoft-entra-id-azure-ad) | ||
| 3. [Google OAuth2](#google-oauth2) | ||
| 4. [Amazon Cognito](#amazon-cognito) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add integration for okta this is one of the most commonly used SSO providers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'll need to look into okta as I'm not too familiar with their auth process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of amazon cognito can we look into Identity Center (AWS SSO) instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another commonly used auth provider is auth0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I think I'll probably handle Azure AD, Google OAuth2, and AWS SSO for the first verification and then work on supporting the rest. I'll add a section detailing further auth providers we want to support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the content
| "effect": "Allow", | ||
| "actions": [ | ||
| "workflow:*", | ||
| "task:Read", "task:Update", "task:Cancel", "task:Exec", "task:PortForward", "task:Rsync", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question, how do we define a policy for users only able to portforward tasks in their authorized pool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. So task also uses the pool resource to determine the authorized pool as per the section above. Let me try to fix some examples or update the wording
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
|
|
||
| **By removing Keycloak and connecting Envoy directly to the IDP**, we achieve: | ||
|
|
||
| - **Single source of truth** for role assignments (OSMO database only) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question, if an IDP also provides roles, do we want to support that? Or is everything done strictly thru OSMO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the IDP provides roles and we will use that. Ideally to me, we would fetch the IDP roles and then update our DB with the information, so our DB is entirely accurate. As a result, the DB is still the single source of truth for OSMO. However, this option has an issue where if the user has been removed from access in the IDP, OSMO doesn't have real-time trigger to be aware of this so the OSMO DB will only be updated when the user tries to login next.
The other option is from @ecolternv 's suggestion that we don't store the role provided by the IDP and when auth is evaluated, we used the combine IDP value + OSMO specified config to determine access. This removes OSMO's responsibility to be up to date, but I don't really like that we can't tell the user's current access ability. Either way is a simple change, I prefer the first option, but I'm open to either
|
|
||
| **Implementation**: Thread-safe in-memory cache with LRU eviction | ||
|
|
||
| **Cache Key**: Sorted, comma-separated role names (e.g., `"osmo-default,osmo-user"`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you share what the cache value might look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my test for this, the cache values look like: https://github.com/NVIDIA/OSMO/pull/126/files#diff-0ec6cf70f862a1ea33cc5b4a3973ccca4e83f3dee324794ba2c327c8ccbd0897R44
| SPDX-License-Identifier: Apache-2.0 | ||
| --> | ||
|
|
||
| # Resource-Action Permission Model Design |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we leverage existing libraries such as Casbin for RBAC control
https://casbin.org
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot that I actually looked into this as @fernandol-nvidia referenced this to me a couple of weeks ago. The reason why I strayed away from it is because I thought it had a strict compile time definition of the role to policy mapping, but I think I misunderstood. I'll play around with Casbin and see if we can leverage this
Description
Design doc for auth sidecar, idp integration, and resource action model
The numbers for the auth sidecar was from #126
Issue #148
Checklist