-
Notifications
You must be signed in to change notification settings - Fork 744
Allow MPS control daemonset to be explicitly disabled #1178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Marino Borges <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elezar / @tariq1890 Thoughts?
|
Unsure if we need also to change NVIDIA_DRIVER_CAPABILITIES env var value when running without MPS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may also consider renaming this field so its purpose is more clear. Something more explicit like createDaemonset would be an example.
| - vendor | ||
|
|
||
| mps: | ||
| enabled: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user, I think at a minimum there needs to be comments placed above this line to explain its use. If I see this in a values file, I may interpret mps.enabled: true to mean I, as a user, want MPS enabled right out of the gate. But that's not what this does. When set to false MPS will never work even if I later activate an MPS configuration by labeling a node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment clarifying this
Signed-off-by: Marino Borges <[email protected]>
|
@chipzoller can you re-review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment itself looks fine, but this will need to be reviewed by a maintainer (I am not one).
|
This PR is stale because it has been open 90 days with no activity. This PR will be closed in 30 days unless new comments are made or the stale label is removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay here @marinoborges. I think the opt-out makes sense. I did have one question though:
What if a user is upgrading the device plugin and as such this value is not defined in the set that has already been applied? Does the default value of true function as expected here, or should we check for non-nil values explicitly?
|
@elezar if a user is upgrading device plugin via helm chart upgrade, the default value of |
This change allows the deployment of the MPS daemon to be explicitly disabled by adding
--set mps.enabled=falseto a Helm install / upgrade command.The default behaviour of the plugin is to deploy the MPS control daemonset even if not MPS sharing is configured. However, the actual MPS daemon is only started if a GPU is replicated using MPS. The
mps.enabledHelm value allows this to be explicitly disabled.Fixes #1177 with backward compatibility so this should be a minor change.