Conversation

@marinoborges marinoborges commented Feb 25, 2025

This change allows the deployment of the MPS daemon to be explicitly disabled by adding --set mps.enabled=false to a Helm install / upgrade command.

The default behaviour of the plugin is to deploy the MPS control daemonset even if no MPS sharing is configured. However, the actual MPS daemon is only started if a GPU is replicated using MPS. The mps.enabled Helm value allows the daemonset deployment to be explicitly disabled.

Fixes #1177 while preserving backward compatibility, so this should be a minor change.
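
For illustration, the same setting could be kept in a values file instead of being passed with --set. This is only a minimal sketch: the file name is hypothetical, and mps.enabled is the value introduced by this change.

```yaml
# mps-disabled-values.yaml (hypothetical file name)
# Overrides the chart default (mps.enabled: true) proposed in this PR.
mps:
  # false: the MPS control daemonset is not deployed at all.
  enabled: false
```

Such a file would be passed to helm install / upgrade with the standard -f flag.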

Signed-off-by: Marino Borges <[email protected]>

copy-pr-bot bot commented Feb 25, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@marinoborges marinoborges changed the title from "Add support disable MPS" to "[FEATURE] [HELM] Add support to disable MPS" Feb 25, 2025
Collaborator

@ArangoGutierrez ArangoGutierrez left a comment

@elezar / @tariq1890 Thoughts?

@marinoborges
Author

I'm unsure whether we also need to change the NVIDIA_DRIVER_CAPABILITIES env var value when running without MPS.

Contributor

@chipzoller chipzoller left a comment

You may also consider renaming this field so its purpose is clearer. Something more explicit, such as createDaemonset, would be one option.

- vendor

mps:
  enabled: true
Contributor

As a user, I think at a minimum there need to be comments placed above this line to explain its use. If I see this in a values file, I may interpret mps.enabled: true to mean that I, as a user, want MPS enabled right out of the gate. But that's not what this does. When set to false, MPS will never work, even if I later activate an MPS configuration by labeling a node.

Author

I added a comment clarifying this

@sidewinder12s

@chipzoller can you re-review this?

Contributor

@chipzoller chipzoller left a comment

The comment itself looks fine, but this will need to be reviewed by a maintainer (I am not one).

@github-actions

This PR is stale because it has been open 90 days with no activity. This PR will be closed in 30 days unless new comments are made or the stale label is removed.

@github-actions github-actions bot added the lifecycle/stale label Sep 17, 2025
Member

@elezar elezar left a comment

Sorry for the delay here @marinoborges. I think the opt-out makes sense. I did have one question though:

What if a user is upgrading the device plugin and as such this value is not defined in the set that has already been applied? Does the default value of true function as expected here, or should we check for non-nil values explicitly?

@elezar elezar removed the lifecycle/stale label Sep 17, 2025
@elezar elezar self-assigned this Sep 17, 2025
@elezar elezar changed the title from "[FEATURE] [HELM] Add support to disable MPS" to "Allow MPS control daemonset to be explicitly disabled" Sep 17, 2025
@marinoborges
Author

@elezar if a user upgrades the device plugin via a Helm chart upgrade, the default value of true will be applied, so no undesired changes are expected.
Also, if a user upgrades the device plugin via the image value, then the Helm chart change I'm proposing isn't taken into consideration, so again no undesired changes are expected.
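
For reference, the explicit non-nil check raised in the question could be sketched roughly as below. This is an assumption-laden sketch, not the template from this PR: the file path is hypothetical, and it assumes the daemonset template is gated on .Values.mps.enabled. It only uses the standard Helm/Sprig functions default, dict, hasKey and get, and treats an unset key as enabled, which may matter if previously applied values are reused on upgrade (for example, as far as I understand, with --reuse-values).

```yaml
# templates/daemonset-mps-control-daemon.yml (hypothetical path; sketch only)
{{- /* Fall back to an empty dict if the mps block is missing entirely. */ -}}
{{- $mps := default (dict) .Values.mps }}
{{- /* Render unless mps.enabled is explicitly false; an unset key counts as enabled. */ -}}
{{- if or (not (hasKey $mps "enabled")) (get $mps "enabled") }}
apiVersion: apps/v1
kind: DaemonSet
# ... existing MPS control daemonset spec, unchanged ...
{{- end }}
```

With a guard like this, only an explicit mps.enabled=false would skip the daemonset; a plain `if .Values.mps.enabled` would instead treat a missing key as false.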

Development

Successfully merging this pull request may close these issues.

Helm Chart Unable to Prevent MPS DaemonSet Deployment

5 participants