Adds support for configuring Multi-Instance GPUs (MIG) #656
Mostly comments on where things are located, plus some minor typos etc.
994d8f6 to abf35e5
Realised a few other changes are required by the stackhpc.openhpc bump. We might want to factor those out, potentially; you won't really care about them for client.
a4823a3 to 83ec813
LGTM
9560d96 to 1c2d07d
Failing builds appear to be due to #685.
LGTM, other than a worry that the branch is out of date, so the testing may not apply to main.
ansible/roles/cuda/tasks/facts.yml
Outdated
So, it looks like this branch doesn't incorporate #703.
cf.
https://github.com/stackhpc/ansible-slurm-appliance/blob/feature/mig/ansible/roles/cuda/defaults/main.yml
vs
https://github.com/stackhpc/ansible-slurm-appliance/blob/main/ansible/roles/cuda/defaults/main.yml
It doesn't conflict, but are we sure that, once merged, it is actually going to work on main?
Good spot, I will update the branch to get CI to run again.
See docs/mig.md.
The stackhpc.openhpc role has been bumped to support the NVIDIA GPU autodetection required for MIG configuration.
NB: This role bump also means parameters can be removed from slurm.conf; see stackhpc/ansible-role-openhpc#184.