-
Notifications
You must be signed in to change notification settings - Fork 336
Disable device node creation in CDI mode #927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
00c5d3b
to
ba21d0e
Compare
8ca8b00
to
a802cc4
Compare
a802cc4
to
21d23bc
Compare
|
||
// Create the 'update-nvidia-params' command | ||
c := cli.Command{ | ||
Name: "update-nvidia-params", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should call this disable-device-node-modification
or something more descriptive of the intent.
cc @klueska what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be in favor of the more descriptive name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, I would also be in favor of keeping the update-nvidia-params
name and adding subcommands or command-line-flags specific to the update-nvidia-params
command that indicate which parameter in the params file we are updating. This way, if we ever need to update other parameters we have a natural way to express that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to have multi-modal hooks. We already have a two or three level nesting from the nvidia-cdi-hook
or nvidia-ctk hook
commands.
21d23bc
to
fd37ec2
Compare
What does "full" mean here? Fully occupied? Entire? |
Thanks for the detailed output. I wonder: why is the creation of those device nodes a problem? Can you add that to the PR description? It's probably a simple answer, for me it's not obvious at all. Is this maybe cosmetic? Does the modification also apply on the host, or it only 'visible' in the container? That is, does running nvidia-smi in the container mutate host state? Is that something we want to prevent from happening? |
|
||
var requiresModification bool | ||
for scanner.Scan() { | ||
line := scanner.Text() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of our favorite topics. As I learn about Golang, I naturally got curious here. Text processing and ingest or emission of 'text data' is always a fun topic. Because typically assumptions about text encodings are built into code, and those assumptions are often not obvious.
So, I just tried to understand what's happening here.
The scanner operates on a raw byte stream. Here we do the magic conversion to something called text. Text typically means: a sequence of abstract characters (unicode code points).
Going from a sequence of bytes to a sequence of unicode code points is called decoding, and it assumes a specific codec.
Which codec is used here?
Or Text()
is a bad naming choice in bufio and what's happening here is just splitting/tokenization of a byte sequence (output: sequence of byte sequences).
But then we compare with string literals below, as in e.g. strings.HasPrefix(line, "ModifyDeviceFiles: ")
.
To be precise, this tries to prefix-match the bytes in line
with the bytes underlying the string literal. String literals in Golang are stored as bytes using the UTF-8 code:
string literals always contain UTF-8 text as long as they have no byte-level escapes.
(source: https://go.dev/blog/strings)
And that's how this code assumes that the byte sequence entering the system represents text encoded with UTF-8 (or a subset thereof, which ascii is). And that is typically a valid assumption :).
// createFileInTempfs creates a file with the specified name, contents, and mode in a tmpfs. | ||
// A tmpfs is created at /tmp/nvct-emtpy-dir* with a size sufficient for the specified contents. | ||
func createFileInTempfs(name string, contents []byte, mode os.FileMode) (string, error) { | ||
tmpRoot, err := os.MkdirTemp("", "nvct-empty-dir*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the naming choice "nvct-empty-dir". Is this always empty? :)
Does this generate a directory visible on the host filesystem when I do ls /tmp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this generate a directory visible on the host filesystem when I do ls /tmp?
A direct quote from Evan:
"Note that since the hook is run as a createContainer hook, the mount operations are performed in the mount namespace of the container. This means that the mounts are not visible from the host."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this directory is always empty. The empty-dir
inffix is something that I have seen in the context of k8s, but I would have to dig for the exact source. (on a system where I was testing this we have around 42 of these folders).
In the current implementation this directory is created at /tmp/nvct-empty-dir*
on the host meaning that it is visible on the host when you run ls /tmp
:
elezar@dgx0126:~$ ls /tmp/nvct-empty-dir*
ls: cannot open directory '/tmp/nvct-empty-dir230315467': Permission denied
elezar@dgx0126:~$ sudo ls /tmp/nvct-empty-dir*
elezar@dgx0126:~$
As seen above, this directory is not accessible to regular users and also does not contain the nvct-params
file that we create in the tmpfs.
The mount operation that follows, however, creates the tmpfs mount in the container since this hook is running in the container's namespace:
root@8cda1ee3028a:/# mount | grep params
tmpfs on /proc/driver/nvidia/params type tmpfs (rw,relatime,size=4k)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, Evan!
Given the timeline and goal that we have I think we should land this asap. I have only looked at this very high-level and left high-level questions. I trust that code is good enough for a first release. 🚀 Please merge at your own discretion (from my point of view).
@@ -0,0 +1,199 @@ | |||
/** | |||
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/2022/2025
|
||
s, err := oci.LoadContainerState(cfg.containerSpec) | ||
if err != nil { | ||
return fmt.Errorf("failed to load container state: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/%v/%w
|
||
containerRoot, err := s.GetContainerRoot() | ||
if err != nil { | ||
return fmt.Errorf("failed to determined container root: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/%v/%w
} | ||
|
||
if err := bindMountReadonly(tempParamsFileName, filepath.Join(containerRoot, nvidiaDriverParamsPath)); err != nil { | ||
return fmt.Errorf("failed to create temporary parms file mount: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/parms/params
|
||
// Create the 'update-nvidia-params' command | ||
c := cli.Command{ | ||
Name: "update-nvidia-params", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be in favor of the more descriptive name.
We use "FULL GPU" do refer to a GPU that has NOT been partitioned using MIG. |
a39c195
to
8ada946
Compare
If required, this hook creates a modified params file (with ModifyDeviceFiles: 0) in a tmpfs and mounts this over /proc/driver/nvidia/params. This prevents device node creation when running tools such as nvidia-smi. In general the creation of these devices is cosmetic as a container does not have the required cgroup access for the devices. Signed-off-by: Evan Lezar <[email protected]>
This hook is not added to management specs. Signed-off-by: Evan Lezar <[email protected]>
8ada946
to
9c3086d
Compare
This change adds a hook to disable device node creation for FULL GPUs (i.e. non-MIG devices) or modification in a container by updating the
ModifyDeviceFiles
driver parameter.(This does not include
nvidia-caps
devices that are required by MIG devices).The presence of "extra" device nodes in a container are largely cosmetic since the container should not have the required cgroup access for the additional devices. This does not affect the device nodes on the host.
Without this change running a command like
nvidia-smi
in a creates the device nodes as follows:With the change applied we see:
due to the parameter being updated accordingly: