Disable device node creation in CDI mode #927

Open · wants to merge 2 commits into main from disable-device-node-creation

Conversation

@elezar (Member) commented Feb 13, 2025

This change adds a hook to disable device node creation or modification in a container for FULL GPUs (i.e. non-MIG devices) by updating the ModifyDeviceFiles driver parameter.

(This does not include nvidia-caps devices that are required by MIG devices.)

The presence of "extra" device nodes in a container is largely cosmetic, since the container should not have the required cgroup access for the additional devices. This does not affect the device nodes on the host.

Without this change, running a command like nvidia-smi in a container creates the device nodes as follows:

elezar@dgx0126:~/src/container-toolkit$ docker run --rm -ti -e NVIDIA_VISIBLE_DEVICES=runtime.nvidia.com/gpu=0 --runtime=nvidia ubuntu bash -c "ls -al /dev/nvidia*; echo ""; nvidia-smi -L; echo ""; ls -al /dev/nvidia*"                                                                                                                                                                                            
crw-rw-rw- 1 root root 195, 254 Feb 13 14:29 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511,   0 Feb 13 14:29 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Feb 13 14:29 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Feb 13 14:29 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Feb 13 14:29 /dev/nvidiactl

GPU 0: Tesla V100-SXM2-16GB-N (UUID: GPU-edfee158-11c1-52b8-0517-92f30e7fac88)

crw-rw-rw- 1 root root 195, 254 Feb 13 14:29 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511,   0 Feb 13 14:29 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Feb 13 14:29 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Feb 13 14:29 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 Feb 13 14:29 /dev/nvidia1
crw-rw-rw- 1 root root 195,   2 Feb 13 14:29 /dev/nvidia2
crw-rw-rw- 1 root root 195,   3 Feb 13 14:29 /dev/nvidia3
crw-rw-rw- 1 root root 195,   4 Feb 13 14:29 /dev/nvidia4
crw-rw-rw- 1 root root 195,   5 Feb 13 14:29 /dev/nvidia5
crw-rw-rw- 1 root root 195,   6 Feb 13 14:29 /dev/nvidia6
crw-rw-rw- 1 root root 195,   7 Feb 13 14:29 /dev/nvidia7
crw-rw-rw- 1 root root 195, 255 Feb 13 14:29 /dev/nvidiactl

/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root     80 Feb 13 14:29 .
drwxr-xr-x 7 root root    640 Feb 13 14:29 ..
cr-------- 1 root root 238, 1 Feb 13 14:29 nvidia-cap1
cr--r--r-- 1 root root 238, 2 Feb 13 14:29 nvidia-cap2

With the change applied we see:

elezar@dgx0126:~/src/container-toolkit$ docker run --rm -ti -e NVIDIA_VISIBLE_DEVICES=runtime.nvidia.com/gpu=0 --runtime=nvidia ubuntu bash -c "ls -al /dev/nvidia*; echo ""; nvidia-smi -L; echo ""; ls -al /dev/nvidia*"
crw-rw-rw- 1 root root 195, 254 Feb 13 14:27 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511,   0 Feb 13 14:27 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Feb 13 14:27 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Feb 13 14:27 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Feb 13 14:27 /dev/nvidiactl

GPU 0: Tesla V100-SXM2-16GB-N (UUID: GPU-edfee158-11c1-52b8-0517-92f30e7fac88)

crw-rw-rw- 1 root root 195, 254 Feb 13 14:27 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511,   0 Feb 13 14:27 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Feb 13 14:27 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Feb 13 14:27 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Feb 13 14:27 /dev/nvidiactl

/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root     80 Feb 13 14:27 .
drwxr-xr-x 7 root root    500 Feb 13 14:27 ..
cr-------- 1 root root 238, 1 Feb 13 14:27 nvidia-cap1
cr--r--r-- 1 root root 238, 2 Feb 13 14:27 nvidia-cap2

This is because the parameter has been updated accordingly:

elezar@dgx0126:~/src/container-toolkit$ docker run --rm -ti -e NVIDIA_VISIBLE_DEVICES=runtime.nvidia.com/gpu=0 --runtime=nvidia ubuntu bash -c "cat /proc/driver/nvidia/params | grep ModifyDeviceFiles"
ModifyDeviceFiles: 0

@elezar force-pushed the disable-device-node-creation branch from 00c5d3b to ba21d0e on February 13, 2025 14:33
@elezar marked this pull request as draft on February 13, 2025 14:33
@elezar force-pushed the disable-device-node-creation branch 3 times, most recently from 8ca8b00 to a802cc4 on February 13, 2025 22:59
@elezar marked this pull request as ready for review on March 17, 2025 17:18
@elezar force-pushed the disable-device-node-creation branch from a802cc4 to 21d23bc on March 17, 2025 17:26
@elezar marked this pull request as draft on March 19, 2025 14:37
@elezar marked this pull request as ready for review on March 19, 2025 14:37

// Create the 'update-nvidia-params' command
c := cli.Command{
Name: "update-nvidia-params",
@elezar (Member, Author):

I think we should call this disable-device-node-modification or something more descriptive of the intent.

cc @klueska what do you think?

Contributor:

I would be in favor of the more descriptive name.

Contributor:

Alternatively, I would also be in favor of keeping the update-nvidia-params name and adding subcommands or command-line-flags specific to the update-nvidia-params command that indicate which parameter in the params file we are updating. This way, if we ever need to update other parameters we have a natural way to express that.

@elezar (Member, Author):

I don't want to have multi-modal hooks. We already have two or three levels of nesting from the nvidia-cdi-hook or nvidia-ctk hook commands.

@elezar force-pushed the disable-device-node-creation branch from 21d23bc to fd37ec2 on March 19, 2025 14:44
@jgehrcke:

for FULL GPUs

What does "full" mean here? Fully occupied? Entire?

@jgehrcke:

Without this change running a command like nvidia-smi in a container creates the device nodes as follows:

Thanks for the detailed output. I wonder: why is the creation of those device nodes a problem? Can you add that to the PR description? It's probably a simple answer, but for me it's not obvious at all. Is this maybe cosmetic? Does the modification also apply on the host, or is it only 'visible' in the container? That is, does running nvidia-smi in the container mutate host state? Is that something we want to prevent from happening?


var requiresModification bool
for scanner.Scan() {
line := scanner.Text()


One of our favorite topics. As I am learning Golang, I naturally got curious here. Text processing, and the ingest or emission of 'text data', is always a fun topic, because assumptions about text encodings are typically built into code, and those assumptions are often not obvious.

So, I just tried to understand what's happening here.

The scanner operates on a raw byte stream. Here we do the magic conversion to something called text. Text typically means: a sequence of abstract characters (Unicode code points).

Going from a sequence of bytes to a sequence of unicode code points is called decoding, and it assumes a specific codec.

Which codec is used here?

Or is Text() a bad naming choice in bufio, and what's happening here is just splitting/tokenization of a byte sequence (output: a sequence of byte sequences)?

But then we compare with string literals below, as in e.g. strings.HasPrefix(line, "ModifyDeviceFiles: ").

To be precise, this tries to prefix-match the bytes in line with the bytes underlying the string literal. String literals in Golang are stored as bytes using the UTF-8 encoding:

string literals always contain UTF-8 text as long as they have no byte-level escapes.

(source: https://go.dev/blog/strings)

And that's how this code assumes that the byte sequence entering the system represents text encoded with UTF-8 (or a subset thereof, which ASCII is). And that is typically a valid assumption :).
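To make the scanning discussed above concrete, here is a minimal, self-contained sketch of the technique. This is not the PR's actual code; rewriteModifyDeviceFiles is a hypothetical helper name. It scans params-file-style contents line by line with bufio.Scanner, uses a byte-level prefix match (valid because the file is ASCII, a subset of UTF-8), and forces ModifyDeviceFiles to 0:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// rewriteModifyDeviceFiles scans a params-file-style stream line by line
// and forces the ModifyDeviceFiles parameter to 0, leaving all other
// lines untouched. It returns the rewritten contents and whether any
// modification was actually required.
func rewriteModifyDeviceFiles(contents string) (string, bool) {
	var out strings.Builder
	var modified bool
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		line := scanner.Text()
		// Byte-level prefix match against a UTF-8 string literal;
		// sound here because the input is ASCII.
		if strings.HasPrefix(line, "ModifyDeviceFiles: ") && line != "ModifyDeviceFiles: 0" {
			line = "ModifyDeviceFiles: 0"
			modified = true
		}
		fmt.Fprintln(&out, line)
	}
	return out.String(), modified
}

func main() {
	params := "ResmanDebugLevel: 4294967295\nModifyDeviceFiles: 1\n"
	rewritten, modified := rewriteModifyDeviceFiles(params)
	fmt.Printf("modified=%v\n%s", modified, rewritten)
}
```

If the parameter is already 0 (or absent), the function reports that no modification is required, which is where a hook could skip the mount entirely.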

// createFileInTempfs creates a file with the specified name, contents, and mode in a tmpfs.
// A tmpfs is created at /tmp/nvct-empty-dir* with a size sufficient for the specified contents.
func createFileInTempfs(name string, contents []byte, mode os.FileMode) (string, error) {
tmpRoot, err := os.MkdirTemp("", "nvct-empty-dir*")


Not sure I understand the naming choice "nvct-empty-dir". Is this always empty? :)

Does this generate a directory visible on the host filesystem when I do ls /tmp?

Contributor:

Does this generate a directory visible on the host filesystem when I do ls /tmp?

A direct quote from Evan:

"Note that since the hook is run as a createContainer hook, the mount operations are performed in the mount namespace of the container. This means that the mounts are not visible from the host."

@elezar (Member, Author):

Yes, this directory is always empty. The empty-dir infix is something that I have seen in the context of k8s, but I would have to dig for the exact source. (On a system where I was testing this, we have around 42 of these folders.)

In the current implementation this directory is created at /tmp/nvct-empty-dir* on the host, meaning that it is visible on the host when you run ls /tmp:

elezar@dgx0126:~$ ls /tmp/nvct-empty-dir*
ls: cannot open directory '/tmp/nvct-empty-dir230315467': Permission denied
elezar@dgx0126:~$ sudo ls /tmp/nvct-empty-dir*
elezar@dgx0126:~$

As seen above, this directory is not accessible to regular users and also does not contain the nvct-params file that we create in the tmpfs.

The mount operation that follows, however, creates the tmpfs mount in the container since this hook is running in the container's namespace:

root@8cda1ee3028a:/# mount | grep params
tmpfs on /proc/driver/nvidia/params type tmpfs (rw,relatime,size=4k)

@jgehrcke left a comment:

Thank you, Evan!

Given the timeline and goal that we have, I think we should land this ASAP. I have only looked at this at a very high level and left high-level questions. I trust that the code is good enough for a first release. 🚀 Please merge at your own discretion (from my point of view).

@@ -0,0 +1,199 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
Contributor:

nit: s/2022/2025


s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
Contributor:

nit: s/%v/%w


containerRoot, err := s.GetContainerRoot()
if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
Contributor:

nit: s/%v/%w

}

if err := bindMountReadonly(tempParamsFileName, filepath.Join(containerRoot, nvidiaDriverParamsPath)); err != nil {
return fmt.Errorf("failed to create temporary parms file mount: %w", err)
Contributor:

nit: s/parms/params



@elezar (Member, Author) commented Apr 3, 2025

for FULL GPUs

What does "full" mean here? Fully occupied? Entire?

We use "FULL GPU" do refer to a GPU that has NOT been partitioned using MIG.

@elezar force-pushed the disable-device-node-creation branch 2 times, most recently from a39c195 to 8ada946 on April 3, 2025 13:47
elezar added 2 commits April 3, 2025 15:48
If required, this hook creates a modified params file (with ModifyDeviceFiles: 0) in a tmpfs
and mounts this over /proc/driver/nvidia/params.

This prevents device node creation when running tools such as nvidia-smi. In general, the creation of these devices is cosmetic, as a container does not have the required cgroup
access for the devices.

Signed-off-by: Evan Lezar <[email protected]>
This hook is not added to management specs.

Signed-off-by: Evan Lezar <[email protected]>
@elezar force-pushed the disable-device-node-creation branch from 8ada946 to 9c3086d on April 3, 2025 13:48
3 participants