6 changes: 3 additions & 3 deletions container-toolkit/arch-overview.md
@@ -78,7 +78,7 @@ This component is included in the `nvidia-container-toolkit` package.

This component includes an executable that implements the interface required by a `runC` `prestart` hook. This script is invoked by `runC`
after a container has been created, but before it has been started, and is given access to the `config.json` associated with the container
(e.g. this [config.json](https://github.com/opencontainers/runtime-spec/blob/master/config.md#configuration-schema-example=) ). It then takes
(such as this [config.json](https://github.com/opencontainers/runtime-spec/blob/master/config.md#configuration-schema-example=) ). It then takes
information contained in the `config.json` and uses it to invoke the `nvidia-container-cli` CLI with an appropriate set of flags. One of the
most important flags specifies which GPU devices should be injected into the container.
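
For illustration, the resulting invocation might look something like the following sketch, where the device index, PID, and rootfs path are placeholders and the exact flags depend on the container's `config.json`:

```console
$ nvidia-container-cli --load-kmods configure --device=0 --compute --utility --pid=12345 /path/to/container/rootfs
```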

@@ -111,7 +111,7 @@ To use Kubernetes with Docker, you need to configure the Docker `daemon.json` to
a reference to the NVIDIA Container Runtime and set this runtime as the default. The NVIDIA Container Toolkit contains a utility to update this file
as highlighted in the `docker`-specific installation instructions.
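
A sketch of how that utility is typically invoked (assuming the `nvidia-ctk` CLI is already installed; Docker must be restarted for the change to take effect):

```console
$ sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
$ sudo systemctl restart docker
```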

See the {doc}`install-guide` for more information on installing the NVIDIA Container Toolkit on various Linux distributions.
Refer to the {doc}`install-guide` for more information on installing the NVIDIA Container Toolkit on various Linux distributions.

### Package Repository

@@ -130,7 +130,7 @@ For the different components:

:::{note}
As of the release of version `1.6.0` of the NVIDIA Container Toolkit the packages for all components are
published to the `libnvidia-container` `repository <https://nvidia.github.io/libnvidia-container/>` listed above. For older package versions please see the documentation archives.
published to the `libnvidia-container` `repository <https://nvidia.github.io/libnvidia-container/>` listed above. For older package versions refer to the documentation archives.
:::

Releases of the software are also hosted on the `experimental` branch of the repository and are graduated to `stable` after testing and validation. To get access to the latest
151 changes: 116 additions & 35 deletions container-toolkit/cdi-support.md
@@ -1,6 +1,7 @@
% Date: November 11 2022

% Author: elezar
% Author: elezar ([email protected])
% Author: ArangoGutierrez ([email protected])

% headings (h1/h2/h3/h4/h5) are # * = -

@@ -29,54 +30,134 @@ CDI also improves the compatibility of the NVIDIA container stack with certain f

- You installed an NVIDIA GPU Driver.

### Procedure
### Automatic CDI Specification Generation

Two common locations for CDI specifications are `/etc/cdi/` and `/var/run/cdi/`.
The contents of the `/var/run/cdi/` directory are cleared on boot.
As of NVIDIA Container Toolkit `v1.18.0`, the CDI specification is automatically generated and updated by a systemd service called `nvidia-cdi-refresh`. This service:

However, the path to create and use can depend on the container engine that you use.
- Automatically generates the CDI specification at `/var/run/cdi/nvidia.yaml` when:
- The NVIDIA Container Toolkit is installed or upgraded
- The NVIDIA GPU drivers are installed or upgraded
- The system is rebooted

1. Generate the CDI specification file:
This ensures that the CDI specifications are up to date for the current driver
and device configuration and that the CDI Devices defined in these specifications are
available when using native CDI support in container engines such as Docker or Podman.
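
To confirm that the specification has been generated, you can check for the file at the location mentioned above:

```console
$ sudo ls -l /var/run/cdi/nvidia.yaml
```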

```console
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

The sample command uses `sudo` to ensure that the file at `/etc/cdi/nvidia.yaml` is created.
You can omit the `--output` argument to print the generated specification to `STDOUT`.
Running the following command lists the available CDI Devices:
```console
$ nvidia-ctk cdi list
```

*Example Output*
#### Known Limitations

The `nvidia-cdi-refresh` service does not currently handle the following situations:

```output
INFO[0000] Auto-detected mode as "nvml"
INFO[0000] Selecting /dev/nvidia0 as /dev/nvidia0
INFO[0000] Selecting /dev/dri/card1 as /dev/dri/card1
INFO[0000] Selecting /dev/dri/renderD128 as /dev/dri/renderD128
INFO[0000] Using driver version xxx.xxx.xx
...
```
- The removal of NVIDIA GPU drivers
- The reconfiguration of MIG devices

1. (Optional) Check the names of the generated devices:
For these scenarios, the regeneration of CDI specifications must be [manually triggered](#manual-cdi-specification-generation).

```console
$ nvidia-ctk cdi list
```
#### Customizing the Automatic CDI Refresh Service

You can customize the behavior of the `nvidia-cdi-refresh` service by adding
environment variables to `/etc/nvidia-container-toolkit/cdi-refresh.env`; these
variables affect the `nvidia-ctk cdi generate` command that the service runs.

The following example output is for a machine with a single GPU that does not support MIG.
For example, to enable debug logging, update the configuration file as follows:
```bash
# /etc/nvidia-container-toolkit/cdi-refresh.env
NVIDIA_CTK_DEBUG=1
```

```output
INFO[0000] Found 9 CDI devices
nvidia.com/gpu=all
nvidia.com/gpu=0
```
For a complete list of available environment variables, refer to the output of `nvidia-ctk cdi generate --help`.

```{important}
You must generate a new CDI specification after any of the following changes:
Modifications to the environment file require a systemd daemon reload and a restart of the
service to take effect.
```

```console
$ sudo systemctl daemon-reload
$ sudo systemctl restart nvidia-cdi-refresh.service
```

#### Managing the CDI Refresh Service

The `nvidia-cdi-refresh` service consists of two systemd units:

- `nvidia-cdi-refresh.path`: Monitors for changes to the system and triggers the service.
- `nvidia-cdi-refresh.service`: Generates the CDI specifications for all available devices based on
the default configuration and any overrides in the environment file.

These units can be managed using standard systemd commands.

When working as expected, the `nvidia-cdi-refresh.path` unit will be enabled and active, and the
`nvidia-cdi-refresh.service` unit will be enabled and will have run at least once. For example:

```console
$ sudo systemctl status nvidia-cdi-refresh.path
● nvidia-cdi-refresh.path - Trigger CDI refresh on NVIDIA driver install / uninstall events
Loaded: loaded (/etc/systemd/system/nvidia-cdi-refresh.path; enabled; preset: enabled)
Active: active (waiting) since Fri 2025-06-27 06:04:54 EDT; 1h 47min ago
Triggers: ● nvidia-cdi-refresh.service
```

```console
$ sudo systemctl status nvidia-cdi-refresh.service
○ nvidia-cdi-refresh.service - Refresh NVIDIA CDI specification file
Loaded: loaded (/etc/systemd/system/nvidia-cdi-refresh.service; enabled; preset: enabled)
Active: inactive (dead) since Fri 2025-06-27 07:17:26 EDT; 34min ago
TriggeredBy: ● nvidia-cdi-refresh.path
Process: 1317511 ExecStart=/usr/bin/nvidia-ctk cdi generate --output=/var/run/cdi/nvidia.yaml (code=exited, status=0/SUCCESS)
Main PID: 1317511 (code=exited, status=0/SUCCESS)
CPU: 562ms
...
```

If these units are not enabled as expected, enable them by running:

```console
$ sudo systemctl enable --now nvidia-cdi-refresh.path
$ sudo systemctl enable --now nvidia-cdi-refresh.service
```

#### Troubleshooting CDI Specification Generation and Resolution

If CDI specifications for the available devices are not generated or updated as expected, check
the logs of the `nvidia-cdi-refresh.service`:

```console
$ sudo journalctl -u nvidia-cdi-refresh.service
```

In most cases, restarting the service should be sufficient to trigger the (re)generation
of CDI specifications:

```console
$ sudo systemctl restart nvidia-cdi-refresh.service
```

- You change the device or CUDA driver configuration.
- You use a location such as `/var/run/cdi` that is cleared on boot.
Running:

A configuration change can occur when MIG devices are created or removed, or when the driver is upgraded.
```console
$ nvidia-ctk --debug cdi list
```
will show a list of available CDI Devices as well as any errors that may have
occurred when loading CDI Specifications from `/etc/cdi` or `/var/run/cdi`.
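
For reference, on a machine with a single GPU that does not support MIG, the device list looks something like the following (names and counts vary per system, and `--debug` adds further log lines):

```output
INFO[0000] Found 9 CDI devices
nvidia.com/gpu=all
nvidia.com/gpu=0
...
```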

### Manual CDI Specification Generation

As of NVIDIA Container Toolkit `v1.18.0`, the recommended mechanism to regenerate CDI specifications is to restart the `nvidia-cdi-refresh.service`:

```console
$ sudo systemctl restart nvidia-cdi-refresh.service
```

If this does not work, or more flexibility is required, the `nvidia-ctk cdi generate` command
can be used directly:

```console
$ sudo nvidia-ctk cdi generate --output=/var/run/cdi/nvidia.yaml
```
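
If you prefer a specification that persists across reboots, you can instead write it to `/etc/cdi/`, which, unlike `/var/run/cdi/`, is not cleared on boot:

```console
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```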

## Running a Workload with CDI
2 changes: 1 addition & 1 deletion container-toolkit/docker-specialized.md
@@ -206,7 +206,7 @@ The supported constraints are provided below:
- constraint on the compute architectures of the selected GPUs.

* - ``brand``
- constraint on the brand of the selected GPUs (e.g. GeForce, Tesla, GRID).
- constraint on the brand of the selected GPUs (such as GeForce, Tesla, GRID).
```

Multiple constraints can be expressed in a single environment variable: space-separated constraints are ORed,
2 changes: 1 addition & 1 deletion container-toolkit/index.md
@@ -35,5 +35,5 @@ The NVIDIA Container Toolkit is a collection of libraries and utilities enabling
## License

The NVIDIA Container Toolkit (and all included components) is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) and
contributions are accepted with a Developer Certificate of Origin (DCO). See the [contributing](https://github.com/NVIDIA/nvidia-container-toolkit/blob/master/CONTRIBUTING.md) document for
contributions are accepted with a Developer Certificate of Origin (DCO). Refer to the [contributing](https://github.com/NVIDIA/nvidia-container-toolkit/blob/master/CONTRIBUTING.md) document for
more information.
27 changes: 20 additions & 7 deletions container-toolkit/install-guide.md
@@ -21,7 +21,7 @@ Alternatively, you can install the driver by [downloading](https://www.nvidia.co
```{note}
There is a [known issue](troubleshooting.md#containers-losing-access-to-gpus-with-error-failed-to-initialize-nvml-unknown-error) on systems
where `systemd` cgroup drivers are used that cause containers to lose access to requested GPUs when
`systemctl daemon reload` is run. Please see the troubleshooting documentation for more information.
`systemctl daemon reload` is run. Refer to the troubleshooting documentation for more information.
```

(installing-with-apt)=
@@ -31,6 +31,12 @@
```{note}
These instructions [should work](./supported-platforms.md) for any Debian-derived distribution.
```
1. Install the prerequisites for the instructions below:
```console
$ sudo apt-get update && sudo apt-get install -y --no-install-recommends \
curl \
gnupg2
```

1. Configure the production repository:

@@ -78,6 +84,12 @@ where `systemd` cgroup drivers are used that cause containers to lose access to
These instructions [should work](./supported-platforms.md) for many RPM-based distributions.
```

1. Install the prerequisites for the instructions below:
```console
$ sudo dnf install -y \
curl
```

1. Configure the production repository:

```console
@@ -186,8 +198,10 @@ follow these steps:
$ sudo nvidia-ctk runtime configure --runtime=containerd
```

The `nvidia-ctk` command modifies the `/etc/containerd/config.toml` file on the host.
The file is updated so that containerd can use the NVIDIA Container Runtime.
By default, the `nvidia-ctk` command creates a `/etc/containerd/conf.d/99-nvidia.toml`
drop-in config file and modifies (or creates) the `/etc/containerd/config.toml` file
to ensure that the `imports` config option is updated accordingly. The drop-in file
ensures that containerd can use the NVIDIA Container Runtime.

1. Restart containerd:

@@ -201,7 +215,7 @@ No additional configuration is needed.
You can just run `nerdctl run --gpus=all`, with root or without root.
You do not need to run the `nvidia-ctk` command mentioned above for Kubernetes.

See also the [nerdctl documentation](https://github.com/containerd/nerdctl/blob/main/docs/gpu.md).
Refer to the [nerdctl documentation](https://github.com/containerd/nerdctl/blob/main/docs/gpu.md) for more information.
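
As a quick smoke test (the CUDA image tag is illustrative; any CUDA-capable image works):

```console
$ sudo nerdctl run -it --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
```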

### Configuring CRI-O

@@ -211,8 +225,8 @@ See also the [nerdctl documentation](https://github.com/containerd/nerdctl/blob/
$ sudo nvidia-ctk runtime configure --runtime=crio
```

The `nvidia-ctk` command modifies the `/etc/crio/crio.conf` file on the host.
The file is updated so that CRI-O can use the NVIDIA Container Runtime.
By default, the `nvidia-ctk` command creates a `/etc/crio/conf.d/99-nvidia.toml`
drop-in config file. The drop-in file ensures that CRI-O can use the NVIDIA Container Runtime.

1. Restart the CRI-O daemon:

@@ -229,7 +243,6 @@ See also the [nerdctl documentation](https://github.com/containerd/nerdctl/blob/

For Podman, NVIDIA recommends using [CDI](./cdi-support.md) for accessing NVIDIA devices in containers.
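
For example, once a CDI specification has been generated as described on that page, all GPUs can be requested by their CDI device name (the image is illustrative; on SELinux-enabled systems you may also need `--security-opt=label=disable`):

```console
$ podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```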


## Next Steps

- [](./sample-workload.md)