gpu-operator/dra-intro-install.rst (+9 −9)
@@ -12,7 +12,7 @@ Introduction
With NVIDIA's DRA Driver for GPUs, your Kubernetes workload can allocate and consume the following two types of resources:
-* **GPUs**: for controlled sharing and dynamic reconfiguration of GPUs. A modern replacement for the traditional GPU allocation method (using `NVIDIA's device plugin <https://github.com/NVIDIA/k8s-device-plugin>`_). We are excited about this part of the driver; it is however not yet fully supported (Technology Preview).
+* **GPUs**: for controlled sharing and dynamic reconfiguration of GPUs. A modern replacement for the traditional GPU allocation method (using `NVIDIA's device plugin <https://github.com/NVIDIA/k8s-device-plugin>`_). NVIDIA is excited about this part of the driver; however, it is not yet fully supported (Technology Preview).
* **ComputeDomains**: for robust and secure Multi-Node NVLink (MNNVL) for NVIDIA GB200 and similar systems. Fully supported.
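To make the DRA-style GPU allocation described above concrete, here is a minimal sketch of a ``ResourceClaimTemplate`` requesting a single GPU. The API version and the device class name ``gpu.nvidia.com`` are assumptions to verify against your Kubernetes release and the classes the installed driver actually registers:

```yaml
# Hypothetical sketch: a claim template that pods can reference
# to request one GPU through DRA instead of the device plugin.
# deviceClassName is an assumption; check `kubectl get deviceclasses`.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
```

A pod would then reference this template through ``spec.resourceClaims`` and consume it via ``claims`` on the container, rather than requesting ``nvidia.com/gpu`` in ``resources.limits``.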
A primer on DRA
@@ -25,7 +25,7 @@ For NVIDIA devices, there are two particularly beneficial characteristics provid
#. A clean way to allocate **cross-node resources** in Kubernetes (leveraged here for providing NVLink connectivity across pods running on multiple nodes).
#. Mechanisms to explicitly **share, partition, and reconfigure** devices **on-the-fly** based on user requests (leveraged here for advanced GPU allocation).
-To understand and make best use of NVIDIA's DRA Driver for GPUs, we recommend becoming familiar with DRA by working through the `official documentation <https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/>`_.
+To understand and make best use of NVIDIA's DRA Driver for GPUs, NVIDIA recommends becoming familiar with DRA by working through the `official documentation <https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/>`_.
The twofold nature of this driver
@@ -34,7 +34,7 @@ The twofold nature of this driver
NVIDIA's DRA Driver for GPUs comprises two subsystems that are largely independent of each other: one manages GPUs, and the other manages ComputeDomains.
Below, you can find instructions for installing both parts or just one of them.
-Additionally, we have prepared two separate documentation chapters, providing more in-depth information for each of the two subsystems:
+Additionally, NVIDIA has prepared two separate documentation chapters, providing more in-depth information for each of the two subsystems:
- :ref:`Documentation for ComputeDomain (MNNVL) support <dra_docs_compute_domains>`
- :ref:`Documentation for GPU support <dra_docs_gpus>`
@@ -52,7 +52,7 @@ Prerequisites
- `CDI <https://github.com/cncf-tags/container-device-interface?tab=readme-ov-file#how-to-configure-cdi>`_ must be enabled in the underlying container runtime (such as containerd or CRI-O).
- NVIDIA GPU Driver 565 or later.
-For the last two items on the list above, as well as for other reasons, we recommend installing NVIDIA's GPU Operator v25.3.0 or later.
+For the last two items on the list above, as well as for other reasons, NVIDIA recommends installing NVIDIA's GPU Operator v25.3.0 or later.
For detailed instructions, see the official GPU Operator `installation documentation <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#common-chart-customization-options>`__.
Also note that, in the near future, the preferred method to install NVIDIA's DRA Driver for GPUs will be through the GPU Operator (the DRA driver will then no longer require installation as a separate Helm chart).
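The CDI prerequisite above is typically satisfied in containerd 1.7 or later by enabling CDI in the CRI plugin section of the daemon configuration. This is a sketch of the relevant fragment only, not a complete ``config.toml``; verify the section name against your containerd version:

```toml
# Fragment of /etc/containerd/config.toml (containerd 1.7+).
# Restart containerd after editing.
[plugins."io.containerd.grpc.v1.cri"]
  enable_cdi = true
  cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
```

Recent CRI-O releases enable CDI by default, so no equivalent change is usually needed there.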
@@ -65,8 +65,8 @@ Also note that, in the near future, the preferred method to install NVIDIA's DRA
- Refer to the `docs on installing the GPU Operator with a pre-installed GPU driver <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#pre-installed-nvidia-gpu-drivers>`__.
-Configure and Helm-install the driver
-=====================================
+Configure and install the driver with Helm
+==========================================
#. Add the NVIDIA Helm repository:
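Put together with the install-time parameters discussed in the note below, the Helm steps might look as follows. The chart name ``nvidia-dra-driver-gpu`` and the namespace are assumptions; confirm them against the current chart listing before running anything:

```shell
# Assumed repo/chart names; verify with `helm search repo nvidia`.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# ComputeDomains only (GPU subsystem disabled), GPU Operator-provided driver.
helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
  --namespace nvidia-dra-driver-gpu --create-namespace \
  --set resources.gpus.enabled=false \
  --set nvidiaDriverRoot=/run/nvidia/driver
```

This is a sketch of one common configuration, not the authoritative command; ``helm show values`` lists all install-time parameters.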
@@ -103,15 +103,15 @@ All install-time configuration parameters can be listed by running ``helm show v
.. note::
- A common mode of operation for now is to enable only the ComputeDomain subsystem (to have GPUs allocated using the traditional device plugin). The example above achieves that by setting ``resources.gpus.enabled=false``.
-- Setting ``nvidiaDriverRoot=/run/nvidia/driver`` above expects a GPU Operator-provided GPU driver. That configuration parameter must be changed in case the GPU driver is installed straight on the host (typically at ``/``, which is the default value for ``nvidiaDriverRoot``).
+- Setting ``nvidiaDriverRoot=/run/nvidia/driver`` above expects a GPU Operator-provided GPU driver. That configuration parameter must be changed in case the GPU driver is installed straight on the host (typically at ``/``, which is the default value for ``nvidiaDriverRoot``).
Validate installation
=====================
A lot can go wrong, depending on your Kubernetes environment, hardware, driver choices, and configuration options.
-That is why we recommend to perform a set of validation tests to confirm the basic functionality of your setup.
-To that end, we have prepared separate documentation:
+That is why NVIDIA recommends performing a set of validation tests to confirm the basic functionality of your setup.
+To that end, NVIDIA has prepared separate documentation:
gpu-operator/overview.rst (+1 −1)
@@ -31,7 +31,7 @@ configuration of multiple software components such as drivers, container runtime
and prone to errors. The NVIDIA GPU Operator uses the `operator framework <https://coreos.com/blog/introducing-operator-framework>`_
within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA),
Kubernetes device plugin for GPUs, the `NVIDIA Container Toolkit <https://github.com/NVIDIA/nvidia-container-toolkit>`_,
-automatic node labelling using `GFD <https://github.com/NVIDIA/gpu-feature-discovery>`_, `DCGM <https://developer.nvidia.com/dcgm>`_ based monitoring and others.
+automatic node labeling using `GFD <https://github.com/NVIDIA/gpu-feature-discovery>`_, `DCGM <https://developer.nvidia.com/dcgm>`_ based monitoring and others.