
Commit 738f822

Add 580.65.06 to 25.3.2 (#230)
* Add 580.65.06 to 25.3.2
* added known issue
* reworded known issue
* updated Containerd to support up to 2.1
* added fix for CVE-2025-23266 and CVE-2025-23267 to release notes
* added nouveau driver to Known Issues
* minor edits to comply with NVIDIA Style Guide
* reorder known issues

Signed-off-by: Andrew Chen <[email protected]>
1 parent e4085fc commit 738f822

15 files changed: +146, -125 lines

container-toolkit/arch-overview.md

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ a `prestart` hook into it, and then calls out to the native `runC`, passing it t
 For versions of the NVIDIA Container Runtime from `v1.12.0`, this runtime also performs additional modifications to the OCI runtime spec to inject
 specific devices and mounts not handled by the NVIDIA Container CLI.
 
-It's important to note that this component is not necessarily specific to docker (but it is specific to `runC`).
+It is important to note that this component is not necessarily specific to docker (but it is specific to `runC`).
 
 ### The NVIDIA Container Toolkit CLI
 
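
As a brief illustration of the point above, a minimal sketch of wiring this runtime into Docker and exercising it, assuming the NVIDIA Container Toolkit is installed on the host (the `nvidia-ctk runtime configure` helper and the `--runtime=nvidia` flag are the usual entry points; the same runtime can be configured for containerd or CRI-O, since it only depends on `runC`):

    # Register the NVIDIA runtime with Docker (updates /etc/docker/daemon.json)
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker

    # Run a container through the NVIDIA runtime; the OCI spec is modified
    # (prestart hook, devices, mounts) before runC starts the container
    sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi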

gpu-operator/dra-intro-install.rst

Lines changed: 9 additions & 9 deletions
@@ -12,7 +12,7 @@ Introduction
 
 With NVIDIA's DRA Driver for GPUs, your Kubernetes workload can allocate and consume the following two types of resources:
 
-* **GPUs**: for controlled sharing and dynamic reconfiguration of GPUs. A modern replacement for the traditional GPU allocation method (using `NVIDIA's device plugin <https://github.com/NVIDIA/k8s-device-plugin>`_). We are excited about this part of the driver; it is however not yet fully supported (Technology Preview).
+* **GPUs**: for controlled sharing and dynamic reconfiguration of GPUs. A modern replacement for the traditional GPU allocation method (using `NVIDIA's device plugin <https://github.com/NVIDIA/k8s-device-plugin>`_). NVIDIA is excited about this part of the driver; it is however not yet fully supported (Technology Preview).
 * **ComputeDomains**: for robust and secure Multi-Node NVLink (MNNVL) for NVIDIA GB200 and similar systems. Fully supported.
 
 A primer on DRA
@@ -25,7 +25,7 @@ For NVIDIA devices, there are two particularly beneficial characteristics provid
 #. A clean way to allocate **cross-node resources** in Kubernetes (leveraged here for providing NVLink connectivity across pods running on multiple nodes).
 #. Mechanisms to explicitly **share, partition, and reconfigure** devices **on-the-fly** based on user requests (leveraged here for advanced GPU allocation).
 
-To understand and make best use of NVIDIA's DRA Driver for GPUs, we recommend becoming familiar with DRA by working through the `official documentation <https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/>`_.
+To understand and make best use of NVIDIA's DRA Driver for GPUs, NVIDIA recommends becoming familiar with DRA by working through the `official documentation <https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/>`_.
 
 
 The twofold nature of this driver
@@ -34,7 +34,7 @@ The twofold nature of this driver
 NVIDIA's DRA Driver for GPUs is comprised of two subsystems that are largely independent of each other: one manages GPUs, and the other one manages ComputeDomains.
 
 Below, you can find instructions for how to install both parts or just one of them.
-Additionally, we have prepared two separate documentation chapters, providing more in-depth information for each of the two subsystems:
+Additionally, NVIDIA has prepared two separate documentation chapters, providing more in-depth information for each of the two subsystems:
 
 - :ref:`Documentation for ComputeDomain (MNNVL) support <dra_docs_compute_domains>`
 - :ref:`Documentation for GPU support <dra_docs_gpus>`
@@ -52,7 +52,7 @@ Prerequisites
 - `CDI <https://github.com/cncf-tags/container-device-interface?tab=readme-ov-file#how-to-configure-cdi>`_ must be enabled in the underlying container runtime (such as containerd or CRI-O).
 - NVIDIA GPU Driver 565 or later.
 
-For the last two items on the list above, as well as for other reasons, we recommend installing NVIDIA's GPU Operator v25.3.0 or later.
+For the last two items on the list above, as well as for other reasons, NVIDIA recommends installing NVIDIA's GPU Operator v25.3.0 or later.
 For detailed instructions, see the official GPU Operator `installation documentation <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#common-chart-customization-options>`__.
 Also note that, in the near future, the preferred method to install NVIDIA's DRA Driver for GPUs will be through the GPU Operator (the DRA driver will then no longer require installation as a separate Helm chart).
 
@@ -65,8 +65,8 @@ Also note that, in the near future, the preferred method to install NVIDIA's DRA
 - Refer to the `docs on installing the GPU Operator with a pre-installed GPU driver <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#pre-installed-nvidia-gpu-drivers>`__.
 
 
-Configure and Helm-install the driver
-=====================================
+Configure and install the driver with Helm
+==========================================
 
 #. Add the NVIDIA Helm repository:
 
@@ -103,15 +103,15 @@ All install-time configuration parameters can be listed by running ``helm show v
 .. note::
 
    - A common mode of operation for now is to enable only the ComputeDomain subsystem (to have GPUs allocated using the traditional device plugin). The example above achieves that by setting ``resources.gpus.enabled=false``.
-   - Setting ``nvidiaDriverRoot=/run/nvidia/driver`` above expects a GPU Operator-provided GPU driver. That configuration parameter must be changed in case the GPU driver is installed straight on the host (typically at ``/``, which is the default value for ``nvidiaDriverRoot``).
+   - Setting ``nvidiaDriverRoot=/run/nvidia/driver`` above expects a GPU Operator-provided GPU driver. That configuration parameter must be changed in case the GPU driver is installed straight on the host (typically at ``/``, which is the default value for ``nvidiaDriverRoot``).
 
 
 Validate installation
 =====================
 
 A lot can go wrong, depending on the exact nature of your Kubernetes environment and specific hardware and driver choices as well as configuration options chosen.
-That is why we recommend to perform a set of validation tests to confirm the basic functionality of your setup.
-To that end, we have prepared separate documentation:
+That is why NVIDIA recommends performing a set of validation tests to confirm the basic functionality of your setup.
+To that end, NVIDIA has prepared separate documentation:
 
 - `Testing ComputeDomain allocation <https://github.com/NVIDIA/k8s-dra-driver-gpu/wiki/Validate-setup-for-ComputeDomain-allocation>`_
 - `Testing GPU allocation <https://github.com/NVIDIA/k8s-dra-driver-gpu/wiki/Validate-setup-for-GPU-allocation>`_
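
A minimal sketch of the Helm-based install outlined above, assuming the driver is published as the `nvidia-dra-driver-gpu` chart in the NVIDIA Helm repository (chart, release, and namespace names here are illustrative; the two `--set` flags are the configuration parameters discussed in the note):

    # Add the NVIDIA Helm repository and refresh the local index
    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
    helm repo update

    # Enable only the ComputeDomain subsystem and point the driver at a
    # GPU Operator-provided kernel driver (chart/release names illustrative)
    helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
      --namespace nvidia-dra-driver-gpu --create-namespace \
      --set resources.gpus.enabled=false \
      --set nvidiaDriverRoot=/run/nvidia/driver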

gpu-operator/getting-started.rst

Lines changed: 1 addition & 1 deletion
@@ -277,7 +277,7 @@ To view all the options, run ``helm show values nvidia/gpu-operator``.
      - ``{}``
 
    * - ``psp.enabled``
-     - The GPU operator deploys ``PodSecurityPolicies`` if enabled.
+     - The GPU Operator deploys ``PodSecurityPolicies`` if enabled.
      - ``false``
 
    * - ``sandboxWorkloads.defaultWorkload``
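
A quick sketch of inspecting and overriding the option touched by this hunk, assuming the `nvidia` Helm repository is already configured (release and namespace names are illustrative):

    # List every chart option, including psp.enabled and its default value
    helm show values nvidia/gpu-operator

    # Override the default at install time (release/namespace names illustrative)
    helm install gpu-operator nvidia/gpu-operator \
      --namespace gpu-operator --create-namespace \
      --set psp.enabled=true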

gpu-operator/gpu-operator-kata.rst

Lines changed: 2 additions & 2 deletions
@@ -19,8 +19,8 @@
 ..
   lingo:
 
-  It's "Kata Containers" when referring to the software component.
-  It's "Kata container" when it's a container that uses the Kata Containers runtime.
+  It is "Kata Containers" when referring to the software component.
+  It is "Kata container" when it is a container that uses the Kata Containers runtime.
   Treat our operands as proper nouns and use title case.
 
 #################################

gpu-operator/gpu-operator-kubevirt.rst

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ Given the following node configuration:
 * Node B is configured with the label ``nvidia.com/gpu.workload.config=vm-passthrough`` and configured to run virtual machines with Passthrough GPU.
 * Node C is configured with the label ``nvidia.com/gpu.workload.config=vm-vgpu`` and configured to run virtual machines with vGPU.
 
-The GPU operator will deploy the following software components on each node:
+The GPU Operator will deploy the following software components on each node:
 
 * Node A receives the following software components:
   * ``NVIDIA Datacenter Driver`` - to install the driver
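
For reference, a minimal sketch of applying the workload labels described above with `kubectl` (the node names are illustrative):

    # Mark nodes for the workload type the GPU Operator should provision them for
    kubectl label node node-b nvidia.com/gpu.workload.config=vm-passthrough
    kubectl label node node-c nvidia.com/gpu.workload.config=vm-vgpu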

gpu-operator/gpu-operator-mig.rst

Lines changed: 1 addition & 1 deletion
@@ -102,7 +102,7 @@ Perform the following steps to install the Operator and configure MIG:
 Known Issue: For drivers 570.124.06, 570.133.20, 570.148.08, and 570.158.01,
 GPU workloads cannot be scheduled on nodes that have a mix of MIG slices and full GPUs.
 This manifests as GPU pods getting stuck indefinitely in the ``Pending`` state.
-It's recommended that you downgrade the driver to version 570.86.15 to work around this issue.
+NVIDIA recommends that you downgrade the driver to version 570.86.15 to work around this issue.
 For more detailed information, see GitHub issue https://github.com/NVIDIA/gpu-operator/issues/1361.
 
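
When the driver is managed by the GPU Operator, one way to apply the recommended downgrade is to set the driver version on the ClusterPolicy; a minimal sketch, assuming the default resource name `cluster-policy` (verify the name in your cluster first):

    # Confirm the ClusterPolicy resource name
    kubectl get clusterpolicies.nvidia.com

    # Pin the operator-managed driver to the suggested version (name assumed)
    kubectl patch clusterpolicies.nvidia.com cluster-policy --type merge \
      -p '{"spec":{"driver":{"version":"570.86.15"}}}'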

gpu-operator/install-gpu-operator-air-gapped.rst

Lines changed: 1 addition & 1 deletion
@@ -246,7 +246,7 @@ Sample of ``values.yaml`` for GPU Operator v1.9.0:
 Local Package Repository
 ************************
 
-The ``driver`` container deployed as part of the GPU operator requires certain packages to be available as part of the
+The ``driver`` container deployed as part of the GPU Operator requires certain packages to be available as part of the
 driver installation. In restricted internet access or air-gapped installations, users are required to create a
 local mirror repository for their OS distribution and make the following packages available:
 
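
One common way to expose such a local mirror to the ``driver`` container is a ConfigMap holding the repository definition, referenced through the chart's repo-config option; a rough sketch, where the file name, ConfigMap name, namespace, and the exact ``driver.repoConfig`` key are assumptions to check against ``helm show values nvidia/gpu-operator``:

    # Package a repository definition that points at the local mirror
    # (custom-repo.list and all names below are illustrative)
    kubectl create configmap repo-config -n gpu-operator --from-file=custom-repo.list

    # Reference it when installing or upgrading the chart (option path assumed)
    helm upgrade --install gpu-operator nvidia/gpu-operator \
      --namespace gpu-operator \
      --set driver.repoConfig.configMapName=repo-config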

gpu-operator/install-gpu-operator-outdated-kernels.rst

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ On GPU nodes where the running kernel is not the latest, the ``driver`` containe
 see the following error message: ``Could not resolve Linux kernel version``.
 
 In general, upgrading your system to the latest kernel should fix this issue. But if this is not an option, the following is a
-workaround to successfully deploy the GPU operator when GPU nodes in your cluster may not be running the latest kernel.
+workaround to successfully deploy the GPU Operator when GPU nodes in your cluster may not be running the latest kernel.
 
 Add Archived Package Repositories
 =================================
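
A quick way to check whether a node is in the situation this workaround targets, that is, running a kernel whose header packages the configured repositories no longer carry (shown for Ubuntu; package names differ per distribution):

    # Kernel the node is currently running
    uname -r

    # Check whether matching header packages are still available from the
    # configured repositories (Ubuntu example)
    sudo apt-get update
    apt-cache policy linux-headers-$(uname -r)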

gpu-operator/life-cycle-policy.rst

Lines changed: 4 additions & 3 deletions
@@ -91,8 +91,9 @@ Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.
      - ${version}
 
    * - NVIDIA GPU Driver |ki|_
-     - | `575.57.08 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-575-57-08/index.html>`_
-       | `570.172.08 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-172-08/index.html>`_ (default, recommended)
+     - | `580.65.06 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-580-65-06/index.html>`_ (recommended)
+       | `575.57.08 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-575-57-08/index.html>`_
+       | `570.172.08 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-172-08/index.html>`_ (default)
       | `570.158.01 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-158-01/index.html>`_
       | `570.148.08 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-148-08/index.html>`_
       | `535.261.03 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-261-03/index.html>`_
@@ -152,7 +153,7 @@ Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.
 Known Issue: For drivers 570.124.06, 570.133.20, 570.148.08, and 570.158.01,
 GPU workloads cannot be scheduled on nodes that have a mix of MIG slices and full GPUs.
 This manifests as GPU pods getting stuck indefinitely in the ``Pending`` state.
-It's recommended that you downgrade the driver to version 570.86.15 to work around this issue.
+NVIDIA recommends that you downgrade the driver to version 570.86.15 to work around this issue.
 For more detailed information, see GitHub issue https://github.com/NVIDIA/gpu-operator/issues/1361.
 
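
For clusters that want the newly added recommended branch rather than the chart default, a minimal sketch of pinning it at install or upgrade time (release and namespace names are illustrative; confirm that a 580.65.06 driver image exists for your OS before switching):

    # Select the 580.65.06 driver branch instead of the chart default
    helm upgrade --install gpu-operator nvidia/gpu-operator \
      --namespace gpu-operator --create-namespace \
      --set driver.version=580.65.06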

gpu-operator/overview.rst

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ configuration of multiple software components such as drivers, container runtime
 and prone to errors. The NVIDIA GPU Operator uses the `operator framework <https://coreos.com/blog/introducing-operator-framework>`_
 within Kubernetes to automate the management of all NVIDIA software components needed to provision GPU. These components include the NVIDIA drivers (to enable CUDA),
 Kubernetes device plugin for GPUs, the `NVIDIA Container Toolkit <https://github.com/NVIDIA/nvidia-container-toolkit>`_,
-automatic node labelling using `GFD <https://github.com/NVIDIA/gpu-feature-discovery>`_, `DCGM <https://developer.nvidia.com/dcgm>`_ based monitoring and others.
+automatic node labeling using `GFD <https://github.com/NVIDIA/gpu-feature-discovery>`_, `DCGM <https://developer.nvidia.com/dcgm>`_ based monitoring and others.
 
 
 .. card:: Red Hat OpenShift Container Platform
