# AWS OFI NCCL Release notes

# Supported Distributions

- Amazon Linux 2
- Amazon Linux 2023
- Ubuntu 20.04 LTS and 22.04 LTS

For releases before v1.6.0, we generally created releases from two separate
branches: an AWS-specific branch and a general release branch. With v1.6.0, we
unified the code into a single branch and made the AWS-specific parts a
compile-time option. When a feature (or an entire release) supports only one of
the two variants, we note that in the release notes.

# v1.13.0-aws (2024-11-18)

This release is intended only for use on AWS P\* instances. A general release
that supports other libfabric networks may be made in the near future.

With this release, building with platform-aws requires
[libfabric 1.22.0amzn4.0](https://github.com/aws/libfabric/commits/1.22.0amzn4.0/)
or greater. AWS customers are generally recommended to track
[the latest available EFA Installer](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-verify.html)
for performance improvements and bug fixes.
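
The platform-aws build now has a hard minimum libfabric version. A packaging or CI script may therefore want to check a version tag before configuring. The sketch below is hypothetical and not part of the plugin or libfabric. It assumes that tags follow the `<major>.<minor>.<patch>amzn<rev>.<subrev>` shape of `1.22.0amzn4.0`. It also assumes that tags order by the upstream triple first and then by the `amzn` revision. Neither assumption comes from these notes, and `parse_amzn_version` and `meets_aws_minimum` are illustrative names.

```python
import re
from typing import Tuple

# Minimum libfabric version for building with platform-aws (from these notes).
MIN_AWS_LIBFABRIC = "1.22.0amzn4.0"

# ASSUMPTION: Amazon libfabric tags look like "1.22.0amzn4.0", i.e.
# "<major>.<minor>.<patch>amzn<rev>.<subrev>".
_AMZN_TAG = re.compile(r"(\d+)\.(\d+)\.(\d+)amzn(\d+)\.(\d+)")


def parse_amzn_version(tag: str) -> Tuple[int, ...]:
    """Split an Amazon libfabric tag into integers so tuples compare field by field."""
    match = _AMZN_TAG.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"unrecognized libfabric tag: {tag!r}")
    return tuple(int(field) for field in match.groups())


def meets_aws_minimum(tag: str, minimum: str = MIN_AWS_LIBFABRIC) -> bool:
    """True if `tag` orders at or after `minimum`.

    ASSUMPTION: ordering is upstream (major, minor, patch) first, then the
    amzn revision; tuple comparison gives exactly that lexicographic order.
    """
    return parse_amzn_version(tag) >= parse_amzn_version(minimum)


if __name__ == "__main__":
    for candidate in ("1.22.0amzn3.0", "1.22.0amzn4.0", "1.23.0amzn1.0"):
        print(candidate, meets_aws_minimum(candidate))
```

The sketch compares integer tuples rather than raw strings. This avoids the usual pitfall in which `"1.9"` sorts after `"1.22"` as text.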

The 1.13.x release series supports
[NCCL 2.23.4-1](https://github.com/NVIDIA/nccl/releases/tag/v2.23.4-1)
while maintaining backward compatibility with older NCCL versions
([NCCL v2.17.1](https://github.com/NVIDIA/nccl/releases/tag/v2.17.1-1) and later).
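
NCCL encodes its version as a single integer: the `NCCL_VERSION(X, Y, Z)` macro in `nccl.h`, which is also the value reported by `ncclGetVersion()`. This makes the supported range above easy to check numerically. The sketch below mirrors that formula in Python. `nccl_version_code` and `at_least_min_compat` are hypothetical helper names. The check covers only the stated lower bound (v2.17.1); it does not show how the plugin itself detects or adapts to the NCCL version. The `-1` in a tag such as `2.23.4-1` is a build suffix and is not part of the integer code.

```python
def nccl_version_code(major: int, minor: int, patch: int) -> int:
    """Mirror of the NCCL_VERSION(X, Y, Z) macro from nccl.h.

    Releases up to 2.8 use major*1000 + minor*100 + patch; later releases use
    major*10000 + minor*100 + patch, which avoids collisions once minor >= 10.
    """
    if major <= 2 and minor <= 8:
        return major * 1000 + minor * 100 + patch
    return major * 10000 + minor * 100 + patch


# Lower bound stated for the 1.13.x series: NCCL v2.17.1.
MIN_COMPAT_CODE = nccl_version_code(2, 17, 1)
# Version the 1.13.x series is stated to support: NCCL 2.23.4.
TARGET_CODE = nccl_version_code(2, 23, 4)


def at_least_min_compat(major: int, minor: int, patch: int) -> bool:
    """True if the given NCCL version meets the stated 2.17.1 lower bound."""
    return nccl_version_code(major, minor, patch) >= MIN_COMPAT_CODE


if __name__ == "__main__":
    print(MIN_COMPAT_CODE, TARGET_CODE)  # 21701 22304
    print(at_least_min_compat(2, 16, 5))  # False
    print(at_least_min_compat(2, 23, 4))  # True
```

In C, the same value is available at compile time as `NCCL_VERSION_CODE` and at run time through `ncclGetVersion(&version)`.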

New features:

- AWS `P5en` platform support was added.

- Support was added for the NCCL v3 tuner API. The tuner now supports multiple
  platforms and multiple collectives.

- Scheduling improvements were made to the plugin RDMA protocol. In multirail
  configurations, this is expected to balance traffic across rails more effectively.

- Support for dmabuf memory registration was added. Users who encounter problems
  with dmabuf may disable it by setting `OFI_NCCL_DISABLE_DMABUF=1`.
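
`OFI_NCCL_DISABLE_DMABUF` is an environment variable, so it has to be visible to every process that loads the plugin. A Python launcher might build the child environment as shown below. `env_with_dmabuf_disabled` is a hypothetical helper. The commented launch line, including `./my_nccl_app`, is a placeholder for your own job launcher.

```python
import os
from typing import Dict, Mapping, Optional

DMABUF_OPT_OUT = "OFI_NCCL_DISABLE_DMABUF"


def env_with_dmabuf_disabled(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: this process's environment) that opts
    out of dmabuf registration. The input mapping is never modified."""
    env = dict(os.environ if base is None else base)
    env[DMABUF_OPT_OUT] = "1"
    return env


if __name__ == "__main__":
    child_env = env_with_dmabuf_disabled()
    print(child_env[DMABUF_OPT_OUT])  # 1
    # Placeholder launch (hypothetical application; `-x` is Open MPI's flag for
    # exporting a variable to remote ranks):
    # import subprocess
    # subprocess.run(["mpirun", "-x", DMABUF_OPT_OUT, "./my_nccl_app"],
    #                env=child_env, check=True)
```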

Breaking changes:

- As mentioned above, building with support for platform-aws now requires
  libfabric version 1.22.0amzn4.0 or greater.

- Under CUDA, the plugin now statically links the CUDA runtime by default.
  Packagers who prefer to link the CUDA runtime dynamically may pass
  `--enable-cudart-dynamic` at configure time to disable static linking.