@@ -12,6 +12,35 @@ have unified the code into a single branch, and made the AWS-specific parts a
compile-time option. When a feature (or entire release) only supports one of
the two variants, we note that in the release notes.

+ # v1.13.2-aws (2024-12-06)
+
+ This release is intended only for use on AWS P* instances. A general release
+ that supports other libfabric networks may be made in the near future.
+
+ With this release, building with platform-aws requires libfabric
+ [1.22.0amzn4.0](https://github.com/aws/libfabric/commits/1.22.0amzn4.0/)
+ or greater. We generally recommend that AWS customers track
+ [the latest available EFA Installer](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-verify.html)
+ for performance improvements and bug fixes.
+
+ The 1.13.x release series supports
+ [NCCL 2.23.4-1](https://github.com/NVIDIA/nccl/releases/tag/v2.23.4-1)
+ while maintaining backward compatibility with older NCCL versions
+ ([NCCL v2.17.1](https://github.com/NVIDIA/nccl/releases/tag/v2.17.1-1) and later).
+
+ Bug Fixes:
+
+ - Tuner Improvements:
+   - Fixed algorithm selection for larger ranks and message sizes.
+   - Re-calibrated the tuner for AllGather and ReduceScatter regions for 0x7 bitmask on P5en,
+     optimizing performance for larger messages.
+   - Added tuner support for AllGather and ReduceScatter regions for 0x0 bitmask on P5en.
+
+ - Resolved a performance issue by preventing the eager protocol when RDMA writes are in flight,
+   improving small AllReduce collective performance.
+
+ Note: dmabuf support is now turned off by default. Users can enable it explicitly by setting `OFI_NCCL_DISABLE_DMABUF=0` if needed.
+
# v1.13.1-aws (2024-11-25)

This release is intended only for use on AWS P\* instances. A general release