What's Changed
Key New Features 🎉
- Download libnccl2 and libnccl-dev for a3u and a4h by @rachit-google in #4680
Breaking Changes 🚨
- Allow setting use_job_duration with non-exclusive partitions by @arpit974 in #4696
- Add multi-network support in TPU v6e by @agrawalkhushi18 in #4723
- Update vpc and cloud_router versions in VPC network module by @kadupoornima in #4732
Module Improvements 🔨
- Refactoring in gke persistent module by @vikramvs-gg in #4618
- Migrate Kueue installation to use Helm chart by @shubpal07 in #4542
Improvements 🛠
- Update nvidia DRA driver version to v25.3.0 by @parulbajaj01 in #4670
- Update A3-mega and A4-high Slurm blueprints to adopt the nvidia add-repository script by @rachit-google in #4667
- Update H4D blueprint: disable automatic updates, provide image info, and delete duplicate filestore by @Neelabh94 in #4644
- Add Managed Lustre support in gke-a4 by @parulbajaj01 in #4654
- Add Managed Lustre support in gke a3 ultra by @parulbajaj01 in #4700
- Adds an irdma health check to h4d nodes by @samskillman in #4704
- Enable Spot VM Provisioning For H4D by @LAVEEN in #4735
- Add slurm-gke blueprint by @ACW101 in #4607
Version Updates ⏫
Bug fixes 🐞
- Remove superfluous addition of chs logs to cloud ops config by @abbas1902 in #4679
- Add "datacenter-gpu-manager-4-dev" as an additional installation in A* YAML files by @Neelabh94 in #4623
- Minor bug fix on MFT version comparison by @ljqg in #4689
- Fix inconsistent plan on Slurm cluster reconfigure by @wiktorn in #4538
- Update process to filter out starting comments in a source yaml file by @SwarnaBharathiMantena in #4707
- Fix gke build failures by @annuay-google in #4708
- Update machine-learning/a3-ultragpu-8g/nemo-framework to fix segmentation fault error by @SwarnaBharathiMantena in #4725
New Contributors
- @mufaqam-gcl made their first contribution in #4688
- @wtempel made their first contribution in #4705
- @nikosavola made their first contribution in #4720
- @ACW101 made their first contribution in #4607
Full Changelog: v1.67.0...v1.68.0