
Releases: NVIDIA/cloudai

v1.2.beta2

21 Feb 08:47
d0c000a
Pre-release

What's Changed

Full Changelog: v1.2.beta1...v1.2.beta2

v1.2.beta1

10 Feb 14:19
e7b897b
Pre-release

Highlights

Changes in mounts for Slurm runs

Documentation is available in the User Guide.

Default mount

The test output directory <output_path>/<scenario_name_with_timestamp>/<test_name>/<iteration> (for example, results/scenario_2024-06-18_17-40-13/Tests.1/0) is mounted into the container as /cloudai_run_results.
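To illustrate the default mount, the following Python sketch builds the host-side output path from the pattern above and pairs it with /cloudai_run_results in host:container form. The timestamp format is inferred from the example path, and the helper names are hypothetical. This is not CloudAI's own code.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Container-side mount point for the test output directory, as stated in the release notes.
CONTAINER_RESULTS_DIR = "/cloudai_run_results"


def run_output_dir(output_path: str, scenario_name: str, started_at: datetime,
                   test_name: str, iteration: int) -> PurePosixPath:
    """Build <output_path>/<scenario_name_with_timestamp>/<test_name>/<iteration>.

    Hypothetical helper; the timestamp format is inferred from the example
    path "scenario_2024-06-18_17-40-13".
    """
    stamped = f"{scenario_name}_{started_at.strftime('%Y-%m-%d_%H-%M-%S')}"
    return PurePosixPath(output_path) / stamped / test_name / str(iteration)


def default_mount(host_dir: PurePosixPath) -> str:
    """Pair the host-side output directory with the fixed container path."""
    return f"{host_dir}:{CONTAINER_RESULTS_DIR}"


if __name__ == "__main__":
    out = run_output_dir("results", "scenario", datetime(2024, 6, 18, 17, 40, 13), "Tests.1", 0)
    print(out)                 # results/scenario_2024-06-18_17-40-13/Tests.1/0
    print(default_mount(out))  # results/scenario_2024-06-18_17-40-13/Tests.1/0:/cloudai_run_results
```

Because every iteration gets its own host directory while the container path stays fixed, a workload can always write to /cloudai_run_results without knowing where its results land on the host.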

Custom mounts

Users can now specify custom container mounts in the Test configuration:

extra_container_mounts = [
  "/path/to/mount1:/path/in/container1",
  "/path/to/mount2:/path/in/container2"
]

Git repo mounts

Any number of Git repositories can be cloned as part of cloudai install and then mounted into containers.

[[git_repos]]
url = "https://github.com/NVIDIA/cloudai"
commit = "sha1"
mount_as = "/work"

[[git_repos]]
url = "https://github.com/NVIDIA/cloudai-new"
commit = "sha1"
mount_as = "/opt/new"

Repositories are configured in the Test TOML file.

Sbatch custom arguments

Users can now specify custom sbatch arguments in the System configuration:

extra_sbatch_args = [
  "--section=4",
  "--other-arg val"
]

The snippet above adds the following directives to the generated sbatch script, in addition to the standard ones:

#SBATCH --section=4
#SBATCH --other-arg val


What's Changed

Full Changelog: v1.1.0...v1.2.beta1

v1.1.0

08 Feb 00:32
0bb44ce

CloudAI v1.1 (GA) release notes

Compatibility

CloudAI v1.1 has been tested with: PyTorch/JAX NGC Container 24.05, NCCL 2.19/2.21, and SPC-X 1.1.

Key Features and Enhancements:

  • First GA release with verification and QA testing
  • Verifiable test schemas using Pydantic
  • Command-line options reorganized into subcommands for a better user experience

What’s next

  • Support for GB200 and GB300 systems
  • General availability of CloudAI Configurator and Gym
  • Support for a wide range of NeMo 2.0 models
  • Deprecation of PAXML JAXToolbox in favor of MaxText JAXToolbox

v1.1.rc1

28 Jan 19:45
0bb44ce
Pre-release

What's Changed

New Contributors

Full Changelog: v1.1.beta21...v1.1.rc1

v1.1.beta21

22 Jan 12:18
dbeadcc
Pre-release

What's Changed

Full Changelog: v1.1.beta20...v1.1.beta21

v1.1.beta20

21 Jan 21:47
2c26087
Pre-release

What's Changed

Full Changelog: v1.1.beta19...v1.1.beta20

v1.1.beta19

17 Jan 10:24
5ddba08
Pre-release

What's Changed

  • Unified handling for install and output folders by @amaslenn in #342

Full Changelog: v1.1.beta18...v1.1.beta19

v1.1.beta18

15 Jan 17:03
6d19737
Pre-release

What's Changed

  • Retry Docker image import with enroot when the cluster requires specifying a GPU resource by @lilyw97 in #335

New Contributors

Full Changelog: v1.1.beta17...v1.1.beta18

v1.1.beta17

15 Jan 01:32
d2cc9b4
Pre-release

What's Changed

Full Changelog: v1.1.beta16...v1.1.beta17

v1.1.beta16

14 Jan 20:23
aac7f6f
Pre-release

What's Changed

Full Changelog: v1.1.beta15...v1.1.beta16