Releases · NVIDIA/Model-Optimizer

17 Mar 06:16

0.43.0rc1

00fa5bd

0.43.0rc1 Pre-release


Install the 0.43.0rc1 pre-release version using

pip install nvidia-modelopt==0.43.0rc1 --extra-index-url https://pypi.nvidia.com


17 Mar 05:45

kevalmorabia97

0.43.0rc0

e4df91b

0.43.0rc0 Pre-release


Install the 0.43.0rc0 pre-release version using

pip install nvidia-modelopt[all]==0.43.0rc0 --extra-index-url https://pypi.nvidia.com


09 Mar 20:31

kevalmorabia97

0.42.0

e2a4a8b

ModelOpt 0.42.0 Release Latest


Bug Fixes

Fix calibration data generation with multiple samples in the ONNX workflow.

New Features

Added a standalone type inference option (--use_standalone_type_inference) to ONNX AutoCast as an experimental alternative to ONNX's infer_shapes. This option performs type-only inference without shape inference, which can help when shape inference fails or when you want to avoid extra shape inference overhead.
Added quantization support for the Kimi K2 Thinking model from the original int4 checkpoint.
Introduced support for params-constraint-based automatic neural architecture search in Minitron pruning (mcore_minitron) as an alternative to manual pruning with export_config. See examples/pruning/README.md for more details.
Added an example for Minitron pruning with the Megatron-Bridge framework, covering advanced usage with params-constraint-based pruning, along with a new distillation example. See examples/megatron_bridge/README.md.
Added support for calibration data with multiple samples in .npz format in the ONNX AutoCast workflow.
Added the --opset option to the ONNX quantization CLI to specify the target opset version for the quantized model.
Enabled support for context parallelism in Eagle speculative decoding for both HuggingFace and Megatron Core models.
Added unified Hugging Face export support for diffusers pipelines/components.
Added support for LTX-2 and Wan2.2 (T2V) in the diffusers quantization workflow.
Provided PTQ support for GLM-4.7, including loading MTP layer weights from a separate mtp.safetensors file and supporting export as-is.
Added support for image-text data calibration in PTQ for Nemotron VL models.
Enabled advanced weight scale search for NVFP4 quantization and its export pathway.
Provided PTQ support for Nemotron Parse.
Added distillation support for LTX-2. See examples/diffusers/distillation/README.md for more details.
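As a rough illustration of the multi-sample .npz calibration data mentioned above, the sketch below writes a file with one array per model input. The input name "input", the per-sample shape, and the convention of stacking samples along the first axis are assumptions made for this example, not details confirmed by the release notes; check the ONNX AutoCast documentation for the layout it actually expects.

```python
import os
import tempfile

import numpy as np


def build_calibration_npz(path, input_shapes, num_samples=4, seed=0):
    """Write one float32 array per input name, with samples stacked on axis 0.

    `input_shapes` maps a (hypothetical) ONNX input name to its per-sample shape.
    Random data is used here only as a placeholder for real calibration samples.
    """
    rng = np.random.default_rng(seed)
    data = {
        name: rng.standard_normal((num_samples, *shape)).astype(np.float32)
        for name, shape in input_shapes.items()
    }
    np.savez(path, **data)
    return data


# Hypothetical input name and shape; replace with your model's real inputs.
npz_path = os.path.join(tempfile.mkdtemp(), "calib.npz")
build_calibration_npz(npz_path, {"input": (3, 32, 32)}, num_samples=4)

# Reload the file and inspect what was stored.
loaded = np.load(npz_path)
shapes = {name: loaded[name].shape for name in loaded.files}
print(shapes)
```

In practice the random arrays would be replaced with representative samples from your dataset, keyed by the input names of your ONNX model.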


28 Feb 18:32

kevalmorabia97

0.42.0rc2

eaf5d7e

0.42.0rc2 Pre-release


Install the 0.42.0rc2 pre-release version using

pip install nvidia-modelopt[all]==0.42.0rc2 --extra-index-url https://pypi.nvidia.com


21 Feb 14:50

kevalmorabia97

0.42.0rc1

f08a65f

0.42.0rc1 Pre-release


Install the 0.42.0rc1 pre-release version using

pip install nvidia-modelopt==0.42.0rc1 --extra-index-url https://pypi.nvidia.com


04 Feb 05:34

kevalmorabia97

0.42.0rc0

87237e7

0.42.0rc0 Pre-release


Install the 0.42.0rc0 pre-release version using

pip install nvidia-modelopt==0.42.0rc0 --extra-index-url https://pypi.nvidia.com


20 Jan 17:10

kevalmorabia97

0.41.0

d39cf45

ModelOpt 0.41.0 Release

Bug Fixes

Fix Megatron KV Cache quantization checkpoint restore for QAT/QAD (device placement, amax sync across DP/TP, flash_decode compatibility).

New Features

Add support for Transformer Engine quantization for Megatron Core models.
Add support for Qwen3-Next model quantization.
Add support for dynamically linked TensorRT plugins in the ONNX quantization workflow.
Add support for KV Cache quantization in the vLLM FakeQuant PTQ script. See examples/vllm_serve/README.md for more details.
Add support for subgraphs in ONNX AutoCast.
Add support for parallel draft heads in Eagle speculative decoding.
Add support for registering a custom emulated quantization backend. See register_quant_backend for more details and tests/unit/torch/quantization/test_custom_backend.py for an example.
Add examples/llm_qad for QAD training with Megatron-LM.

Deprecations

Deprecate the num_query_groups parameter in Minitron pruning (mcore_minitron). If you need to prune num_query_groups, use ModelOpt 0.40.0 or earlier instead.

Backward Breaking Changes

Remove torchprofile as a default dependency of ModelOpt, since it is used only for FLOPs-based FastNAS pruning (computer vision models). It can be installed separately if needed.
