[None][fix] add nvfp4 to --kv_cache_dtype choices in quantize.py by fuergaosi233 · Pull Request #14491 · NVIDIA/TensorRT-LLM

fuergaosi233 · 2026-05-23T19:23:48Z

Summary

tensorrt_llm/quantization/quantize_by_modelopt.py already supports nvfp4 as a valid KV cache quantization format via KV_QUANT_CFG_CHOICES["nvfp4"] = "NVFP4_KV_CFG"
However, examples/quantization/quantize.py only listed ["int8", "fp8", None] in the --kv_cache_dtype argparse choices
This caused argparse to reject --kv_cache_dtype nvfp4 with an "invalid choice" error even though the underlying quantization code fully supports it

Root Cause

The CLI wrapper was not updated when nvfp4 KV cache support was added to the quantization backend.

Code evidence:

tensorrt_llm/quantization/quantize_by_modelopt.py:108-110: KV_QUANT_CFG_CHOICES = {"fp8": "FP8_KV_CFG", "nvfp4": "NVFP4_KV_CFG"}
tensorrt_llm/quantization/mode.py: KV_CACHE_QUANT_ALGO_LIST = [QuantAlgo.FP8, QuantAlgo.INT8, QuantAlgo.NVFP4]
examples/quantization/quantize.py:113: choices=["int8", "fp8", None] — missing "nvfp4"

Changes

Added "nvfp4" to the --kv_cache_dtype choices
Updated the help text to note that nvfp4 KV cache requires --qformat fp8

Test plan

Verify python quantize.py --kv_cache_dtype nvfp4 --qformat fp8 ... is accepted by argparse
Verify that the nvfp4 KV cache quantization produces valid output checkpoints on Hopper/Blackwell GPUs

Summary by CodeRabbit

New Features
- Added support for nvfp4 as a KV cache data type option in quantization examples.
Documentation
- Updated help text to clarify constraints on nvfp4 usage with quantization formats.

quantize_by_modelopt.py already supports nvfp4 as a valid KV cache quantization format (via KV_QUANT_CFG_CHOICES["nvfp4"] = "NVFP4_KV_CFG"), but the argparse choices in examples/quantization/quantize.py only listed ["int8", "fp8", None]. This caused argparse to reject --kv_cache_dtype nvfp4 with a "invalid choice" error even though the underlying code supports it. Add "nvfp4" to the choices list and update the help text to note that nvfp4 KV cache requires --qformat fp8. Signed-off-by: holegots <fuergaosi@gmail.com>

coderabbitai · 2026-05-23T19:27:06Z

📝 Walkthrough

Walkthrough

The --kv_cache_dtype CLI argument in the quantization example script was updated to support a new nvfp4 option. Help text now documents the constraint that nvfp4 requires --qformat fp8, and the choices list was expanded to include nvfp4.

Changes

KV Cache Dtype CLI Option

Layer / File(s)	Summary
KV cache dtype CLI argument update `examples/quantization/quantize.py`	Help text for `--kv_cache_dtype` now documents that `nvfp4` requires `--qformat fp8`, and `nvfp4` was added to the allowed choices alongside `int8`, `fp8`, and `None`.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested reviewers

brb-nv

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: adding nvfp4 to the --kv_cache_dtype choices in quantize.py, which is the core modification in this PR.
Description check	✅ Passed	The PR description provides clear context (root cause, evidence, changes made, and test plan) but lacks explicit coverage of the PR checklist items and structured sections that match the template format.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/quantization/quantize.py`:
- Around line 111-113: After parsing args, add a post-parse guard that enforces
the documented constraint: if args.kv_cache_dtype == "nvfp4" and args.qformat !=
"fp8", call parser.error (or raise argparse.ArgumentError) with a clear message
and exit; place this check immediately after parse_args() in quantize.py
(referencing args.kv_cache_dtype and args.qformat) so invalid combos like
--kv_cache_dtype nvfp4 with --qformat int8_sq fail fast.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1531ed05-2d02-40a4-9a87-a321af1a30c0

📥 Commits

Reviewing files that changed from the base of the PR and between c5b0372 and 8108390.

📒 Files selected for processing (1)

examples/quantization/quantize.py

coderabbitai · 2026-05-23T19:27:09Z

+                        help="KV Cache dtype. Use 'nvfp4' only with --qformat fp8.",
                        default=None,
-                        choices=["int8", "fp8", None])
+                        choices=["int8", "fp8", "nvfp4", None])


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce the documented nvfp4/qformat constraint in code, not just help text.

Line 111 documents a hard rule, but argparse currently still accepts invalid combos (e.g., --kv_cache_dtype nvfp4 --qformat int8_sq). Add an explicit post-parse guard so invalid inputs fail fast.

Suggested fix

@@ args = parser.parse_args() + + if args.kv_cache_dtype == "nvfp4" and args.qformat != "fp8": + parser.error("--kv_cache_dtype nvfp4 requires --qformat fp8")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/quantization/quantize.py` around lines 111 - 113, After parsing args, add a post-parse guard that enforces the documented constraint: if args.kv_cache_dtype == "nvfp4" and args.qformat != "fp8", call parser.error (or raise argparse.ArgumentError) with a clear message and exit; place this check immediately after parse_args() in quantize.py (referencing args.kv_cache_dtype and args.qformat) so invalid combos like --kv_cache_dtype nvfp4 with --qformat int8_sq fail fast.

fuergaosi233 requested a review from a team as a code owner May 23, 2026 19:23

fuergaosi233 requested review from chang-l and laikhtewari May 23, 2026 19:23

github-actions Bot assigned fuergaosi233 May 23, 2026

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[None][fix] add nvfp4 to --kv_cache_dtype choices in quantize.py#14491

[None][fix] add nvfp4 to --kv_cache_dtype choices in quantize.py#14491
fuergaosi233 wants to merge 1 commit into
NVIDIA:mainfrom
fuergaosi233:fix/quantize-nvfp4-kv-cache-dtype

fuergaosi233 commented May 23, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 23, 2026

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fuergaosi233 commented May 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Changes

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 23, 2026

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fuergaosi233 commented May 23, 2026 •

edited by coderabbitai Bot

Loading