Skip to content

[None][fix] add nvfp4 to --kv_cache_dtype choices in quantize.py#14491

Open
fuergaosi233 wants to merge 1 commit into
NVIDIA:mainfrom
fuergaosi233:fix/quantize-nvfp4-kv-cache-dtype
Open

[None][fix] add nvfp4 to --kv_cache_dtype choices in quantize.py#14491
fuergaosi233 wants to merge 1 commit into
NVIDIA:mainfrom
fuergaosi233:fix/quantize-nvfp4-kv-cache-dtype

Conversation

@fuergaosi233
Copy link
Copy Markdown

@fuergaosi233 fuergaosi233 commented May 23, 2026

Summary

  • tensorrt_llm/quantization/quantize_by_modelopt.py already supports nvfp4 as a valid KV cache quantization format via KV_QUANT_CFG_CHOICES["nvfp4"] = "NVFP4_KV_CFG"
  • However, examples/quantization/quantize.py only listed ["int8", "fp8", None] in the --kv_cache_dtype argparse choices
  • This caused argparse to reject --kv_cache_dtype nvfp4 with an "invalid choice" error even though the underlying quantization code fully supports it

Root Cause

The CLI wrapper was not updated when nvfp4 KV cache support was added to the quantization backend.

Code evidence:

  • tensorrt_llm/quantization/quantize_by_modelopt.py:108-110: KV_QUANT_CFG_CHOICES = {"fp8": "FP8_KV_CFG", "nvfp4": "NVFP4_KV_CFG"}
  • tensorrt_llm/quantization/mode.py: KV_CACHE_QUANT_ALGO_LIST = [QuantAlgo.FP8, QuantAlgo.INT8, QuantAlgo.NVFP4]
  • examples/quantization/quantize.py:113: choices=["int8", "fp8", None] — missing "nvfp4"

Changes

  • Added "nvfp4" to the --kv_cache_dtype choices
  • Updated the help text to note that nvfp4 KV cache requires --qformat fp8

Test plan

  • Verify python quantize.py --kv_cache_dtype nvfp4 --qformat fp8 ... is accepted by argparse
  • Verify that the nvfp4 KV cache quantization produces valid output checkpoints on Hopper/Blackwell GPUs

Summary by CodeRabbit

  • New Features

    • Added support for nvfp4 as a KV cache data type option in quantization examples.
  • Documentation

    • Updated help text to clarify constraints on nvfp4 usage with quantization formats.

Review Change Stack

quantize_by_modelopt.py already supports nvfp4 as a valid KV cache
quantization format (via KV_QUANT_CFG_CHOICES["nvfp4"] = "NVFP4_KV_CFG"),
but the argparse choices in examples/quantization/quantize.py only
listed ["int8", "fp8", None]. This caused argparse to reject --kv_cache_dtype
nvfp4 with a "invalid choice" error even though the underlying code
supports it.

Add "nvfp4" to the choices list and update the help text to note that
nvfp4 KV cache requires --qformat fp8.

Signed-off-by: holegots <fuergaosi@gmail.com>
@fuergaosi233 fuergaosi233 requested a review from a team as a code owner May 23, 2026 19:23
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 23, 2026

📝 Walkthrough

Walkthrough

The --kv_cache_dtype CLI argument in the quantization example script was updated to support a new nvfp4 option. Help text now documents the constraint that nvfp4 requires --qformat fp8, and the choices list was expanded to include nvfp4.

Changes

KV Cache Dtype CLI Option

Layer / File(s) Summary
KV cache dtype CLI argument update
examples/quantization/quantize.py
Help text for --kv_cache_dtype now documents that nvfp4 requires --qformat fp8, and nvfp4 was added to the allowed choices alongside int8, fp8, and None.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested reviewers

  • brb-nv
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding nvfp4 to the --kv_cache_dtype choices in quantize.py, which is the core modification in this PR.
Description check ✅ Passed The PR description provides clear context (root cause, evidence, changes made, and test plan) but lacks explicit coverage of the PR checklist items and structured sections that match the template format.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/quantization/quantize.py`:
- Around line 111-113: After parsing args, add a post-parse guard that enforces
the documented constraint: if args.kv_cache_dtype == "nvfp4" and args.qformat !=
"fp8", call parser.error (or raise argparse.ArgumentError) with a clear message
and exit; place this check immediately after parse_args() in quantize.py
(referencing args.kv_cache_dtype and args.qformat) so invalid combos like
--kv_cache_dtype nvfp4 with --qformat int8_sq fail fast.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1531ed05-2d02-40a4-9a87-a321af1a30c0

📥 Commits

Reviewing files that changed from the base of the PR and between c5b0372 and 8108390.

📒 Files selected for processing (1)
  • examples/quantization/quantize.py

Comment on lines +111 to +113
help="KV Cache dtype. Use 'nvfp4' only with --qformat fp8.",
default=None,
choices=["int8", "fp8", None])
choices=["int8", "fp8", "nvfp4", None])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce the documented nvfp4/qformat constraint in code, not just help text.

Line 111 documents a hard rule, but argparse currently still accepts invalid combos (e.g., --kv_cache_dtype nvfp4 --qformat int8_sq). Add an explicit post-parse guard so invalid inputs fail fast.

Suggested fix
@@
     args = parser.parse_args()
+
+    if args.kv_cache_dtype == "nvfp4" and args.qformat != "fp8":
+        parser.error("--kv_cache_dtype nvfp4 requires --qformat fp8")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/quantization/quantize.py` around lines 111 - 113, After parsing
args, add a post-parse guard that enforces the documented constraint: if
args.kv_cache_dtype == "nvfp4" and args.qformat != "fp8", call parser.error (or
raise argparse.ArgumentError) with a clear message and exit; place this check
immediately after parse_args() in quantize.py (referencing args.kv_cache_dtype
and args.qformat) so invalid combos like --kv_cache_dtype nvfp4 with --qformat
int8_sq fail fast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant