[None][fix] add nvfp4 to --kv_cache_dtype choices in quantize.py #14491
fuergaosi233 wants to merge 1 commit into
`quantize_by_modelopt.py` already supports nvfp4 as a valid KV cache quantization format (via `KV_QUANT_CFG_CHOICES["nvfp4"] = "NVFP4_KV_CFG"`), but the argparse choices in `examples/quantization/quantize.py` only listed `["int8", "fp8", None]`. As a result, argparse rejected `--kv_cache_dtype nvfp4` with an "invalid choice" error even though the underlying code supports it.

Add `"nvfp4"` to the choices list and update the help text to note that nvfp4 KV cache requires `--qformat fp8`.

Signed-off-by: holegots <fuergaosi@gmail.com>
📝 Walkthrough
Changes: KV Cache Dtype CLI Option
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/quantization/quantize.py`:
- Around line 111-113: After parsing args, add a post-parse guard that enforces
the documented constraint: if args.kv_cache_dtype == "nvfp4" and args.qformat !=
"fp8", call parser.error (or raise argparse.ArgumentError) with a clear message
and exit; place this check immediately after parse_args() in quantize.py
(referencing args.kv_cache_dtype and args.qformat) so invalid combos like
--kv_cache_dtype nvfp4 with --qformat int8_sq fail fast.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1531ed05-2d02-40a4-9a87-a321af1a30c0
📒 Files selected for processing (1)
examples/quantization/quantize.py
         help="KV Cache dtype. Use 'nvfp4' only with --qformat fp8.",
         default=None,
-        choices=["int8", "fp8", None])
+        choices=["int8", "fp8", "nvfp4", None])
Enforce the documented nvfp4/qformat constraint in code, not just help text.
Line 111 documents a hard rule, but argparse currently still accepts invalid combos (e.g., --kv_cache_dtype nvfp4 --qformat int8_sq). Add an explicit post-parse guard so invalid inputs fail fast.
Suggested fix
@@
     args = parser.parse_args()
+
+    if args.kv_cache_dtype == "nvfp4" and args.qformat != "fp8":
+        parser.error("--kv_cache_dtype nvfp4 requires --qformat fp8")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/quantization/quantize.py` around lines 111 - 113, After parsing
args, add a post-parse guard that enforces the documented constraint: if
args.kv_cache_dtype == "nvfp4" and args.qformat != "fp8", call parser.error (or
raise argparse.ArgumentError) with a clear message and exit; place this check
immediately after parse_args() in quantize.py (referencing args.kv_cache_dtype
and args.qformat) so invalid combos like --kv_cache_dtype nvfp4 with --qformat
int8_sq fail fast.
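The post-parse guard suggested in this review can be tried in isolation. The following is a minimal sketch, not the actual `quantize.py`: `build_parser` and `parse_and_validate` are hypothetical helper names, the parser is trimmed to the two relevant options, and the `--qformat` default of `None` is an assumption made only for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, trimmed-down parser: only the two options discussed here.
    parser = argparse.ArgumentParser(prog="quantize.py")
    # Assumption: no choices/default taken from the real script for --qformat.
    parser.add_argument("--qformat", type=str, default=None)
    parser.add_argument(
        "--kv_cache_dtype",
        default=None,
        choices=["int8", "fp8", "nvfp4", None],
        help="KV Cache dtype. Use 'nvfp4' only with --qformat fp8.",
    )
    return parser


def parse_and_validate(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Cross-argument constraint that argparse `choices` cannot express alone.
    if args.kv_cache_dtype == "nvfp4" and args.qformat != "fp8":
        # parser.error prints usage plus the message and exits with status 2.
        parser.error("--kv_cache_dtype nvfp4 requires --qformat fp8")
    return args


if __name__ == "__main__":
    ok = parse_and_validate(["--kv_cache_dtype", "nvfp4", "--qformat", "fp8"])
    print(ok.kv_cache_dtype, ok.qformat)
```

Using `parser.error` keeps the failure mode identical to a built-in argparse rejection (usage text on stderr, exit status 2), so callers and scripts see the same behavior for both kinds of invalid input.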
Summary
- `tensorrt_llm/quantization/quantize_by_modelopt.py` already supports `nvfp4` as a valid KV cache quantization format via `KV_QUANT_CFG_CHOICES["nvfp4"] = "NVFP4_KV_CFG"`
- `examples/quantization/quantize.py` only listed `["int8", "fp8", None]` in the `--kv_cache_dtype` argparse choices
- argparse therefore rejected `--kv_cache_dtype nvfp4` with an "invalid choice" error even though the underlying quantization code fully supports it

Root Cause
The CLI wrapper was not updated when `nvfp4` KV cache support was added to the quantization backend.

Code evidence:
- `tensorrt_llm/quantization/quantize_by_modelopt.py:108-110`: `KV_QUANT_CFG_CHOICES = {"fp8": "FP8_KV_CFG", "nvfp4": "NVFP4_KV_CFG"}`
- `tensorrt_llm/quantization/mode.py`: `KV_CACHE_QUANT_ALGO_LIST = [QuantAlgo.FP8, QuantAlgo.INT8, QuantAlgo.NVFP4]`
- `examples/quantization/quantize.py:113`: `choices=["int8", "fp8", None]` (missing `"nvfp4"`)

Changes
- Add `"nvfp4"` to the `--kv_cache_dtype` choices
- Update the help text to note that nvfp4 KV cache requires `--qformat fp8`

Test plan
- `python quantize.py --kv_cache_dtype nvfp4 --qformat fp8 ...` is accepted by argparse

Summary by CodeRabbit
New Features
- `nvfp4` as a KV cache data type option in quantization examples.
Documentation
- `nvfp4` usage with quantization formats.
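
The argparse behavior described in this PR can be checked without a model or GPU. The sketch below is an assumption-laden stand-in rather than the real script: `make_parser` and `accepts` are hypothetical helpers, and only the `--kv_cache_dtype` option is modeled. It compares the old and new choices lists on the same command line.

```python
import argparse


def make_parser(kv_choices):
    # Hypothetical stand-in for the real parser; only --kv_cache_dtype is modeled.
    parser = argparse.ArgumentParser(prog="quantize.py")
    parser.add_argument("--kv_cache_dtype", default=None, choices=kv_choices)
    return parser


def accepts(parser, argv):
    # argparse reports an invalid choice by printing usage and raising SystemExit.
    try:
        parser.parse_args(argv)
        return True
    except SystemExit:
        return False


old_parser = make_parser(["int8", "fp8", None])           # choices before this PR
new_parser = make_parser(["int8", "fp8", "nvfp4", None])  # choices after this PR
argv = ["--kv_cache_dtype", "nvfp4"]

if __name__ == "__main__":
    print("old choices accept nvfp4:", accepts(old_parser, argv))
    print("new choices accept nvfp4:", accepts(new_parser, argv))
```

This only exercises argument parsing; it says nothing about whether the quantization backend produces a valid checkpoint for a given `--qformat`.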