diff --git a/docs/source/developer_guide/feature_guide/ACL_Graph.md b/docs/source/developer_guide/feature_guide/ACL_Graph.md
index c463a7bbb66..84f8beb95da 100644
--- a/docs/source/developer_guide/feature_guide/ACL_Graph.md
+++ b/docs/source/developer_guide/feature_guide/ACL_Graph.md
@@ -55,6 +55,8 @@ Obviously, we can solve this problem by capturing the biggest shape and padding
 ```
 
+In vLLM, these thresholds are controlled by `cudagraph_capture_sizes`. The default capture sizes follow a pattern like `[1,2,4,8,16,24,32,...,max_capture_size]`. You can customize the capture sizes for fine-grained control over performance. For example, we can set `cudagraph_capture_sizes` to `[1,2,4,6,12,18]` when running Qwen3-235B on a decode node in a large-EP deployment.
+
 ### Piecewise and Full graph
 
 Due to the increasing complexity of the attention layer in current LLM, we can't ensure all types of attention can run in graph. In MLA, prefill_tokens and decode_tokens have different calculation method, so when a batch has both prefills and decodes in MLA, graph mode is difficult to handle this situation.
 
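As a companion to the paragraph added in this diff, here is a minimal sketch of how custom capture sizes could be supplied when constructing an engine. It assumes a recent vLLM release in which `LLM` accepts a `compilation_config` dict containing `cudagraph_capture_sizes`; the model name and the `[1, 2, 4, 6, 12, 18]` list are illustrative only.

```python
from vllm import LLM, SamplingParams

# Illustrative decode-node configuration: capture only the batch sizes we
# actually expect to serve, so fewer graphs are captured and padding waste
# stays small. Model name and capture sizes are examples, not recommendations.
llm = LLM(
    model="Qwen/Qwen3-235B-A22B",
    compilation_config={
        # Overrides the default [1, 2, 4, 8, 16, 24, 32, ..., max_capture_size]
        "cudagraph_capture_sizes": [1, 2, 4, 6, 12, 18],
    },
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

When serving from the command line, the same override can typically be passed as JSON through `--compilation-config`, though the exact flag and accepted value format should be checked against the installed vLLM version.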