Trtllm-pytorch doesn't support n > 1 #6406

@foreverlms

Description

System Info

TensorRT-LLM version: v0.20.0

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Demo script:

from tensorrt_llm._torch import LLM
from tensorrt_llm import SamplingParams

from tensorrt_llm._torch.pyexecutor.config import PyTorchConfig


def main():
    # The model argument accepts an HF model name, a path to a local HF model,
    # or a TensorRT Model Optimizer quantized checkpoint such as nvidia/Llama-3.1-8B-Instruct-FP8 on HF.
    model_path = "Qwen/Qwen3-30B-A3B"
    # model_path = "Qwen/Qwen3-235B-A22B"
    tp = 2
    pytorch_backend_config = PyTorchConfig(disable_overlap_scheduler=True)
    llm = LLM(
        model=model_path,
        tensor_parallel_size=tp,
        moe_tensor_parallel_size=1,
        moe_expert_parallel_size=tp,
        max_num_tokens=1160,
        max_batch_size=161,
        free_gpu_memory_fraction=0.8,
        pytorch_backend_config=pytorch_backend_config,
    )

    # Sample prompts.
    prompts = [
        "Hello, my name is",
    ]

    # Create sampling params requesting n=2 samples per prompt.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, n=2)

    # Print every sampled completion, not just the first one.
    for output in llm.generate(prompts, sampling_params):
        for i, completion in enumerate(output.outputs):
            print(f"Prompt: {output.prompt!r}, sample {i}: {completion.text!r}")


if __name__ == "__main__":
    main()

Expected behavior

Sampling 2 results for each prompt.
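
With n=2, the print loop above should emit two completions for the single prompt, roughly like this (generated text elided):

Prompt: 'Hello, my name is', sample 0: '...'
Prompt: 'Hello, my name is', sample 1: '...'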

Actual behavior

Generation fails: each of the two TP ranks prints the same assertion from the executor loop, which is unable to schedule any pending request.

Exception in thread Thread-4 (_executor_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/root/nvda/TensorRT-LLM/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 836, in _executor_loop
    assert scheduled_batch.batch_size > 0, (
AssertionError: fail to schedule any pending request, probably run out of resource.
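
With max_batch_size=161, max_num_tokens=1160, and free_gpu_memory_fraction=0.8 for a single short prompt, genuine resource exhaustion seems unlikely; the assertion only trips once n > 1 is requested, which points at the PyTorch backend's handling of multi-sample requests rather than memory pressure.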

Additional notes

None.
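
A possible interim workaround (an untested sketch, assuming only the n > 1 path is broken): submit the prompt n times with n=1, so each sample runs as an independent request. This recomputes the prompt context per sample, but only uses the same LLM and SamplingParams APIs as the repro script.

# Untested workaround sketch: emulate n=2 with repeated n=1 requests.
n = 2
prompts = ["Hello, my name is"] * n  # one request per desired sample
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, n=1)

for output in llm.generate(prompts, sampling_params):
    print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")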

Labels

  • Decoding<NV>: Token sampling algorithms in TRTLLM for text gen (top-k, top-p, beam)
  • LLM API<NV>: High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows
  • bug: Something isn't working
