Description
System Info
GPU: A100
TensorRT-LLM version: 1.0.0rc4 (I am using the prebuilt container)
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I get wrong results after prefill/decode disaggregation (prefill.tp_size == 2 and decode.tp_size == 1). Here are the steps to reproduce:
Launch the container:
docker run -v $PWD:/mnt -v /aisw:/aisw -e EXEC_BASH=1 --net=host -w /mnt --pid=host --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc4 /bin/bash
Launch the servers:
# tp2.yaml
attn_backend: FLASHINFER
moe_config:
  backend: CUTLASS
disable_overlap_scheduler: True
cache_transceiver_config:
  backend: ucx
  max_tokens_in_buffer: 2048
tensor_parallel_size: 2
# tp1.yaml
attn_backend: FLASHINFER
moe_config:
  backend: CUTLASS
disable_overlap_scheduler: True
cache_transceiver_config:
  backend: ucx
  max_tokens_in_buffer: 2048
# disagg_config.yaml
hostname: 0.0.0.0
port: 9095
backend: pytorch
context_servers:
  num_instances: 1
  urls:
    - "0.0.0.0:9091"
generation_servers:
  num_instances: 1
  urls:
    - "0.0.0.0:9093"
HF_MODEL_DIR="Qwen3-30B-A3B"
# prefill
export CUDA_VISIBLE_DEVICES=0,1
trtllm-serve \
$HF_MODEL_DIR \
--host 0.0.0.0 --port 9091 \
--kv_cache_free_gpu_memory_fraction 0.1 --backend pytorch \
--extra_llm_api_options ./tp2.yaml &> log_ctx_0 &
# decode
export CUDA_VISIBLE_DEVICES=2
trtllm-serve \
$HF_MODEL_DIR \
--host 0.0.0.0 --port 9093 \
--kv_cache_free_gpu_memory_fraction 0.1 --backend pytorch \
--extra_llm_api_options ./tp1.yaml &> log_gen_0 &
# disaggregated server
trtllm-serve \
disaggregated -c disagg_config.yaml &> log_proxy &
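Before sending requests, wait for all three servers to come up. A minimal readiness loop (a sketch; it assumes trtllm-serve exposes a /health endpoint on each HTTP port, as the OpenAI-compatible server does in recent releases — adjust the path if your build differs):
# wait until the context, generation, and disaggregated servers respond
for port in 9091 9093 9095; do
  until curl -sf "http://localhost:${port}/health" > /dev/null; do
    echo "waiting for server on port ${port}..."
    sleep 5
  done
done
echo "all servers ready"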
Send request to the disaggregated server:
# client
curl http://localhost:9095/v1/completions -H "Content-Type: application/json" -d '{
"model": "./Qwen3-30B-A3B",
"prompt": "Tell me a joke",
"max_tokens": 128,
"temperature": 0
}' -w "\n"
I get
{"id":"cmpl-7fb5e2d87ae0465eb4d823baceb0956e","object":"text_completion","created":1753946566,"model":"./Qwen3-30B-A3B","choices":[{"index":0,"text":" about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about about","token_ids":null,"logprobs":null,"context_logits":null,"finish_reason":"length","stop_reason":null,"disaggregated_params":{"request_type":"generation_only","first_gen_tokens":[911],"ctx_request_id":2052,"encoded_opaque_state":"AQAAAAACAAAAAAAAAAIAAAAAAAAAHcMMAAAAAAAAADE3Mi4yNi40Ni45NmuWDAAAAAAAAAAxNzIuMjYuNDYuOTYBMAAAAAAAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAACAAAAAIAAAAAIAAAABAAAAAAAAAAACAAAABwAAAAACAAAA","draft_tokens":null}}],"usage":{"prompt_tokens":4,"total_tokens":132,"completion_tokens":128},"prompt_token_ids":null}
For comparison, send the same request directly to the prefill/decode server:
# client
curl http://localhost:9091/v1/completions -H "Content-Type: application/json" -d '{
"model": "./Qwen3-30B-A3B",
"prompt": "Tell me a joke",
"max_tokens": 128,
"temperature": 0
}' -w "\n"
I get
{"id":"cmpl-9b294cfc206f4f228752f47617f4d52f","object":"text_completion","created":1753948266,"model":"./Qwen3-30B-A3B","choices":[{"index":0,"text":" about a cat and a dog.\n\nOkay, I need to come up with a joke about a cat and a dog. Let me think... Jokes usually have a setup and a punchline. Maybe start with something about their typical behaviors. Cats are often seen as aloof, and dogs as friendly. Maybe play on that.\n\nWhat if the cat is doing something the dog doesn't understand? Like the cat knocking things over. The dog might try to help but mess things up. Or maybe a play on words. \"Why did the cat refuse to play with the dog? Because it was too feline.\" Wait, that's a","token_ids":null,"logprobs":null,"context_logits":null,"finish_reason":"length","stop_reason":null,"disaggregated_params":null}],"usage":{"prompt_tokens":4,"total_tokens":132,"completion_tokens":128},"prompt_token_ids":null}
The output tokens after disaggregation differ from those generated by the raw prefill/decode server, and they are clearly degenerate (the token " about" repeats until max_tokens is reached). There is likely a bug somewhere.
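To make the comparison reproducible, here is a small helper (a sketch; it assumes jq is available in the container and that both endpoints are reachable at the ports above) that sends an identical greedy request to both servers and diffs the returned texts:
#!/bin/bash
# Send the same greedy request to the disaggregated server (9095)
# and the raw prefill/decode server (9091), then diff the outputs.
PROMPT="Tell me a joke"
for port in 9095 9091; do
  curl -s "http://localhost:${port}/v1/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"./Qwen3-30B-A3B\", \"prompt\": \"${PROMPT}\", \"max_tokens\": 128, \"temperature\": 0}" \
    | jq -r '.choices[0].text' > "out_${port}.txt"
done
diff out_9095.txt out_9091.txt && echo "outputs match" || echo "outputs differ"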
In addition, the disaggregated server does generate correct tokens for SOME prompts, for example:
curl http://localhost:9095/v1/completions -H "Content-Type: application/json" -d '{
"model": "./Qwen3-30B-A3B",
"prompt": "NVIDIA is a great company because",
"max_tokens": 128,
"temperature": 0
}' -w "\n"
I get
{"id":"cmpl-e7dca8af00cb46c4b913387ac853ac09","object":"text_completion","created":1753948700,"model":"./Qwen3-30B-A3B","choices":[{"index":0,"text":" it **it is a company that has been around for a long time and has a lot of experience in the field of technology**. This experience has allowed them to build a strong reputation and a loyal customer base. Additionally, the company has a **strong brand name and a wide range of products**, which makes it a **reliable and trustworthy** choice for consumers. Furthermore, the company has a **strong financial position**, which allows them to invest in research and development, ensuring that they stay at the forefront of technological innovation. Finally, the company has a **strong presence in the global market**, which allows them to reach a wide audience","token_ids":null,"logprobs":null,"context_logits":null,"finish_reason":"length","stop_reason":null,"disaggregated_params":{"request_type":"generation_only","first_gen_tokens":[432],"ctx_request_id":2055,"encoded_opaque_state":"AQAAAAACAAAAAAAAAAIAAAAAAAAAHcMMAAAAAAAAADE3Mi4yNi40Ni45NmuWDAAAAAAAAAAxNzIuMjYuNDYuOTYBMAAAAAAAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAAACAAAAAgAAAAIAAACAAAAAIAAAAAIAAAABAAAAAAAAAAACAAAABwAAAAACAAAA","draft_tokens":null}}],"usage":{"prompt_tokens":7,"total_tokens":135,"completion_tokens":128},"prompt_token_ids":null}
Expected behavior
Output tokens from the disaggregated server are valid (and similar to those from the raw prefill/decode servers).
actual behavior
Please refer to the description in the reproduction steps above.
additional notes
None