[BUG] Batch inference DDP + zero stage 3 = inference code hangs #7128
@ShengYun-Peng can you share the system you're working on, the transformers and deepspeed versions, and the full error message, please?
Thanks! Here is the full error message when I run the code with {"tp_size": 2}:
Basically, the code hangs after printing Init COMPLETE. For comparison, below is the log message from running the same code but with tp_size=1:
I'm running the code on a DGX A100 with deepspeed==0.16.3, transformers==4.48.3, and torch==2.5.1.
Thanks! The opt-1.3b model explicitly defines a pad_token in its tokenizer_config (link), so there's no need to manually set tokenizer.pad_token = tokenizer.eos_token. I printed tokenizer.pad_token in the code above, and it returned:
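For reference, here is a quick way to check that claim; this is a minimal sketch assuming the stock facebook/opt-1.3b checkpoint on the Hugging Face Hub:

```python
from transformers import AutoTokenizer

# opt-1.3b ships a pad_token in its tokenizer_config.json, so no manual
# tokenizer.pad_token = tokenizer.eos_token assignment is needed.
tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
print(tok.pad_token)
```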
I ran the batch inference code with the deepspeed generation path, not the vllm one. The code hangs when I set zero stage = 3. I created a minimal code snippet for you to debug the error.
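The original snippet did not come through in this thread, so the following is a hypothetical reconstruction of the setup being described, assuming the ZeRO-Inference pattern (deepspeed.initialize without an optimizer) and the facebook/opt-1.3b checkpoint; the script name repro.py and the prompts are placeholders:

```python
# repro.py - sketch of batch generation through a DeepSpeed engine with ZeRO stage 3
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "facebook/opt-1.3b"  # assumed checkpoint

ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # required by initialize, unused at inference
    "zero_optimization": {"stage": 3},
    "tensor_parallel": {"tp_size": 1},    # flipping this to 2 reproduces the hang
}

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16)

# ZeRO-Inference pattern: build the engine without an optimizer and run its module.
engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()

prompts = ["DeepSpeed is", "The capital of France is"]  # placeholder batch
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(engine.device)

with torch.no_grad():
    # synced_gpus=True keeps all ranks stepping together, which stage 3 requires
    out = engine.module.generate(**inputs, max_new_tokens=32, synced_gpus=True)

print(tokenizer.batch_decode(out, skip_special_tokens=True))
```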
Run the code with a standard DeepSpeed launch:
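Assuming the sketch above is saved as repro.py and two GPUs are available (the exact original command was elided), that would be something like:

```shell
deepspeed --num_gpus 2 repro.py
```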
The code should run without error because, with tp_size=1, this is plain DDP: each rank simply generates for its own slice of the batch.
Now, change "tensor_parallel": {"tp_size": 1} to "tensor_parallel": {"tp_size": 2} and rerun the code: it hangs forever. Note that the bug only appears when DDP and TP are enabled together.
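Concretely, the only change needed to trigger the hang, relative to the sketch above, is:

```diff
-    "tensor_parallel": {"tp_size": 1},
+    "tensor_parallel": {"tp_size": 2},
```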