enable cpu offloading for Bark on xpu #37599
Conversation
Signed-off-by: YAO Matrix <[email protected]>
Signed-off-by: YAO Matrix <[email protected]>
Signed-off-by: YAO Matrix <[email protected]>
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
```diff
@@ -1056,7 +1057,8 @@ def processor(self):
     def inputs(self):
         input_ids = self.processor("In the light of the moon, a little egg lay on a leaf", voice_preset="en_speaker_6")

-        input_ids = input_ids.to(torch_device)
+        for k, v in input_ids.items():
+            input_ids[k] = v.to(torch_device)
```
I changed this because `input_ids` is a dict, and the prior code just used `to(torch_device)` to move all of its items to the device. In my environment, both XPU and A100 fail with an error saying one tensor is on CPU (the `history_prompt` entry of the `input_ids` dict) while another is on cuda:0 or xpu:0 (the embedding table). I found that the `to()` in the original code only moves some of the items to the device and leaves the others on CPU, so I changed the code to move the items one by one; the test then passes on both XPU and CUDA.
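For illustration, a minimal self-contained sketch of the per-item move (the `Nested` class below is a toy stand-in I made up for the processor's nested `history_prompt` feature dict, which carries its own `to()`; it is not the library's class):

```python
import torch

class Nested(dict):
    """Toy stand-in for a nested feature dict that knows how to move itself."""
    def to(self, device):
        return Nested({k: v.to(device) for k, v in self.items()})

device = "cuda" if torch.cuda.is_available() else "cpu"

batch = {
    "input_ids": torch.ones(1, 4, dtype=torch.long),
    "history_prompt": Nested({"semantic_prompt": torch.zeros(8)}),
}

# A blanket .to(device) on the container can skip nested entries; moving
# item-by-item dispatches .to() on every value, nested dicts included.
for k, v in batch.items():
    batch[k] = v.to(device)

print(batch["history_prompt"]["semantic_prompt"].device)  # now on `device`
```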
device_type = "cuda" | ||
if is_torch_accelerator_available(): | ||
device_type = torch.accelerator.current_accelerator().type | ||
device = torch.device(f"{device_type}:{gpu_id}") |
I use `torch.accelerator` to detect the device type at runtime when it is available, and fall back to the old hard-coded value, "cuda", otherwise.
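A minimal runnable sketch of the same detection logic (assuming a recent PyTorch that exposes `torch.accelerator`; the `hasattr` guard here stands in for the library's `is_torch_accelerator_available()` helper, and `gpu_id = 0` is an illustrative value):

```python
import torch

# Prefer the runtime-detected accelerator; keep the historical "cuda"
# default on PyTorch builds that predate torch.accelerator.
device_type = "cuda"
if hasattr(torch, "accelerator") and torch.accelerator.is_available():
    device_type = torch.accelerator.current_accelerator().type  # "cuda", "xpu", ...

gpu_id = 0  # illustrative device index
device = torch.device(f"{device_type}:{gpu_id}")
print(device)  # e.g. cuda:0 on NVIDIA GPUs, xpu:0 on Intel GPUs
```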
Thanks a lot! Sounds good, left a few nits.
Signed-off-by: YAO Matrix <[email protected]>
Signed-off-by: YAO Matrix <[email protected]>
Signed-off-by: YAO Matrix <[email protected]>
@ydshieh, added, please help review, thanks!
* enable cpu offloading of bark modeling on XPU
* remove debug print
* fix style
* fix review comments
* enhance test
* update
* add deprecate message
* update
* update
* trigger CI

Signed-off-by: YAO Matrix <[email protected]>
Co-authored-by: ydshieh <[email protected]>
Command:

pytest -rA tests/models/bark/test_modeling_bark.py::BarkModelIntegrationTests::test_generate_end_to_end_with_offload

Result after this PR: PASSED
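For context, a sketch of the user-facing path this test covers (the checkpoint name, voice preset, and exact call pattern follow the public Bark examples and are assumptions here, not the test's code):

```python
import torch
from transformers import AutoProcessor, BarkModel

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cuda"

processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")
model.enable_cpu_offload()  # after this PR, works on XPU as well as CUDA

inputs = processor("In the light of the moon, a little egg lay on a leaf",
                   voice_preset="en_speaker_6")
# Move the processor output item-by-item, as in the diff above.
inputs = {k: v.to(device) for k, v in inputs.items()}
audio = model.generate(**inputs)
```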