
Conversation

@rsxdalv (Contributor) commented on May 21, 2025

#32 for @nullnuller

This is an initial draft; further testing by different users is required. The code could also be written more elegantly, but this should be a robust way to check whether it works and how well this approach performs.

The code aims to:

1. Bring input tensors to each submodule's device (e.g. self.ace_step_transformer.device) and then bring the output tensors back to self.device, for each of ace_step_transformer, music_dcae, and text_encoder_model.
2. Modify CPU offload so that it follows the rules of self.device_map.

A minimal sketch of this pattern is shown after the example device_map below.

Example device_map:

```python
device_map = {
    'ace_step_transformer': "cuda:0",
    'text_encoder_model': "cuda:1",
    'music_dcae': "cuda:1",
}
```
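
For illustration, here is a minimal sketch of the per-submodule device hop described above: inputs are moved to the submodule's device before the call, and the output is moved back to the pipeline's main device afterwards, driven by the device_map. The `Pipeline` class, the `nn.Linear` stand-ins, and `_call_on_module_device` are hypothetical placeholders, not the actual ACE-Step code; running it also assumes at least two CUDA devices.

```python
# Hypothetical sketch of moving tensors between per-submodule devices.
import torch
import torch.nn as nn


class Pipeline:
    def __init__(self, device_map, device="cuda:0"):
        self.device = torch.device(device)
        # Stand-ins for ace_step_transformer / text_encoder_model / music_dcae.
        self.ace_step_transformer = nn.Linear(8, 8).to(device_map["ace_step_transformer"])
        self.text_encoder_model = nn.Linear(8, 8).to(device_map["text_encoder_model"])
        self.music_dcae = nn.Linear(8, 8).to(device_map["music_dcae"])

    def _call_on_module_device(self, module, *tensors):
        # 1. Bring input tensors to the submodule's device.
        module_device = next(module.parameters()).device
        moved = [t.to(module_device) for t in tensors]
        out = module(*moved)
        # 2. Bring the output tensor back to the pipeline's main device.
        return out.to(self.device)


if __name__ == "__main__":
    device_map = {
        "ace_step_transformer": "cuda:0",
        "text_encoder_model": "cuda:1",
        "music_dcae": "cuda:1",
    }
    pipe = Pipeline(device_map)
    x = torch.randn(4, 8, device=pipe.device)
    y = pipe._call_on_module_device(pipe.text_encoder_model, x)
    print(y.device)  # back on cuda:0, even though the encoder ran on cuda:1
```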
