@Rypo Rypo commented Nov 28, 2024

Changes

  • Adds an option `from_pretrained(low_cpu_mem_usage=True)` (akin to the transformers implementation, but greatly simplified) to `OmniGen` and `OmniGenPipeline`
  • Uses the accelerate `init_empty_weights` context manager when initializing the model. This avoids slow CPU weight initialization, particularly during `self.initialize_weights()`.

These weights are immediately overwritten when the state_dict is loaded. This means we can safely bypass initialization without consequence.

Additionally, this can be achieved with no additional libraries beyond those in requirements.txt. As such, I set the default to `low_cpu_mem_usage=True`.

Results

From my tests, this change:

  • Reduces the initial pipeline load time by 4-5x and
  • Decreases peak initial RAM usage by 10-15GB

Cold Load

New process + memory freed

| `low_cpu_mem_usage` | avg load time | RAM usage |
| --- | --- | --- |
| `True` | 9.53s | 18GB |
| `False` | 41.56s | 28GB |

Hot Load

`pipe.from_pretrained...; del pipe; gc.collect(); pipe.from_pretrained...`

| `low_cpu_mem_usage` | avg load time | RAM usage |
| --- | --- | --- |
| `True` | 5.07s | 18GB |
| `False` | 36.64s | 33GB |
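A harness along these lines can reproduce the load-time numbers above (a sketch, not the exact script used for these measurements; peak resident memory is read from the stdlib `resource` module, which is Unix-only and reports `ru_maxrss` in KiB on Linux):

```python
# Rough benchmarking harness (a sketch, not the exact measurement script).
import gc
import resource
import time

def bench(load_fn, runs=3):
    """Average wall-clock time of load_fn plus peak resident memory (MiB)."""
    times = []
    for _ in range(runs):
        gc.collect()  # free the previous object before timing a "hot" load
        start = time.perf_counter()
        obj = load_fn()
        times.append(time.perf_counter() - start)
        del obj
    # On Linux, ru_maxrss is the process-lifetime peak in KiB.
    peak_mib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    return sum(times) / len(times), peak_mib

# Hypothetical usage against the pipeline:
# avg_s, peak_mib = bench(
#     lambda: OmniGenPipeline.from_pretrained(model_name,
#                                             low_cpu_mem_usage=True))
```

Note that `ru_maxrss` is a high-water mark for the whole process, so the "hot load" run reflects the worst moment across all iterations.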

This is the first of 3 PRs I'm issuing to improve performance and fix errors. I've tried to keep each incremental change as small in scope as possible. PRs: 1. this, 2. #150, 3. #151

Rypo added 2 commits November 25, 2024 19:39
Prevents slow CPU initialization of model weights on load by using accelerate `init_empty_weights`.

Fully compatible with `from_pretrained`, since the weights are always overwritten by the `state_dict`.

fixes VectorSpaceLab#72
