@Rypo Rypo commented Nov 28, 2024

Changes

  • Adds an option `from_pretrained(low_cpu_mem_usage=True)` (akin to the transformers implementation, but greatly simplified) to `OmniGen` and `OmniGenPipeline`
  • Uses the accelerate `init_empty_weights` context manager when initializing the model. This avoids slow CPU weight initialization, particularly during `self.initialize_weights()`.

These weights are immediately overwritten when the state_dict is loaded. This means we can safely bypass initialization without consequence.

Additionally, this can be achieved with no additional libraries beyond those in requirements.txt. As such, I set the default to `low_cpu_mem_usage=True`.

Results

From my tests, this change:

  • Reduces the initial pipeline load time by 4-5x and
  • Decreases peak initial RAM usage by 10-15GB

Cold Load

New process + memory freed

| `low_cpu_mem_usage` | avg load time | RAM usage |
| --- | --- | --- |
| `True` | 9.53s | 18GB |
| `False` | 41.56s | 28GB |

Hot Load

`pipe.from_pretrained...; del pipe; gc.collect(); pipe.from_pretrained...`

| `low_cpu_mem_usage` | avg load time | RAM usage |
| --- | --- | --- |
| `True` | 5.07s | 18GB |
| `False` | 36.64s | 33GB |
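A harness along these lines can reproduce the load-time numbers above (a sketch, not the exact script used for these measurements; peak resident memory is read from the stdlib `resource` module, which is Unix-only and reports `ru_maxrss` in KiB on Linux):

```python
# Rough benchmarking harness (a sketch, not the exact measurement script).
import gc
import resource
import time

def bench(load_fn, runs=3):
    """Average wall-clock time of load_fn plus peak resident memory (MiB)."""
    times = []
    for _ in range(runs):
        gc.collect()  # free the previous object before timing a "hot" load
        start = time.perf_counter()
        obj = load_fn()
        times.append(time.perf_counter() - start)
        del obj
    # On Linux, ru_maxrss is the process-lifetime peak in KiB.
    peak_mib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    return sum(times) / len(times), peak_mib

# Hypothetical usage against the pipeline:
# avg_s, peak_mib = bench(
#     lambda: OmniGenPipeline.from_pretrained(model_name,
#                                             low_cpu_mem_usage=True))
```

Note that `ru_maxrss` is a high-water mark for the whole process, so the "hot load" run reflects the worst moment across all iterations.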

This is the first of 3 PRs I'm issuing to improve performance and fix errors. I've tried to keep each incremental change as small in scope as possible. PRs: 1. this, 2. #150, 3. #151

Rypo added 2 commits November 25, 2024 19:39
Prevents slow CPU initialization of model weights on load by using accelerate `init_empty_weights`.

Fully compatible with `from_pretrained`, since the weights are always overwritten by the `state_dict`.

fixes VectorSpaceLab#72
