optax.MultiSteps out of memory #472

ein-ich · 2023-01-07T12:06:43Z

I always get an out of memory error using optax.MultiSteps, even when every_k_schedule=1.
Using optax.apply_every(k=1) in a chain works fine.

optimizer = optax.chain(
    optax.clip_by_global_norm(0.5),
    optax.adam(lr),
    #optax.apply_every(k=1)
)
optimizer = optax.MultiSteps(optimizer, every_k_schedule=1)

Later I'm using
opt_state = optimizer.init(params)
and

updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)

I have no idea what I could be doing wrong. I'm not changing anything else, like batch size.

The text was updated successfully, but these errors were encountered:

mkunesch · 2023-01-08T18:28:31Z

Hi! Interesting - thanks for reporting this!

Are you also at more than ~2/3 memory usage when you use apply_every? From a first look, I could see that the implementation of apply_every returns 0*updates for skipped steps while MultiSteps constructs a new array of 0s (even if every_k_schedule=1) so the former has a better memory footprint. This would explain a higher memory usage by up to 50% - but not more.

I'm not sure why the two functions use completely different code paths - we should be able to merge them (and deprecate one of them).

ein-ich · 2023-01-08T19:45:32Z

I have most of my available memory preallocated by JAX. I tried reducing the batch size from 120 (which works with apply_every) to 30, but it still crashed with MultiSteps.

ayaka14732 · 2023-07-22T11:48:11Z

I am training Llama 2 7B on TPU. Without optax.MultiSteps my batch_size can be 4. However, after applying optax.MultiSteps, I got OOM even if batch_size is 1.

hr0nix · 2023-08-21T20:57:08Z

I can confirm that MultiStep implementation has much larger memory overhead than just one extra buffer for gradient (something like 4x extra buffers). This is very problematic when using this class with large models.

Sea-Snell · 2023-08-28T23:19:53Z

I also noticed this issue

philippe-eecs · 2023-08-28T23:28:15Z

I am having this issue as well for use in diffusion models

agrimgupta92 · 2023-08-29T20:18:49Z

Facing the same issue.

Change the implementation to allow JAX/XLA to re-use memory buffers. #472 PiperOrigin-RevId: 561129449

Change the implementation to allow JAX/XLA to re-use memory buffers. #472 PiperOrigin-RevId: 561390202

hbq1 · 2023-08-30T18:01:25Z

Hi everyone, thanks for flagging it up. I just merged a new version of optax.MultiSteps which should be more memory friendly, could you check this please?

philippe-eecs · 2023-08-30T19:28:30Z

you're a king

celiolarcher · 2023-10-06T00:18:08Z

Hi @hbq1! Thank you for the fix!

One question, I am still seeing a larger consumption with MultiStep when compared with the function apply_every. This was supposed to happen?

celiolarcher · 2023-10-06T15:07:02Z

As a follow-up, I was conducting some debugging by myself and it seems that the problem is on this part of the code (line 414):

new_updates, new_state = jax.lax.cond(
          state.mini_step < k_steps - 1,
          _mid_step, _final_step, *(state, params, acc_grads))

If I got it right, JAX is allocating memory for both function outputs (_mid_step and _final_step), so this basically doubles the space to store optimizer states and grads.

Still trying to figure out a way to solve it, though.

celiolarcher · 2023-10-10T13:54:32Z

Just added a PR merging apply_every logic into MultiStep function. From my initial tests, it reduces the memory footprint (able to train Llama2 7b in a v3-8 now) without affecting convergence.

mtthss · 2023-10-31T11:16:10Z

This is really great!

hbq1 · 2023-11-23T09:51:33Z

Awesome work @celiolarcher!

jax.lax.cond seems to be suboptimal in some use cases, e.g. here, in theory, it should understand that either _mid_step or _final_step needs to be executed, so it shouldn't allocate memory for both outputs. It might be something that JAX/XLA devs would like to have a look at. Let me know if you'd like me to file a bug to https://github.com/google/jax/issues, or feel free to do it yourself ofc!

celiolarcher · 2023-12-01T11:26:49Z

I'm glad to be able to help!
About the issue @hbq1 , I can open it there, no problem.

copybara-service bot pushed a commit that referenced this issue Aug 29, 2023

Optimise memory usage in MultiSteps.

5acbcbf

Change the implementation to allow JAX/XLA to re-use memory buffers. #472 PiperOrigin-RevId: 561129449

copybara-service bot mentioned this issue Aug 29, 2023

Optimise memory usage in MultiSteps. #581

Closed

copybara-service bot pushed a commit that referenced this issue Aug 29, 2023

Optimise memory usage in MultiSteps.

82792e3

Change the implementation to allow JAX/XLA to re-use memory buffers. #472 PiperOrigin-RevId: 561129449

hbq1 self-assigned this Aug 29, 2023

copybara-service bot pushed a commit that referenced this issue Aug 29, 2023

Optimise memory usage in MultiSteps.

d2a8cfa

Change the implementation to allow JAX/XLA to re-use memory buffers. #472 PiperOrigin-RevId: 561129449

copybara-service bot pushed a commit that referenced this issue Aug 29, 2023

Optimise memory usage in MultiSteps.

9864205

Change the implementation to allow JAX/XLA to re-use memory buffers. #472 PiperOrigin-RevId: 561129449

copybara-service bot pushed a commit that referenced this issue Aug 29, 2023

Optimise memory usage in MultiSteps.

2a1748f

Change the implementation to allow JAX/XLA to re-use memory buffers. #472 PiperOrigin-RevId: 561129449

copybara-service bot pushed a commit that referenced this issue Aug 30, 2023

Optimise memory usage in MultiSteps.

5d2dd0f

Change the implementation to allow JAX/XLA to re-use memory buffers. #472 PiperOrigin-RevId: 561129449

copybara-service bot pushed a commit that referenced this issue Aug 30, 2023

Optimise memory usage in MultiSteps.

028c690

Change the implementation to allow JAX/XLA to re-use memory buffers. #472 PiperOrigin-RevId: 561129449

copybara-service bot pushed a commit that referenced this issue Aug 30, 2023

Optimise memory usage in MultiSteps.

e7f916c

Change the implementation to allow JAX/XLA to re-use memory buffers. #472 PiperOrigin-RevId: 561129449

copybara-service bot pushed a commit that referenced this issue Aug 30, 2023

Optimise memory usage in MultiSteps.

f296cfa

Change the implementation to allow JAX/XLA to re-use memory buffers. #472 PiperOrigin-RevId: 561390202

ayaka14732 mentioned this issue Oct 4, 2023

train.py OOM on TPUv3-8 ayaka14732/llama-2-jax#10

Open

celiolarcher mentioned this issue Oct 10, 2023

Add: merge multistep and apply_every logic #596

Merged

celiolarcher mentioned this issue Dec 1, 2023

Large consume of memory when using jax.lax.cond jax-ml/jax#18769

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

optax.MultiSteps out of memory #472

optax.MultiSteps out of memory #472

ein-ich commented Jan 7, 2023 •

edited

Loading

mkunesch commented Jan 8, 2023 •

edited

Loading

ein-ich commented Jan 8, 2023

ayaka14732 commented Jul 22, 2023

hr0nix commented Aug 21, 2023

Sea-Snell commented Aug 28, 2023

philippe-eecs commented Aug 28, 2023

agrimgupta92 commented Aug 29, 2023

hbq1 commented Aug 30, 2023

philippe-eecs commented Aug 30, 2023

celiolarcher commented Oct 6, 2023

celiolarcher commented Oct 6, 2023

celiolarcher commented Oct 10, 2023

mtthss commented Oct 31, 2023

hbq1 commented Nov 23, 2023 •

edited

Loading

celiolarcher commented Dec 1, 2023

optax.MultiSteps out of memory #472

optax.MultiSteps out of memory #472

Comments

ein-ich commented Jan 7, 2023 • edited Loading

mkunesch commented Jan 8, 2023 • edited Loading

ein-ich commented Jan 8, 2023

ayaka14732 commented Jul 22, 2023

hr0nix commented Aug 21, 2023

Sea-Snell commented Aug 28, 2023

philippe-eecs commented Aug 28, 2023

agrimgupta92 commented Aug 29, 2023

hbq1 commented Aug 30, 2023

philippe-eecs commented Aug 30, 2023

celiolarcher commented Oct 6, 2023

celiolarcher commented Oct 6, 2023

celiolarcher commented Oct 10, 2023

mtthss commented Oct 31, 2023

hbq1 commented Nov 23, 2023 • edited Loading

celiolarcher commented Dec 1, 2023

ein-ich commented Jan 7, 2023 •

edited

Loading

mkunesch commented Jan 8, 2023 •

edited

Loading

hbq1 commented Nov 23, 2023 •

edited

Loading