Using the JAX backend for LSTM / GRU models, I'm unable to see any speed-up when training on two Nvidia 3090 GPUs versus a single one (keras-nightly, JAX 0.5.2). The distributed training across the two GPUs appears to work correctly; it is just no faster, and possibly even slower. The attached file contains a modified version of the Keras timeseries weather forecasting example that reproduces the problem; the distribution setup is along the lines sketched below.
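For reference, a minimal sketch of this kind of setup with the Keras 3 distribution API (the input shape and layer sizes here are placeholders, not the exact values from the attached example):

```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # must be set before importing keras

import keras

# Replicate the model on every visible GPU and split each batch
# between them (pure data parallelism).
devices = keras.distribution.list_devices("gpu")
keras.distribution.set_distribution(
    keras.distribution.DataParallel(devices=devices)
)

# Stand-in model; the attached example uses an LSTM on the
# keras.io weather-forecasting data.
model = keras.Sequential(
    [
        keras.layers.Input(shape=(120, 7)),
        keras.layers.LSTM(32),
        keras.layers.Dense(1),
    ]
)
model.compile(optimizer="adam", loss="mse")
```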
I also can't seem to find any "official" Keras / keras.io example that demonstrates distributed training together with a measurement of the training time. Shouldn't there be such an example to showcase the gain from multi-device training?

timeseries_weather_forecasting_LC.zip
Having looked further into this issue, it turns out I do see a speed-up for very large batch sizes, e.g. 65536. However, such a large batch size is unlikely to be practical for most model trainings. A sketch of the kind of timing comparison I used is below.
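A simple way to see the effect is to time one epoch at several batch sizes (a sketch; `x_train` / `y_train` stand in for the prepared arrays from the weather example, and `model` is the distributed model from the snippet above):

```python
import time

for batch_size in (256, 4096, 65536):
    # First fit triggers XLA compilation; run it once as warm-up
    # so the timed epoch measures steady-state training only.
    model.fit(x_train, y_train, batch_size=batch_size, epochs=1, verbose=0)
    start = time.perf_counter()
    model.fit(x_train, y_train, batch_size=batch_size, epochs=1, verbose=0)
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.1f}s per epoch")
```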
The lack of speed-up at "normal" batch sizes appears to stem from the way lax.scan is implemented in JAX / XLA; see e.g. jax-ml/jax#25336 and the links therein for a good overview. So this looks like a genuine bottleneck in JAX / XLA rather than in Keras. Still, it is probably worth monitoring the development in JAX / XLA so that any improvements made there can directly benefit Keras.
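To illustrate the bottleneck: RNN layers on the JAX backend lower to a `lax.scan` over timesteps, which XLA compiles as a strictly sequential loop of small per-step kernels. Splitting the batch across GPUs shrinks each step's work but not the number of steps, so per-step overhead dominates until batches are very large. A toy recurrence with the same structure (the cell below is a hypothetical stand-in, not the actual LSTM math):

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
Wx = jax.random.normal(k1, (7, 32))   # input-to-hidden weights
Wh = jax.random.normal(k2, (32, 32))  # hidden-to-hidden weights

def rnn_step(h, x):
    # One timestep: the next hidden state depends on the previous one,
    # so XLA cannot parallelize across the time axis.
    h_next = jnp.tanh(x @ Wx + h @ Wh)
    return h_next, h_next

xs = jnp.zeros((120, 7))   # (timesteps, features) for one sample
h0 = jnp.zeros((32,))
h_final, all_states = jax.lax.scan(rnn_step, h0, xs)
```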
larschristensen changed the title from "Unable to see any speedup from multi-GPU training using JAX backend" to "Unable to see any speedup from multi-GPU training using JAX backend for LSTM / GRU" on Mar 11, 2025.