Open
Labels: feature (is an improvement or enhancement), strategy: ddp (DistributedDataParallel)
Description & Motivation
Based on the docs, BatchSizeFinder does not currently support DDP:
"Batch size finder is not yet supported for DDP or any of its variations, it is coming soon."
https://lightning.ai/docs/pytorch/stable/advanced/training_tricks.html
It would be great to have a way to automatically find the correct batch size when submitting DDP jobs to a queue that has multiple instance types. We would like to automatically detect the correct batch size to use for single-instance, multi-GPU training.
Pitch
Support BatchSizeFinder when using DDP strategy.
Alternatives
Alternatively, provide some way to run BatchSizeFinder on a single GPU prior to DDP being used.
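To make this alternative concrete, here is a rough sketch of how it could be done manually today as a two-step job: first run the batch size scaling on one device, then hand the result to the DDP run. It assumes Lightning 2.x, where `Tuner.scale_batch_size` is available, and a model that exposes a `batch_size` attribute used by its dataloaders. The environment variable name `FOUND_BATCH_SIZE` and both helper functions are hypothetical, not part of Lightning.

```python
import os
from typing import Optional

# Hypothetical variable name used to pass the result from step 1 to step 2.
ENV_VAR = "FOUND_BATCH_SIZE"


def find_batch_size_on_one_gpu(model, init_val: int = 2, max_trials: int = 25) -> Optional[int]:
    """Step 1: run Lightning's batch size scaling on a single device.

    Assumes `model` is a LightningModule with a `batch_size` attribute
    that its train dataloader reads.
    """
    # Imported lazily so the rest of this module works without Lightning installed.
    from lightning.pytorch import Trainer
    from lightning.pytorch.tuner import Tuner

    trainer = Trainer(
        accelerator="auto",
        devices=1,  # single device, so no DDP strategy is involved
        max_epochs=1,
        logger=False,
        enable_checkpointing=False,
    )
    tuner = Tuner(trainer)
    # "power" mode keeps doubling the batch size until an out-of-memory error occurs.
    return tuner.scale_batch_size(
        model, mode="power", init_val=init_val, max_trials=max_trials
    )


def resolve_batch_size(default: int) -> int:
    """Step 2: read the batch size found in step 1, or fall back to `default`."""
    raw = os.environ.get(ENV_VAR)
    return int(raw) if raw else default
```

In the DDP job, the model would then be built with `batch_size=resolve_batch_size(default=32)` before creating a `Trainer(accelerator="gpu", devices=..., strategy="ddp")`. Running the two steps as separate processes avoids the DDP launcher re-running the search in every worker. Note that under DDP the value is a per-device batch size, and DDP may use somewhat more memory per device than single-device training, so the value found this way might still need to be reduced.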
Additional context
Related Issues and PRs: