Support BatchSizeFinder in DDP #20994

@AlJohri

Description & Motivation

Based on the docs, BatchSizeFinder does not currently support DDP:

Batch size finder is not yet supported for DDP or any of its variations, it is coming soon.
https://lightning.ai/docs/pytorch/stable/advanced/training_tricks.html

It would be great to have a way to automatically find the correct batch size when submitting DDP jobs to a queue that has multiple instance types. We would like to automatically detect the correct batch size to use for single-instance, multi-GPU training.

Pitch

Support BatchSizeFinder when using DDP strategy.

Alternatives

Alternatively, provide some way to run BatchSizeFinder on a single GPU prior to DDP being used.
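
As a rough illustration of that alternative, here is a minimal sketch of a doubling search that could run in a single-GPU process before the DDP job is launched. It is not Lightning's BatchSizeFinder implementation. The names `find_max_batch_size`, `fits`, and `fake_probe` are hypothetical, and the probe here is a stand-in that simulates an out-of-memory threshold instead of touching a GPU.

```python
from typing import Callable


def find_max_batch_size(
    fits: Callable[[int], bool],
    init: int = 2,
    max_trials: int = 25,
) -> int:
    """Double the batch size until `fits` fails; return the last size that fit.

    `fits` is a hypothetical callback. In a real setup it would run a few
    training steps at the given batch size on ONE device and return False
    when an out-of-memory error is raised.
    """
    if not fits(init):
        raise ValueError(f"initial batch size {init} does not fit")
    best = init
    for _ in range(max_trials):
        candidate = best * 2
        if not fits(candidate):
            break
        best = candidate
    return best


# Stand-in for a real single-GPU probe (assumption for illustration only):
# pretend any batch larger than 100 samples runs out of memory.
def fake_probe(batch_size: int) -> bool:
    return batch_size <= 100


per_device_batch_size = find_max_batch_size(fake_probe)
print(per_device_batch_size)  # 64
```

The result would then be passed as the per-device batch size to the DDP run, since each DDP process loads its own batches. DDP may use some additional memory on each device, so a safety margin below the single-GPU result is probably prudent.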

Additional context

cc @lantiga @Borda @justusschock

Labels: feature, strategy: ddp
