ddp_spawn causing "Cannot allocate memory" #20796
Unanswered
jonathanrenusch asked this question in DDP / multi-GPU / multi-node
Multi-GPU Training on In-Memory Graphs
Problem:
I'm aiming to perform multi-GPU training on an in-memory graph dataset (approximately 100 GB). The standard Distributed Data Parallel (DDP) implementation in PyTorch appears to create a full copy of the dataset for each distributed process: with n GPUs, the entire 100 GB dataset is loaded into CPU RAM n times, and only then does the `DistributedSampler` partition the data for each GPU. This leads to significant and unnecessary CPU RAM consumption.

Desired Solution:
My goal is to load the 100 GB dataset into CPU RAM only once and then use the `DistributedSampler` (or another suitable shuffled sampler) to hand each of the n distributed GPU processes a distinct partition of this single dataset instance. This would avoid redundant data loading and substantially reduce CPU RAM usage.
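To make the goal concrete, the pattern I'm after looks roughly like this (a minimal sketch with placeholder names and paths, not working code): the tensors are loaded and moved into shared memory exactly once, and each rank only ever receives a disjoint set of indices.

```python
import torch
from torch.utils.data import Dataset, DataLoader, DistributedSampler


class SharedGraphDataset(Dataset):
    """Placeholder wrapper around a list of dicts of tensors stored in a .pt file."""

    def __init__(self, path):
        # Load the full ~100 GB list of dicts exactly once.
        self.samples = torch.load(path)
        # Move every tensor into shared memory so that processes created
        # afterwards can reference the same physical storage instead of
        # each receiving their own copy.
        for sample in self.samples:
            for tensor in sample.values():
                tensor.share_memory_()

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


# Each distributed rank should then only need its own shard of indices into
# the single shared copy. The rank/world size below are placeholders; inside
# a Trainer they would come from the initialized process group.
dataset = SharedGraphDataset("graphs.pt")  # placeholder path
sampler = DistributedSampler(dataset, num_replicas=4, rank=0, shuffle=True)
loader = DataLoader(dataset, batch_size=1, sampler=sampler, num_workers=1)
```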
Attempts and Challenges:
The Lightning documentation on sharing datasets across process boundaries seems to address this exact scenario. However, I've encountered persistent errors when attempting to implement the suggested solutions. Specifically, I have tried:

- The `ddp` strategy.
- The `ddp_spawn` strategy, which results in the attached error message (see below).
- Setting `num_workers` for the `DataLoader` to 1.
- Using a `batch_size` of 1 to minimize per-process memory.
Dataset Structure:
My dataset is a list of dictionaries stored in native PyTorch `.pt` format. Each dictionary contains the following keys, with corresponding PyTorch tensors as values:

- `node_features`: tensor of node features.
- `edge_indices`: tensor representing the graph's edge connections.
- `labels`: tensor of target labels.
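Concretely, a single element of the list has roughly this shape of structure (the sizes are placeholders, and the `(2, num_edges)` layout for `edge_indices` is an assumption for illustration):

```python
import torch

num_nodes, num_node_features, num_edges = 100, 16, 400  # placeholder sizes

# One element of the list stored in the .pt file:
sample = {
    "node_features": torch.randn(num_nodes, num_node_features),
    "edge_indices": torch.randint(0, num_nodes, (2, num_edges)),  # assumed (2, num_edges) layout
    "labels": torch.tensor(0),
}
```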
Constraints:
Due to downstream deployment requirements, using PyTorch Geometric `Data` objects is not a viable option. Therefore, I require a solution that works with a custom `Dataset` and a `custom_fn`.
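For illustration, a simplified version of the kind of collate-style function I mean looks like this (it assumes the `(2, num_edges)` layout for `edge_indices` from above and one label tensor per graph):

```python
import torch


def custom_fn(batch):
    """Simplified collate function for a list of graph dicts (no PyG involved)."""
    node_features, edge_indices, labels = [], [], []
    node_offset = 0
    for sample in batch:
        node_features.append(sample["node_features"])
        # Shift edge indices so they keep pointing at the right rows of the
        # concatenated node-feature tensor.
        edge_indices.append(sample["edge_indices"] + node_offset)
        labels.append(sample["labels"])
        node_offset += sample["node_features"].shape[0]

    return {
        "node_features": torch.cat(node_features, dim=0),
        "edge_indices": torch.cat(edge_indices, dim=1),  # assumes (2, num_edges)
        "labels": torch.stack(labels, dim=0),
    }
```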
Error Message (when using `ddp_spawn`):

Next Steps (if current approach fails):
If a solution for shared in-memory datasets with `ddp_spawn` cannot be found, I will be forced to explore alternative strategies such as:
Call for Help:
Any insights, suggestions, or code examples demonstrating how to correctly implement shared in-memory datasets with `ddp_spawn` (or an alternative multi-GPU strategy that avoids redundant loading) would be greatly appreciated! I'm particularly interested in understanding how to properly configure the `Dataset`, `DataLoader`, and potentially a custom `collate_fn` or `worker_init_fn` within the Lightning framework to achieve this.
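The overall wiring I have in mind looks roughly like the following (heavily simplified; it reuses the `SharedGraphDataset` and `custom_fn` sketches from above, and the device count and path are placeholders):

```python
import lightning.pytorch as pl
from torch.utils.data import DataLoader, DistributedSampler


class GraphDataModule(pl.LightningDataModule):
    """Simplified DataModule that loads the .pt file once in the main process."""

    def __init__(self, path, batch_size=1, num_workers=1):
        super().__init__()
        self.batch_size = batch_size
        self.num_workers = num_workers
        # Loaded once here, before the worker processes are spawned; my reading
        # of the docs on sharing datasets across process boundaries is that the
        # shared-memory tensors should then be reused rather than copied.
        self.dataset = SharedGraphDataset(path)

    def train_dataloader(self):
        # Lightning normally injects a DistributedSampler on its own; it is
        # written out here only to make the intended partitioning explicit.
        sampler = DistributedSampler(self.dataset, shuffle=True)
        return DataLoader(
            self.dataset,
            batch_size=self.batch_size,
            sampler=sampler,
            num_workers=self.num_workers,
            collate_fn=custom_fn,
        )


# Intended usage (model construction omitted, device count is a placeholder):
# trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="ddp_spawn")
# trainer.fit(model, datamodule=GraphDataModule("graphs.pt"))
```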
Custom_fn:

Lightning Data Module: