I've run multiple inference and training benchmarks from https://github.com/pytorch/benchmark to compare the performance of PyTorch built from source in EB with a simple installation using the official wheels from pytorch.org.
Test configurations
The following tables show the results for PyTorch 2.6.0 in the 2024a toolchain generation. I made 3 different installations:
1. Standard build in EB from source
2. Official wheels from pytorch.org on top of Python/SciPy-bundle (2024a) and CUDA 12.6.0 on EB
3. Official wheels for PyTorch and CUDA on top of a minimal Python/SciPy-bundle (2024a) environment on EB
Hardware/Job
All benchmarks ran on the same system:
- 16 AMD Zen5 CPU cores
- 1 NVIDIA L40S GPU (48GB VRAM)
- 96GB of RAM
- each benchmark was run at least 5 times
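Since each benchmark is repeated several times, the per-run throughputs have to be aggregated somehow. A minimal sketch of that kind of repetition harness (the `bench` callable, iteration counts, and median aggregation here are illustrative, not taken from pytorch/benchmark's own runner):

```python
import statistics
import time

def measure_ops(bench, iters_per_run=100, repeats=5):
    """Run `bench` repeatedly and report the median throughput (OPS).

    `bench` is any zero-argument callable standing in for one benchmark
    iteration; the real suite in pytorch/benchmark ships its own runner,
    so this is only an illustration of the repeat-and-aggregate idea.
    """
    runs = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(iters_per_run):
            bench()
        elapsed = time.perf_counter() - start
        runs.append(iters_per_run / elapsed)  # operations per second
    # The median is robust against a single slow (e.g. warm-up) run.
    return statistics.median(runs)
```

For example, `measure_ops(lambda: sum(range(1000)))` returns the median OPS over 5 runs of 100 iterations each.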
Results
All values are Operations per Second (OPS) relative to configuration (3), the PyTorch and CUDA wheels.
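Concretely, each cell is a configuration's raw OPS divided by the baseline configuration's OPS, expressed as a percentage. A small sketch with made-up throughput numbers (not the measured data):

```python
def relative_ops(raw, baseline_key="(3) PyTorch/CUDA wheels"):
    """Express each configuration's raw OPS as a percentage of the baseline."""
    base = raw[baseline_key]
    return {cfg: 100.0 * ops / base for cfg, ops in raw.items()}

# Hypothetical raw throughputs for one benchmark, for illustration only:
raw = {
    "(1) Built from source": 1004.1,
    "(2) PyTorch wheels": 998.8,
    "(3) PyTorch/CUDA wheels": 1000.0,
}
rel = relative_ops(raw)  # the baseline entry is 100.0% by construction
```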
Training on GPU
|                                 | (1) Built from source | (2) PyTorch wheels | (3) PyTorch/CUDA wheels |
| ------------------------------- | --------------------- | ------------------ | ----------------------- |
| --TOTAL--                       | 100.37%               | 99.88%             | 100.00%                 |
| BERT_pytorch                    | 100.02%               | 99.61%             | 100.00%                 |
| alexnet                         | 99.63%                | 99.92%             | 100.00%                 |
| detectron2_fasterrcnn_r_101_c4  | x                     | 97.90%             | 100.00%                 |
| detectron2_fasterrcnn_r_101_dc5 | x                     | 99.80%             | 100.00%                 |
| detectron2_fasterrcnn_r_101_fpn | x                     | 102.76%            | 100.00%                 |
| LearningToPaint                 | 101.59%               | 101.26%            | 100.00%                 |
| dlrm                            | 99.88%                | 99.91%             | 100.00%                 |
| hf_Bert_large                   | 101.30%               | 98.47%             | 100.00%                 |
| mobilenet_v3_large              | 99.98%                | 99.45%             | 100.00%                 |
| timm_regnet                     | 100.39%               | 99.98%             | 100.00%                 |
| yolov3                          | 100.14%               | 99.61%             | 100.00%                 |

Note: runs marked with an x failed with errors.
Inference on GPU
|                                 | (1) Built from source | (2) PyTorch wheels | (3) PyTorch/CUDA wheels |
| ------------------------------- | --------------------- | ------------------ | ----------------------- |
| --TOTAL--                       | 99.80%                | 99.56%             | 100.00%                 |
| BERT_pytorch                    | 100.29%               | 97.69%             | 100.00%                 |
| alexnet                         | 100.41%               | 98.62%             | 100.00%                 |
| detectron2_fasterrcnn_r_101_c4  | x                     | 99.80%             | 100.00%                 |
| detectron2_fasterrcnn_r_101_dc5 | x                     | 101.15%            | 100.00%                 |
| detectron2_fasterrcnn_r_101_fpn | x                     | 98.73%             | 100.00%                 |
| LearningToPaint                 | 98.01%                | 100.25%            | 100.00%                 |
| dlrm                            | 99.67%                | 99.83%             | 100.00%                 |
| hf_Bert_large                   | 99.14%                | 101.19%            | 100.00%                 |
| hf_T5_large                     | 98.42%                | 96.71%             | 100.00%                 |
| mobilenet_v3_large              | 100.55%               | 100.39%            | 100.00%                 |
| timm_regnet                     | 100.53%               | 100.30%            | 100.00%                 |
| yolov3                          | 101.19%               | 100.01%            | 100.00%                 |

Note: runs marked with an x failed with errors.
Training on CPU
|                                | (1) Built from source | (2) PyTorch wheels | (3) PyTorch/CUDA wheels |
| ------------------------------ | --------------------- | ------------------ | ----------------------- |
| --TOTAL--                      | 31.94%                | 99.91%             | 100.00%                 |
| BERT_pytorch                   | 7.16%                 | 101.44%            | 100.00%                 |
| alexnet                        | 28.28%                | 101.51%            | 100.00%                 |
| detectron2_fasterrcnn_r_101_c4 | 71.91%                | 100.92%            | 100.00%                 |
| LearningToPaint                | 29.47%                | 101.39%            | 100.00%                 |
| dlrm                           | 22.87%                | 100.61%            | 100.00%                 |
| hf_Bert_large                  | 7.29%                 | 101.23%            | 100.00%                 |
| mobilenet_v3_large             | 9.93%                 | 93.36%             | 100.00%                 |
| timm_vision_transformer_large  | 77.08%                | 100.63%            | 100.00%                 |
| timm_regnet                    | 33.44%                | 98.12%             | 100.00%                 |
Inference on CPU
|                                | (1) Built from source | (2) PyTorch wheels | (3) PyTorch/CUDA wheels |
| ------------------------------ | --------------------- | ------------------ | ----------------------- |
| --TOTAL--                      | 35.54%                | 99.63%             | 100.00%                 |
| BERT_pytorch                   | 2.08%                 | 100.66%            | 100.00%                 |
| alexnet                        | 28.77%                | 98.30%             | 100.00%                 |
| detectron2_fasterrcnn_r_101_c4 | 68.24%                | 99.39%             | 100.00%                 |
| LearningToPaint                | 32.40%                | 100.97%            | 100.00%                 |
| dlrm                           | 8.27%                 | 96.77%             | 100.00%                 |
| hf_Bert_large                  | 25.18%                | 94.50%             | 100.00%                 |
| hf_T5_large                    | 5.22%                 | 101.43%            | 100.00%                 |
| mobilenet_v3_large             | 21.25%                | 113.94%            | 100.00%                 |
| timm_vision_transformer_large  | 67.20%                | 99.87%             | 100.00%                 |
| timm_regnet                    | 61.48%                | 98.80%             | 100.00%                 |
| yolov3                         | 70.84%                | 91.34%             | 100.00%                 |
Observations
- Performance of inference/training on GPU is practically the same between the build from source and the official wheels
- Performance of inference/training on CPU is unreasonably low for the build from source in EB compared to the official wheels
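A gap this large on CPU usually points at the backend rather than the framework itself, e.g. the source build ending up without an optimized CPU backend (oneDNN/MKL) or with a bad intra-op thread setup. A hedged diagnostic sketch for comparing the two installations (the function name is mine; it only wraps standard `torch.backends` checks and degrades gracefully if torch is not importable):

```python
def cpu_backend_report():
    """Collect CPU-backend details that commonly explain large CPU-only
    performance gaps between PyTorch builds."""
    try:
        import torch
    except ImportError:
        return {"torch": None}  # torch not installed in this environment
    return {
        "torch": torch.__version__,
        "mkldnn": torch.backends.mkldnn.is_available(),  # oneDNN kernels
        "mkl": torch.backends.mkl.is_available(),        # MKL BLAS
        "openmp": torch.backends.openmp.is_available(),  # intra-op threading
        "threads": torch.get_num_threads(),
    }
```

Running this (plus `torch.__config__.show()`) in both the EB source build and the wheel install and diffing the output would confirm or rule out a missing backend.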