`docs/design_doc/isa_dyndisp.md`
# Intel® Extension for PyTorch\* CPU ISA Dynamic Dispatch Design Doc
This document explains the dynamic kernel dispatch mechanism for Intel® Extension for PyTorch\* (IPEX) based on CPU ISA. It is an extension to the similar mechanism in PyTorch.
## Overview
IPEX dyndisp is forked from **PyTorch**: `ATen/native/DispatchStub.h` and `ATen/native/DispatchStub.cpp`. IPEX adds additional CPU ISA level support, such as `AVX512_VNNI`, `AVX512_BF16`, and `AMX`.

PyTorch & IPEX CPU ISA support statement:

| ISA Level | GCC Requirement |
| ---- | ---- |
| AVX512_BF16 | GCC 10.3+ |
| AMX | GCC 11.2+ |

\*For detailed compiler checks, see `cmake/Modules/FindAVX.cmake`.
## Dynamic Dispatch Design
Dynamic dispatch copies the kernel implementation source files into multiple folders, one for each ISA level. It then builds each copy with its ISA-specific parameters. Each generated object file contains its own function body (**Kernel Implementation**).

Kernel Implementation uses an anonymous namespace so that different CPU versions won't conflict.

**Kernel Stub** is a "virtual function" with polymorphic kernel implementations pertaining to ISA levels.

At runtime, the **Dispatch Stub implementation** checks CPUIDs and OS status to determine which ISA level's function body is the best match, and points the stub to it.
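
To make the mechanism concrete, here is a minimal, self-contained sketch of the idea (illustrative only, not IPEX code; in IPEX the two kernel bodies would be per-ISA copies of the same source file, and the CPUID check is real):

```c++
#include <cstdio>

namespace {  // anonymous namespace: per-ISA copies of a kernel body won't conflict

void add_kernel_avx2(const float* a, const float* b, float* out, int n) {
  for (int i = 0; i < n; ++i) out[i] = a[i] + b[i];  // stand-in for an AVX2 body
}

void add_kernel_avx512(const float* a, const float* b, float* out, int n) {
  for (int i = 0; i < n; ++i) out[i] = a[i] + b[i];  // stand-in for an AVX512 body
}

}  // namespace

using add_fn = void (*)(const float*, const float*, float*, int);

// Placeholder for the real CPUID/OS capability query.
bool cpu_supports_avx512() { return false; }

// The "kernel stub": one entry point, resolved to the best ISA body at runtime.
add_fn choose_add_kernel() {
  return cpu_supports_avx512() ? add_kernel_avx512 : add_kernel_avx2;
}

int main() {
  float a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1}, out[4];
  choose_add_kernel()(a, b, out, 4);
  std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  // prints: 5 5 5 5
}
```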

### Code Folder Struct
>1. DEFAULT level kernels are not fully implemented in IPEX. To align with PyTorch, the default level is built with AVX2 parameters instead. Therefore, the minimum requirement for a machine to run IPEX is AVX2 support.
>2. `-D__AVX__` and `-D__AVX512F__` are defined for the dependency library [sleef](https://sleef.org/).

The CodeGen will copy each cpp file from **Kernel implementation**, and then add the ISA-specific build parameters for each level.

>5. A higher ISA level is compatible with lower ISA levels, so it needs to contain the lower levels' ISA feature definitions. For example, AVX512_BF16 needs to contain `-DCPU_CAPABILITY_AVX512` and `-DCPU_CAPABILITY_AVX512_VNNI`. However, AVX512 does not contain the AVX2 definitions, because the vec register widths differ.
## Add Custom Kernel
If you want to add a new custom kernel that uses CPU ISA instructions, refer to these tips (a hypothetical end-to-end sketch follows the `DispatchStub.h` excerpt below):

1. Add the CPU ISA-related kernel implementation under the kernels folder, e.g.: `intel_extension_for_pytorch/csrc/aten/cpu/kernels/NewKernelKrnl.cpp`
2. Add the kernel stub under: `intel_extension_for_pytorch/csrc/aten/cpu/NewKernel.cpp`
3. Include the header file `intel_extension_for_pytorch/csrc/dyndisp/DispatchStub.h`, and refer to the comments in that header.
```c++
// Implements instruction set specific function dispatch.
//
// ...
```
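
Below is a hypothetical end-to-end sketch of steps 1–3. It follows the PyTorch `DispatchStub` macro convention (`DECLARE_DISPATCH`/`DEFINE_DISPATCH`/`REGISTER_DISPATCH`) that IPEX forks; the kernel name, signature, include path, and exact macro spellings here are assumptions, so treat `DispatchStub.h` as the authoritative reference:

```c++
// Hypothetical sketch; names, signatures, and macro spellings are illustrative.
#include <cstdint>
#include "csrc/dyndisp/DispatchStub.h"  // assumed include path

// --- NewKernel.cpp: the kernel stub, built once with common flags ---
using new_kernel_fn = void (*)(float* out, const float* in, int64_t n);

DECLARE_DISPATCH(new_kernel_fn, new_kernel_stub);
DEFINE_DISPATCH(new_kernel_stub);

void new_kernel(float* out, const float* in, int64_t n) {
  // Dispatches to the best ISA-specific body registered below.
  new_kernel_stub(at::kCPU, out, in, n);
}

// --- kernels/NewKernelKrnl.cpp: the implementation, copied and rebuilt per ISA ---
namespace {  // anonymous namespace: per-ISA object files won't conflict

void new_kernel_impl(float* out, const float* in, int64_t n) {
  for (int64_t i = 0; i < n; ++i) out[i] = in[i] * 2.0f;  // placeholder body
}

}  // namespace

REGISTER_DISPATCH(new_kernel_stub, &new_kernel_impl);
```

The stub file is compiled once with common flags, while the `*Krnl.cpp` file is the one the CodeGen copies and rebuilds per ISA level.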
>**Note:**
>
>1. Some kernels only call the **oneDNN** or **iDeep** implementation, or some other backend implementation, and do not need their own kernel implementations. (Refer to `BatchNorm.cpp`.)
>2. Vec-related header files must be included in kernel implementation files, but cannot be included in kernel stubs. A kernel stub is common code for all ISA levels, so it cannot be built with ISA-related compiler parameters.
>3. For more intrinsics, check the [Intel® Intrinsics Guide](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html).

The macros `CPU_CAPABILITY_AVX512` and `CPU_CAPABILITY_AVX512_BF16` are defined by compiler checks; they mean that the current compiler is capable of generating code for the defined ISA level.

Because `AVX512_BF16` is a higher ISA level than `AVX512` and is compatible with it, a `CPU_CAPABILITY_AVX512_BF16` region can be contained within a `CPU_CAPABILITY_AVX512` region.
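
For illustration, a self-contained sketch of such nesting (the `puts` calls stand in for real ISA-specific code):

```c++
#include <cstdio>

void report_capability() {
#if defined(CPU_CAPABILITY_AVX512)
  // Compiled only when the compiler can generate AVX512 code.
  std::puts("AVX512 capable build");
#if defined(CPU_CAPABILITY_AVX512_BF16)
  // Nested region: every build defining CPU_CAPABILITY_AVX512_BF16
  // also defines CPU_CAPABILITY_AVX512.
  std::puts("AVX512_BF16 capable build");
#endif
#else
  std::puts("built without AVX512 support");
#endif
}

int main() { report_capability(); }
```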

This example shows how to get the data type size and its Vec size. Under different ISAs, Vec has different register widths, and therefore different Vec sizes.
```c++
//csrc/aten/cpu/GetVecLength.h
// ...

REGISTER_DISPATCH(
    // ...
);
```
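
As a minimal illustration of that relationship (this is not the IPEX `Vec` type; `kRegisterBits` is an assumed stand-in for the ISA register width), the element count per vector equals the register width divided by the element size:

```c++
#include <cstddef>

// Illustrative stand-in for a SIMD vector type: kRegisterBits models the
// ISA register width (256 bits for AVX2, 512 bits for AVX512).
template <typename T, std::size_t kRegisterBits>
struct Vec {
  static constexpr std::size_t size() { return kRegisterBits / (8 * sizeof(T)); }
};

static_assert(Vec<float, 256>::size() == 8,  "AVX2: 8 floats per vector");
static_assert(Vec<float, 512>::size() == 16, "AVX512: 16 floats per vector");
static_assert(Vec<double, 512>::size() == 8, "AVX512: 8 doubles per vector");

int main() { return 0; }
```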
## Private Debug APIs
Here are three ISA-related private APIs that can help with debugging:

1. Query current ISA level.
2. Query max CPU supported ISA level.
3. Query max binary supported ISA level.

>**Note:**
>
>1. Max CPU supported ISA level only depends on CPU features.
>2. Max binary supported ISA level only depends on the compiler version used at build time.
>3. The current ISA level is the minimum of the max CPU ISA level and the max binary ISA level.
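
A tiny sketch of that rule, with an assumed ordered enum standing in for the ISA levels:

```c++
#include <algorithm>
#include <cstdio>

// Illustrative sketch: ISA levels ordered lowest to highest.
enum class IsaLevel { avx2, avx512, avx512_vnni, avx512_bf16, amx };

// The current (effective) level is the minimum of what the CPU supports
// and what the binary was built for.
IsaLevel current_isa_level(IsaLevel max_cpu, IsaLevel max_binary) {
  return std::min(max_cpu, max_binary);
}

int main() {
  // CPU supports AMX, but the binary was built only up to AVX512:
  bool ok = current_isa_level(IsaLevel::amx, IsaLevel::avx512) == IsaLevel::avx512;
  std::puts(ok ? "effective level: avx512" : "unexpected");
}
```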

### Example:
```bash
python
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import intel_extension_for_pytorch._C as core
>>> core._get_current_isa_level()
'AMX'
>>> ...
```

## Select ISA level manually
By default, IPEX dispatches to the kernels with the maximum ISA level supported by the underlying CPU hardware. This ISA level can be overridden by the environment variable `ATEN_CPU_CAPABILITY` (same environment variable as PyTorch). The available values are {`avx2`, `avx512`, `avx512_vnni`, `avx512_bf16`, `amx`}. The effective ISA level would be the minimal level between `ATEN_CPU_CAPABILITY` and the maximum level supported by the hardware.

### Example:
```bash
$ python -c 'import intel_extension_for_pytorch._C as core;print(core._get_current_isa_level())'
AMX
$ ATEN_CPU_CAPABILITY=avx2 python -c 'import intel_extension_for_pytorch._C as core;print(core._get_current_isa_level())'
AVX2
```
>**Note:**
>
>`core._get_current_isa_level()` is an IPEX internal function used for checking the current effective ISA level. It is used for debugging purposes only and is subject to change.
## CPU feature check
An additional CPU feature check tool is provided in the subfolder: `tests/cpu/isa`.

---

Intel® Extension for PyTorch* extends PyTorch with up-to-date features and optimizations for an extra performance boost on Intel hardware. Example optimizations use AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX). Over time, most of these optimizations will be included directly in stock PyTorch releases.

Intel® Extension for PyTorch* provides optimizations for both eager mode and graph mode. However, compared to eager mode, graph mode in PyTorch normally yields better performance from optimization techniques such as operation fusion, and Intel® Extension for PyTorch* amplifies them with more comprehensive graph optimizations. Therefore, we recommend taking advantage of Intel® Extension for PyTorch* with `TorchScript <https://pytorch.org/docs/stable/jit.html>`_ whenever your workload supports it. You could choose to run with the `torch.jit.trace()` function or the `torch.jit.script()` function, but based on our evaluation, `torch.jit.trace()` supports more workloads, so we recommend you use `torch.jit.trace()` as your first choice. More detailed information can be found at the `pytorch.org website <https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html#tracing-modules>`_.

The extension can be loaded as a Python module for Python programs or linked as a C++ library for C++ programs. In Python scripts, users can enable it dynamically by importing `intel_extension_for_pytorch`.

Intel® Extension for PyTorch* is structured as shown in the following figure:

PyTorch components are depicted with white boxes, while Intel extensions are depicted with blue boxes. Extra performance is delivered via both custom addons and overriding of existing PyTorch components. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimal optimizers, and an INT8 quantization API. Further performance boosting is available by converting the eager-mode model into graph mode via the extended graph fusion passes. Intel® Extension for PyTorch* dispatches the operators to their underlying kernels automatically based on the ISA that it detects, and leverages the vectorization and matrix acceleration units available on Intel hardware as much as possible. The oneDNN library is used for computation-intensive operations. The Intel® Extension for PyTorch* runtime extension brings better efficiency with finer-grained thread runtime control and weight sharing.

Intel® Extension for PyTorch* has been released as an open-source project on `GitHub <https://github.com/intel/intel-extension-for-pytorch>`_.

---

`docs/tutorials/blogs_publications.md`

Blogs & Publications
====================
* [Accelerating PyTorch with Intel® Extension for PyTorch\*](https://medium.com/pytorch/accelerating-pytorch-with-intel-extension-for-pytorch-3aef51ea3722)
* [Intel and Facebook Accelerate PyTorch Performance with 3rd Gen Intel® Xeon® Processors and Intel® Deep Learning Boost’s new BFloat16 capability](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-facebook-boost-bfloat16.html)
* [Accelerate PyTorch with the extension and oneDNN using Intel BF16 Technology](https://medium.com/pytorch/accelerate-pytorch-with-ipex-and-onednn-using-intel-bf16-technology-dca5b8e6b58f)