Commit 069bf57

tye1 and Chao1Han authored

[doc]update optimize usage (#1714)

update torch.xpu.optimize to ipex.optimize

* add comments to torch.xpu.optimize in xpu/utils.py
* Add torch.xpu.optimize in api_doc.rst
* Add torch.xpu.optimize example

Co-authored-by: chaohan <[email protected]>
1 parent 3f1ac30 commit 069bf57

File tree

8 files changed: +173 −101 lines changed


csrc/include/xpu/Stream.h (+4 −4)

@@ -21,10 +21,10 @@
 
 namespace xpu {
 
-/// Get a sycl queue from a c10 stream. Generate a dpcpp stream from c10 stream,
-/// and get dpcpp queue.
+/// Get a sycl queue from a c10 stream. Generate a sycl stream from c10 stream,
+/// and get sycl queue.
 /// @param stream: c10 stream.
-/// @returns: dpcpp queue.
+/// @returns: sycl queue.
 IPEX_API sycl::queue& get_queue_from_stream(c10::Stream stream);
 
-} // namespace xpu
+} // namespace xpu

docs/index.rst (+1 −1)

@@ -13,7 +13,7 @@ Intel® Extension for PyTorch* is structured as shown in the following figure:
    :align: center
    :alt: Architecture of Intel® Extension for PyTorch*
 
-PyTorch components are depicted with white boxes and Intel extensions are with blue boxes. Extra performance of the extension comes from optimizations for both eager mode and graph mode. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimal optimizers, and INT8 quantization API. Further performance boosting is available by converting the eager-mode model into graph mode via extended graph fusion passes. For the XPU backend, optimized operators and kernels are implemented and registered through PyTorch dispatching mechanism. These operators and kernels are accelerated from native vectorization feature and matrix calculation feature of Intel GPU hardware. In graph mode, further operator fusions are supported to reduce operator/kernel invocation overheads, and thus increase performance.
+PyTorch components are depicted with white boxes and Intel extensions are with blue boxes. Extra performance of the extension comes from optimizations for both eager mode and graph mode. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimal optimizers, and INT8 quantization API. Further performance boosting is available by converting the eager-mode model into graph mode via extended graph fusion passes. For the XPU device, optimized operators and kernels are implemented and registered through PyTorch dispatching mechanism. These operators and kernels are accelerated from native vectorization feature and matrix calculation feature of Intel GPU hardware. In graph mode, further operator fusions are supported to reduce operator/kernel invocation overheads, and thus increase performance.
 
 Intel® Extension for PyTorch* utilizes the `DPC++ <https://github.com/intel/llvm#oneapi-dpc-compiler>`_ compiler that supports the latest `SYCL* <https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html>`_ standard and also a number of extensions to the SYCL* standard, which can be found in the `sycl/doc/extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/extensions>`_ directory. Intel® Extension for PyTorch* also integrates `oneDNN <https://github.com/oneapi-src/oneDNN>`_ and `oneMKL <https://github.com/oneapi-src/oneMKL>`_ libraries and provides kernels based on that. The oneDNN library is used for computation intensive operations. The oneMKL library is used for fundamental mathematical operations.

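As a concrete illustration of the eager-to-graph flow described in the changed paragraph above, here is a minimal sketch. It assumes an XPU build of Intel® Extension for PyTorch* with a device available; the model and input shape are illustrative placeholders, not part of this commit.

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device and ipex.optimize

# Illustrative model and input; any torch.nn.Module would do.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).to("xpu").eval()
example_input = torch.randn(1, 3, 224, 224, device="xpu")

# Eager mode: apply the extension's module/operator optimizations.
model = ipex.optimize(model)

# Graph mode: trace the eager model so the extended fusion passes can apply.
with torch.no_grad():
    traced = torch.jit.trace(model, example_input)
    traced = torch.jit.freeze(traced)
    output = traced(example_input)
```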
docs/tutorials/api_doc.rst (+21)

@@ -6,6 +6,27 @@ General
 
 .. currentmodule:: intel_extension_for_pytorch
 .. autofunction:: optimize
+
+
+
+`torch.xpu.optimize` is an alternative to the `optimize` API in Intel® Extension for PyTorch*, providing identical usage for the XPU device only.
+The motivation for adding this alias is to unify the coding style in user scripts based on the `torch.xpu` module.
+
+.. code-block:: python
+
+   >>> # bfloat16 inference case.
+   >>> model = ...
+   >>> model.load_state_dict(torch.load(PATH))
+   >>> model.eval()
+   >>> optimized_model = torch.xpu.optimize(model, dtype=torch.bfloat16)
+   >>> # running evaluation step.
+   >>> # bfloat16 training case.
+   >>> optimizer = ...
+   >>> model.train()
+   >>> optimized_model, optimized_optimizer = torch.xpu.optimize(model, dtype=torch.bfloat16, optimizer=optimizer)
+   >>> # running training step.
+
+
 .. currentmodule:: intel_extension_for_pytorch.xpu
 .. StreamContext
 .. can_device_access_peer

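To make the alias concrete, below is a slightly fuller, hedged sketch of the usage documented above. It assumes an XPU build of Intel® Extension for PyTorch*; `SimpleNet` and the tensor shapes are illustrative stand-ins, not part of the commit.

```python
import torch
import intel_extension_for_pytorch as ipex  # importing ipex makes torch.xpu.optimize available

# SimpleNet is a hypothetical placeholder model.
class SimpleNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(64, 8)

    def forward(self, x):
        return self.fc(x)

model = SimpleNet().to("xpu").eval()

# torch.xpu.optimize mirrors ipex.optimize for the XPU device, so the two
# calls below should be interchangeable for an XPU model.
optimized_model = torch.xpu.optimize(model, dtype=torch.bfloat16)
# optimized_model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad():
    output = optimized_model(torch.randn(4, 64, device="xpu"))
```

For training, the optimizer is passed alongside the model, as in the rst example above.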