Commit 6c9c6a1

tye1, sramakintel, jingxu10, ZhaoqiongZ authored

Doc and script changes for 2.1.40 (#4628)

* Update README.md and known_issues.md
* Correct the image tag in the README of the dockerfile (#4638)
* Update Miniconda to Miniforge
* Update compile_bundle.sh to stop displaying the torch-ccl version
* Remove optimizer_fusion_cpu and split_sgd
* Fix advanced configuration not being displayed correctly

Co-authored-by: Srikanth Ramakrishna <[email protected]>
Co-authored-by: Jing Xu <[email protected]>
Co-authored-by: Zheng, Zhaoqiong <[email protected]>
1 parent 189cdfc commit 6c9c6a1

8 files changed: +48 -19 lines changed


docker/Dockerfile.compile (+6 -6)

@@ -37,17 +37,17 @@ RUN useradd -m -s /bin/bash ubuntu && \
 USER ubuntu
 WORKDIR /home/ubuntu

-RUN curl -fsSL -v -o miniconda.sh -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
-    bash miniconda.sh -b -p ./miniconda3 && \
-    rm miniconda.sh && \
-    echo "source ~/miniconda3/bin/activate" >> ./.bashrc
+RUN curl -fsSL -v -o miniforge.sh -O https://github.com/conda-forge/miniforge/releases/download/24.1.2-0/Miniforge3-24.1.2-0-Linux-x86_64.sh && \
+    bash miniforge.sh -b -p ./miniforge3 && \
+    rm miniforge.sh && \
+    echo "source ~/miniforge3/bin/activate" >> ./.bashrc

 FROM base AS dev
 RUN bash /basekit_driver_install_helper.sh dev
 COPY --chown=ubuntu:ubuntu . ./intel-extension-for-pytorch/
 RUN cp ./intel-extension-for-pytorch/scripts/compile_bundle.sh ./ && \
     sed -i "s/VER_IPEX=.*/VER_IPEX=/" compile_bundle.sh
-RUN . ./miniconda3/bin/activate && \
+RUN . ./miniforge3/bin/activate && \
     conda create -y -n compile_py310 python=3.10 && conda activate compile_py310 && \
     bash compile_bundle.sh /opt/intel/oneapi/compiler/latest /opt/intel/oneapi/mkl/latest /opt/intel/oneapi/ccl/latest /opt/intel/oneapi/mpi/latest pvc,ats-m150,acm-g11 && \
     mkdir wheels && cp pytorch/dist/*.whl vision/dist/*.whl audio/dist/*.whl intel-extension-for-pytorch/dist/*.whl intel-extension-for-pytorch/ecological_libs/deepspeed/dist/*.whl ./wheels

@@ -60,7 +60,7 @@ RUN bash /basekit_driver_install_helper.sh runtime && \
     sudo rm /basekit_driver_install_helper.sh
 COPY --from=dev --chown=ubuntu:ubuntu /home/ubuntu/wheels ./wheels
 COPY --from=dev --chown=ubuntu:ubuntu /home/ubuntu/intel-extension-for-pytorch/tools/get_libstdcpp_lib.sh .
-RUN . ./miniconda3/bin/activate && \
+RUN . ./miniforge3/bin/activate && \
     conda create -y -n py310 python=3.10 && conda activate py310 && \
     conda install -y libpng libjpeg-turbo -c conda-forge && \
     python -m pip install ./wheels/*.whl && \

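As a side note on the Miniconda-to-Miniforge swap above: the RUN layer follows a download / batch-install / hook-into-`.bashrc` pattern. Below is a minimal offline sketch of that pattern using a throwaway directory and a stub installer; the stub and paths are assumptions for illustration, not the real Miniforge installer.

```shell
# Replay the Dockerfile's install-and-hook pattern in a scratch dir.
# The stub stands in for Miniforge3-*.sh (-b batch mode, -p prefix).
set -e
home=$(mktemp -d)
printf '%s\n' '#!/bin/sh' \
    'mkdir -p "$3/bin" && echo ": activate hook" > "$3/bin/activate"' \
    > "$home/miniforge.sh"
bash "$home/miniforge.sh" -b -p "$home/miniforge3" && \
    rm "$home/miniforge.sh" && \
    echo "source ~/miniforge3/bin/activate" >> "$home/.bashrc"
cat "$home/.bashrc"   # the activation hook that later shells will source
```

The real Dockerfile does the same against `/home/ubuntu`, pinned to the 24.1.2-0 Miniforge release.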
docker/README.md (+1 -1)

@@ -34,7 +34,7 @@ export IMAGE_TYPE="xpu"
 To pull docker images use the following command:

 ```bash
-docker pull intel/intel-extension-for-pytorch:2.1.40-xpu-pip-base
+docker pull intel/intel-extension-for-pytorch:2.1.40-xpu
 ```

 ### Running container:

docs/tutorials/features/advanced_configuration.md (-4)

@@ -8,7 +8,6 @@ The default settings for Intel® Extension for PyTorch\* are sufficient for most
 The following build options are supported by Intel® Extension for PyTorch\*. Users who install Intel® Extension for PyTorch\* via source compilation could override the default configuration by explicitly setting a build option ON or OFF, and then build.

 | **Build Option** | **Default<br>Value** | **Description** |
-| ------ | ------ | ------ |

 For above build options which can be configured to ON or OFF, users can configure them to 1 or 0 also, while ON equals to 1 and OFF equals to 0.

@@ -17,13 +16,10 @@ For above build options which can be configured to ON or OFF, users can configur
 The following launch options are supported in Intel® Extension for PyTorch\*. Users who execute AI models on XPU could override the default configuration by explicitly setting the option value at runtime using environment variables, and then launch the execution.

 | **Launch Option<br>CPU, GPU** | **Default<br>Value** | **Description** |
-| ------ | ------ | ------ |

 | **Launch Option<br>GPU ONLY** | **Default<br>Value** | **Description** |
-| ------ | ------ | ------ |

 | **Launch Option<br>Experimental** | **Default<br>Value** | **Description** |
-| ------ | ------ | ------ |

 | **Distributed Option<br>GPU ONLY** | **Default<br>Value** | **Description** |
 | ------ | ------ | ------ |

docs/tutorials/known_issues.md (+4 -3)

@@ -39,9 +39,10 @@ Troubleshooting
 - **Problem**: Random bad termination after AI model convergence test (>24 hours) finishes.
 - **Cause**: This is a random issue when some AI model convergence test execution finishes. It is not user-friendly as the model execution ends ungracefully.
 - **Solution**: Kill the process after the convergence test finished, or use checkpoints to divide the convergence test into several phases and execute separately.
-- **Problem**: Random instability issues such as page fault or atomic access violation when executing LLM inference workloads on Intel® Data Center GPU Max series cards.
-- **Cause**: This issue is reported on LTS driver [803.29](https://dgpu-docs.intel.com/releases/LTS_803.29_20240131.html). The root cause is under investigation.
-- **Solution**: Use active rolling stable release driver [775.20](https://dgpu-docs.intel.com/releases/stable_775_20_20231219.html) or latest driver version to workaround.
+- **Problem**: Runtime error `munmap_chunk(): invalid pointer` when executing some scaling LLM workloads on the Intel® Data Center GPU Max Series platform.
+- **Cause**: Intel® MPI Library 2021.13.1 offloads the GPU support itself, so GPU support in the underlying libfabric must be disabled to avoid a potential bug in libfabric GPU initialization.
+- **Solution**: Set the environment variable `FI_HMEM=system` to work around this issue when it is encountered.

 ## Library Dependencies

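The workaround in the new known-issues entry is just an environment variable set before launch; a hedged sketch follows, where only `FI_HMEM=system` comes from the doc and the launch line is hypothetical.

```shell
# Disable libfabric's own GPU (HMEM) support so Intel MPI 2021.13.1
# handles GPU buffers itself, per the known-issues entry above.
export FI_HMEM=system
echo "FI_HMEM=${FI_HMEM}"
# mpirun -n 2 python llm_workload.py   # hypothetical scaling-LLM launch
```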
docs/tutorials/releases.md (+32)

@@ -1,6 +1,38 @@
 Releases
 =============

+## 2.1.40+xpu
+
+Intel® Extension for PyTorch\* v2.1.40+xpu is a minor release which supports Intel® GPU platforms (Intel® Data Center GPU Flex Series, Intel® Data Center GPU Max Series, Intel® Arc™ A-Series Graphics and Intel® Core™ Ultra Processors with Intel® Arc™ Graphics) based on PyTorch\* 2.1.0.
+
+### Highlights
+
+- Intel® oneAPI Base Toolkit 2024.2.1 compatibility
+- Intel® oneDNN v3.5 integration
+- Intel® oneCCL 2021.13.1 integration
+- Intel® Core™ Ultra Processors with Intel® Arc™ Graphics (MTL-H) support on Windows (Prototype)
+- Bug fixes and other optimizations
+  - Fix host memory leak [#4280](https://github.com/intel/intel-extension-for-pytorch/commit/5c252a1e34ccecc8e2e5d10ccc67f410ac7b87e2)
+  - Fix LayerNorm issue for undefined grad_input [#4317](https://github.com/intel/intel-extension-for-pytorch/commit/619cd9f5c300a876455411bcacc470bd94c923be)
+  - Replace FP64 device check method [#4354](https://github.com/intel/intel-extension-for-pytorch/commit/d60d45187b1dd891ec8aa2abc42eca8eda5cb242)
+  - Fix online doc search issue [#4358](https://github.com/intel/intel-extension-for-pytorch/commit/2e957315fdad776617e24a3222afa55f54b51507)
+  - Fix pdist unit test failure on client GPUs [#4361](https://github.com/intel/intel-extension-for-pytorch/commit/00f94497a94cf6d69ebba33ff95d8ab39113ecf4)
+  - Remove primitive cache from conv fwd [#4429](https://github.com/intel/intel-extension-for-pytorch/commit/bb1c6e92d4d11faac5b6fc01b226d27950b86579)
+  - Fix sdp bwd page fault with no grad bias [#4439](https://github.com/intel/intel-extension-for-pytorch/commit/d015f00011ad426af33bb970451331321417bcdb)
+  - Fix implicit data conversion [#4463](https://github.com/intel/intel-extension-for-pytorch/commit/d6987649e58af0da4964175aed3286aef16c78c9)
+  - Fix compiler version parsing issue [#4468](https://github.com/intel/intel-extension-for-pytorch/commit/50b2b5933b6df6632a18d76bdec46b638750dc48)
+  - Fix irfft invalid descriptor [#4480](https://github.com/intel/intel-extension-for-pytorch/commit/3e60e87cf011b643cc0e72d82c10b28417061d97)
+  - Change condition order to fix out-of-bound access in index [#4495](https://github.com/intel/intel-extension-for-pytorch/commit/8b74d6c5371ed0bd442279be42b0d454cb2b31b3)
+  - Add parameter check in embedding bag [#4504](https://github.com/intel/intel-extension-for-pytorch/commit/57174797bab9de2647abb8fdbcda638b0c694e01)
+  - Add the backward implementation for rms norm [#4527](https://github.com/intel/intel-extension-for-pytorch/commit/e4938e0a9cee15ffe2f8d205e0228c1842a5735c)
+  - Fix attn_mask for sdpa beam_search [#4557](https://github.com/intel/intel-extension-for-pytorch/commit/80ed47655b003fa132ac264b3d3008c298865473)
+  - Use data_ptr template instead of force data conversion [#4558](https://github.com/intel/intel-extension-for-pytorch/commit/eeb92d2f4c34f143fc76e409987543d42e68d065)
+  - Work around the Windows AOT image size over 2GB issue on Intel® Core™ Ultra Processors with Intel® Arc™ Graphics [#4407](https://github.com/intel/intel-extension-for-pytorch/commit/d7ebba7c94374bdd12883ffd45d6670b96029d11) [#4450](https://github.com/intel/intel-extension-for-pytorch/commit/550fd767b723bd9a1a799b05be5d8ce073e6faf7)
+
+### Known Issues
+
+Please refer to the [Known Issues webpage](./known_issues.md).
+
 ## 2.1.30+xpu

 Intel® Extension for PyTorch\* v2.1.30+xpu is an update release which supports Intel® GPU platforms (Intel® Data Center GPU Flex Series, Intel® Data Center GPU Max Series and Intel® Arc™ A-Series Graphics) based on PyTorch\* 2.1.0.

docs/tutorials/technical_details.rst (+1 -1)

@@ -17,7 +17,7 @@ Optimizers are a key part of the training workloads. Intel® Extension for PyTor
    technical_details/optimizer_fusion_gpu


-For more detailed information, check `Optimizer Fusion on CPU <technical_details/optimizer_fusion_cpu.md>`_, `Optimizer Fusion on GPU <technical_details/optimizer_fusion_gpu.md>`_ and `Split SGD <technical_details/split_sgd.html>`_.
+For more detailed information, check `Optimizer Fusion on GPU <technical_details/optimizer_fusion_gpu.md>`_.

 Ahead of Time Compilation (AOT) [GPU]
 -------------------------------------

scripts/build_doc.sh (+1 -1)

@@ -221,7 +221,7 @@ elif [[ ${DEVICE} == "gpu" ]]; then
     parse_example "../examples/gpu/inference/cpp/example-usm/CMakeLists.txt" ${MDEXAMPLE} "(marker_cppsdk_cmake_usm)" "cmake"

     cp ${MDCONF} tutorials/features/advanced_configuration.md.bk
-    sed -i "/^| [[:alnum:]_-]/d" ${MDCONF}
+    #sed -i "/^| [[:alnum:]_-]/d" ${MDCONF}
     parse_build_options "../cmake/gpu/Options.cmake" ${MDCONF}
     parse_launch_options "../csrc/gpu/utils/Settings.cpp" ${MDCONF} "==========ALL=========="
     parse_launch_options "../csrc/gpu/utils/Settings.cpp" ${MDCONF} "==========GPU=========="

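Why commenting out that `sed` fixes the "advanced config not correctly displayed" issue: the bracket expression `[[:alnum:]_-]` includes `-`, so the delete command also matched the `| ------ |` separator rows that a markdown table needs in order to render. A small sketch with a made-up table:

```shell
# The pattern deletes any row whose first cell starts with an alphanumeric,
# '_' or '-', which catches the '| ------ |' separator row too.
cat > /tmp/conf_demo.md <<'EOF'
| **Build Option** | **Default** | **Description** |
| ------ | ------ | ------ |
| USE_FOO | ON | made-up option |
EOF
sed "/^| [[:alnum:]_-]/d" /tmp/conf_demo.md
# Only the bold header row survives; separator and body rows are deleted.
```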
scripts/compile_bundle.sh (+3 -3)

@@ -338,8 +338,8 @@ if [ $((${MODE} & 0x02)) -ne 0 ]; then
     CMD="${CMD} import torchaudio; print(f'torchaudio_version: {torchaudio.__version__}');"
 fi
 CMD="${CMD} import intel_extension_for_pytorch as ipex; print(f'ipex_version: {ipex.__version__}');"
-if [ $((${MODE} & 0x01)) -ne 0 ]; then
-    CMD="${CMD} import oneccl_bindings_for_pytorch as torch_ccl; print(f'torchccl_version: {torch_ccl.__version__}');"
-fi
+#if [ $((${MODE} & 0x01)) -ne 0 ]; then
+#    CMD="${CMD} import oneccl_bindings_for_pytorch as torch_ccl; print(f'torchccl_version: {torch_ccl.__version__}');"
+#fi
 python -c "${CMD}"

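For context, the commented-out check uses the same bitmask idiom as the rest of the script: each bit of `MODE` gates one component (`0x02` gates torchaudio in the hunk above; `0x01` gated the oneCCL import that no longer gets printed). A sketch of that arithmetic with a made-up value:

```shell
# $((MODE & 0xNN)) tests a single bit of the MODE bitmask.
MODE=3   # made-up value: bits 0 and 1 both set
if [ $((MODE & 0x02)) -ne 0 ]; then echo "audio: enabled"; fi
if [ $((MODE & 0x01)) -ne 0 ]; then echo "ccl: enabled"; fi
```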