Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleSlim into develop

lizexu123 · lizexu123 · commit e61ce8d771e6 · 2024-02-07T06:28:21.000Z
diff --git a/example/auto_compression/nlp/README.md b/example/auto_compression/nlp/README.md
@@ -56,16 +56,16 @@
 
 #### 3.1 准备环境
 - python >= 3.6
-- PaddlePaddle >= 2.4 （可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装）
-- PaddleSlim >= 2.4
-- PaddleNLP >= 2.3
+- PaddlePaddle == 2.5 (available from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
+- PaddleSlim == 2.5
+- PaddleNLP == 2.6
 
 安装paddlepaddle：
 ```shell
 # CPU
-pip install paddlepaddle==2.4.1
+pip install paddlepaddle==2.5.0
-# GPU 以Ubuntu、CUDA 11.2为例
-python -m pip install paddlepaddle-gpu==2.4.1.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
+# GPU, using Ubuntu with CUDA 11.6 as an example
+python -m pip install paddlepaddle-gpu==2.5.0.post116 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
 ```
 
 安装paddleslim：
@@ -95,7 +95,6 @@ pip install paddlenlp
 |:------:|:------:|:------:|:------:|:------:|:-----------:|:------:|:------:|
 | PP-MiniLM | [afqmc](https://bj.bcebos.com/v1/paddle-slim-models/act/afqmc.tar) | [tnews](https://bj.bcebos.com/v1/paddle-slim-models/act/tnews.tar) | [iflytek](https://bj.bcebos.com/v1/paddle-slim-models/act/iflytek.tar) | [cmnli](https://bj.bcebos.com/v1/paddle-slim-models/act/cmnli.tar) | [ ocnli](https://bj.bcebos.com/v1/paddle-slim-models/act/ocnli.tar) | [cluewsc2020](https://bj.bcebos.com/v1/paddle-slim-models/act/cluewsc.tar) | [csl](https://bj.bcebos.com/v1/paddle-slim-models/act/csl.tar) |
 | ERNIE 3.0-Medium | [afqmc](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/AFQMC.tar) | [tnews](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/TNEWS.tar) | [iflytek](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/IFLYTEK.tar) | [cmnli](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/CMNLI.tar) | [ocnli](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/OCNLI.tar) | [cluewsc2020](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/CLUEWSC2020.tar) | [csl](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/CSL.tar) |
-| UIE-base | [报销工单](https://bj.bcebos.com/v1/paddle-slim-models/act/uie_base.tar) |
 
 从上表获得模型超链接, 并用以下命令下载推理模型文件:
 
@@ -119,11 +118,6 @@ export CUDA_VISIBLE_DEVICES=0
 python run.py --config_path='./configs/pp-minilm/auto/afqmc.yaml' --save_dir='./save_afqmc_pruned/'
 ```
 
-自动压缩UIE系列模型需要使用 run_uie.py 脚本启动，会使用接口```paddleslim.auto_compression.AutoCompression```对模型进行自动压缩。配置config文件中训练部分的参数，将任务名称、模型类型、数据集名称、压缩参数传入，配置完成后便可对模型进行蒸馏量化训练。
-```shell
-export CUDA_VISIBLE_DEVICES=0
-python run_uie.py --config_path='./configs/uie/uie_base.yaml' --save_dir='./save_uie_qat/'
-```
 
 如仅需验证模型精度，或验证压缩之后模型精度，在启动```run.py```脚本时，将配置文件中模型文件夹 ```model_dir``` 改为压缩之后保存的文件夹路径 ```./save_afqmc_pruned``` ，命令加上```--eval True```即可：
 ```shell
@@ -217,8 +211,6 @@ QuantPost:
 
 - TensorRT预测：
 
-环境配置：如果使用 TesorRT 预测引擎，需安装 ```WITH_TRT=ON``` 的Paddle，下载地址：[Python预测库](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)
-
 首先下载量化好的模型：
 ```shell
 wget https://bj.bcebos.com/v1/paddle-slim-models/act/save_ppminilm_afqmc_new_calib.tar
@@ -227,10 +219,30 @@ tar -xf save_ppminilm_afqmc_new_calib.tar
 
 ```shell
 python paddle_inference_eval.py \
-      --model_path=save_ernie3_afqmc_new_cablib \
+      --model_path=save_ppminilm_afqmc_new_calib \
+      --model_filename=inference.pdmodel \
+      --params_filename=inference.pdiparams \
+      --task_name='afqmc' \
+      --use_trt \
+      --precision=int8
+```
+
+- ERNIE 3.0-Medium:
+```shell
+python paddle_inference_eval.py \
+      --model_path=TNEWS \
       --model_filename=infer.pdmodel \
       --params_filename=infer.pdiparams \
-      --task_name='afqmc' \
+      --task_name='tnews' \
+      --use_trt \
+      --precision=fp32
+```
+```shell
+python paddle_inference_eval.py \
+      --model_path=save_tnews_pruned \
+      --model_filename=infer.pdmodel \
+      --params_filename=infer.pdiparams \
+      --task_name='tnews' \
       --use_trt \
       --precision=int8
 ```
@@ -239,9 +251,9 @@ python paddle_inference_eval.py \
 
 ```shell
 python paddle_inference_eval.py \
-      --model_path=save_ernie3_afqmc_new_cablib \
-      --model_filename=infer.pdmodel \
-      --params_filename=infer.pdiparams \
+      --model_path=save_ppminilm_afqmc_new_calib \
+      --model_filename=inference.pdmodel \
+      --params_filename=inference.pdiparams \
       --task_name='afqmc' \
       --device=cpu \
       --use_mkldnn=True \
diff --git a/example/auto_compression/nlp/configs/ernie3.0/tnews.yaml b/example/auto_compression/nlp/configs/ernie3.0/tnews.yaml
@@ -6,12 +6,17 @@ Global:
   dataset: clue
   batch_size: 16
   max_seq_length: 128
-TrainConfig:
-  epochs: 6
-  eval_iter: 1110
-  learning_rate: 2.0e-5
-  optimizer_builder:
-    optimizer: 
-      type: AdamW
-    weight_decay: 0.01
-  origin_metric: 0.5700
+
+# Pruning
+Prune:
+  prune_algo: transformer_pruner
+  pruned_ratio: 0.25
+
+# Post-training quantization
+QuantPost:
+  activation_bits: 8
+  quantize_op_types:
+  - depthwise_conv2d
+  - conv2d
+  weight_bits: 8
+  
diff --git a/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml b/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml
@@ -6,17 +6,11 @@ Global:
   dataset: clue
   batch_size: 16
   max_seq_length: 128
-TransformerPrune:
-  pruned_ratio: 0.25
-HyperParameterOptimization:
-Distillation:
+
+# Post-training quantization
 QuantPost:
-TrainConfig:
-  epochs: 6
-  eval_iter: 1070
-  learning_rate: 2.0e-5
-  optimizer_builder:
-    optimizer: 
-      type: AdamW
-    weight_decay: 0.01
-  origin_metric: 0.7403
+  activation_bits: 8
+  quantize_op_types: 
+  - conv2d
+  - depthwise_conv2d
+  weight_bits: 8
diff --git a/example/auto_compression/nlp/paddle_inference_eval.py b/example/auto_compression/nlp/paddle_inference_eval.py
@@ -91,7 +91,8 @@ def parse_args():
         "--max_seq_length",
         default=128,
         type=int,
-        help="The maximum total input sequence length after tokenization. Sequences longer "
+        help=
+        "The maximum total input sequence length after tokenization. Sequences longer "
         "than this will be truncated, sequences shorter will be padded.", )
     parser.add_argument(
         "--perf_warmup_steps",
@@ -107,7 +108,8 @@ def parse_args():
         type=str,
         default="fp32",
         choices=["fp32", "fp16", "int8"],
-        help="The precision of inference. It can be 'fp32', 'fp16' or 'int8'. Default is 'fp16'.",
+        help=
+        "The precision of inference. It can be 'fp32', 'fp16' or 'int8'. Default is 'fp32'.",
     )
     parser.add_argument(
         "--use_mkldnn",
@@ -156,8 +158,7 @@ def _convert_example(example,
             }
         elif "target" in example:  # wsc
             text, query, pronoun, query_idx, pronoun_idx = (
-                example["text"],
-                example["target"]["span1_text"],
+                example["text"], example["target"]["span1_text"],
                 example["target"]["span2_text"],
                 example["target"]["span1_index"],
                 example["target"]["span2_index"], )
@@ -209,6 +210,12 @@ def create_predictor(cls, args):
         config = paddle.inference.Config(
             os.path.join(args.model_path, args.model_filename),
             os.path.join(args.model_path, args.params_filename))
+        config.switch_ir_debug(True)
+        # Applies to the ERNIE 3.0-Medium model
+        # config.exp_disable_tensorrt_ops(["elementwise_add"])
+        # config.exp_disable_tensorrt_ops(["fused_embedding_eltwise_layernorm"])
+        # config.exp_disable_tensorrt_ops(["tmp_3"])
+
         if args.device == "gpu":
             # set GPU configs accordingly
             config.enable_use_gpu(100, 0)
@@ -239,8 +246,8 @@ def create_predictor(cls, args):
             dynamic_shape_file = os.path.join(args.model_path,
                                               "dynamic_shape.txt")
             if os.path.exists(dynamic_shape_file):
-                config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
-                                                           True)
+                config.enable_tuned_tensorrt_dynamic_shape(
+                    dynamic_shape_file, True)
                 print("trt set dynamic shape done!")
             else:
                 config.collect_shape_range_info(dynamic_shape_file)
@@ -365,4 +372,4 @@ def main():
 
 if __name__ == "__main__":
     paddle.set_device("cpu")
-    main()
+    main()
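
Review note on the `--precision` option in `parse_args()`: a default written by hand into a help string can drift from the `default=` argument. A minimal sketch, assuming only the standard library, of letting `argparse` interpolate the value instead. The `build_parser` helper is hypothetical and not part of this commit.

```python
import argparse


def build_parser():
    """Hypothetical helper: builds only the --precision option for illustration."""
    parser = argparse.ArgumentParser(description="precision flag sketch")
    parser.add_argument(
        "--precision",
        type=str,
        default="fp32",
        choices=["fp32", "fp16", "int8"],
        # %(choices)s and %(default)s are filled in by argparse when the help
        # is rendered, so the text always matches the arguments above.
        help="The precision of inference. One of: %(choices)s. "
        "Default is %(default)s.", )
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

With this form, changing `default=` is enough; the help text follows without a second edit.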
diff --git a/example/post_training_quantization/pytorch_yolo_series/README.md b/example/post_training_quantization/pytorch_yolo_series/README.md
@@ -40,23 +40,25 @@
 ## 3. 离线量化流程
 
 #### 3.1 准备环境
-- PaddlePaddle >= 2.3 （可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装）
-- PaddleSlim > 2.3版本
+- PaddlePaddle == 2.5 (available from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
+- PaddleSlim == 2.5
 - X2Paddle >= 1.3.9
 - opencv-python
 
 
 （1）安装paddlepaddle：
 ```shell
 # CPU
-pip install paddlepaddle
+python -m pip install paddlepaddle==2.5.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
 # GPU
-pip install paddlepaddle-gpu
+python -m pip install paddlepaddle-gpu==2.5.0.post116 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
 ```
 
 （2）安装paddleslim：
+Note: in PaddleSlim's setup.py, `slim_version` must be changed to `'2.5'` before installing.
 ```shell
-pip install paddleslim
+git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleSlim.git && cd PaddleSlim
+python setup.py install
 ```
 
 #### 3.2 准备数据集
@@ -122,7 +124,7 @@ python eval.py --config_path=./configs/yolov5s_ptq.yaml
 #### 3.6 提高离线量化精度
 
 ###### 3.6.1 量化分析工具
-本节介绍如何使用量化分析工具提升离线量化精度。离线量化功能仅需使用少量数据，且使用简单、能快速得到量化模型，但往往会造成较大的精度损失。PaddleSlim提供量化分析工具，会使用接口```paddleslim.quant.AnalysisPTQ```，可视化展示出不适合量化的层，通过跳过这些层，提高离线量化模型精度。```paddleslim.quant.AnalysisPTQ```详解见[AnalysisPTQ.md](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/quant/post_training_quantization.md)。
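+This section describes how to use the quantization analysis tool to improve post-training quantization accuracy. Post-training quantization needs only a small amount of data, is simple to use, and produces a quantized model quickly, but it often causes a sizable accuracy loss. PaddleSlim provides a quantization analysis tool, built on the ```paddleslim.quant.AnalysisPTQ``` interface, that visualizes which layers are poorly suited to quantization; skipping those layers improves the accuracy of the quantized model. For details on ```paddleslim.quant.AnalysisPTQ```, see [Post-Training Quantization](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/quant/post_training_quantization.md).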
 
 
 由于YOLOv6离线量化效果较差，以YOLOv6为例，量化分析工具具体使用方法如下：
@@ -208,23 +210,24 @@ python fine_tune.py --config_path=./configs/yolov6s_fine_tune.yaml --simulate_ac
 ## 4.预测部署
 预测部署可参考[YOLO系列模型自动压缩示例](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/pytorch_yolo_series)
 量化模型在GPU上可以使用TensorRT进行加速，在CPU上可以使用MKLDNN进行加速。
-| 参数名 |  含义  |
-| model_path | inference模型文件所在路径，该目录下需要有文件model.pdmodel和params.pdiparams两个文件 |
+| Parameter | Description |
+|:------:|:------:|
+| model_path | Directory containing the inference model files; it must contain both model.pdmodel and model.pdiparams |
 | dataset_dir | 指定COCO数据集的目录，这是存储数据集的根目录 |
 | image_file | 如果只测试单张图片效果，直接根据image_file指定图片路径 |
 | val_image_dir | COCO数据集中验证图像的目录名，默认为val2017 |
 | val_anno_path | 指定COCO数据集的注释(annotation)文件路径，这是包含验证集标注信息的JSON文件，默认为annotations/instances_val2017.json |
 | benchmark | 指定是否运行性能基准测试。如果设置为True，程序将会进行性能测试 |
-| device | 使用GPU或者CPU预测，可选CPU/GPU/XPU，默认设置为GPU |
-| use_trt | 是否使用TensorRT进行预测|
-| use_mkldnn | 是否使用MKL-DNN加速库，注意use_mkldnn与use_gpu同时为True时,将忽略enable_mkldnn,而使用GPU预测|
-| use_dynamic_shape | 是否使用动态形状(dynamic_shape)功能 |
-| precision | fp32/fp16/int8|
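+| device | Device used for inference: CPU, GPU, or XPU; defaults to GPU |
+| use_trt | Whether to use the TensorRT inference engine |
+| use_mkldnn | Whether to enable the ```MKL-DNN``` acceleration library. Note: when ```use_mkldnn``` and ```use_gpu``` are both ```True```, ```enable_mkldnn``` is ignored and ```GPU``` inference is used |
+| cpu_threads | Number of CPU threads used for CPU inference; default 10 |
+| precision | Inference precision: `fp32`, `fp16`, or `int8` |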
 | arch | 指定所使用的模型架构的名称，例如YOLOv5 |
 | img_shape | 指定模型输入的图像尺寸 |
+| use_dynamic_shape | Whether to use dynamic shape: set to True to enable it, False otherwise |
 | batch_size | 指定模型输入的批处理大小 |
-| use_mkldnn | 指定是否使用MKLDNN加速(主要针对CPU)|
-| cpu_threads | 指定在CPU上使用的线程数 |
+
 
 首先，我们拥有的yolov6.onnx，我们需要把ONNX模型转成paddle模型，具体参考使用[X2Paddle迁移推理模型](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/model_convert/convert_with_x2paddle_cn.html#x2paddle)
 - 安装X2Paddle
@@ -242,7 +245,7 @@ python setup.py install
 ```shell
 x2paddle --framework=onnx --model=yolov6s.onnx --save_dir=yolov6_model
 ```
-- TensorRT Python部署
+#### 4.1 TensorRT Python Deployment
 使用[paddle_inference_eval.py](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/pytorch_yolo_series/paddle_inference_eval.py)部署
 ```shell
 python paddle_inference_eval.py --model_path=yolov6_model/inference_model --dataset_dir=datasets/coco --use_trt=True --precision=fp32 --arch=YOLOv6
@@ -251,7 +254,11 @@ python paddle_inference_eval.py --model_path=yolov6_model/inference_model --data
 ```shell
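-python paddle_inference_eval.py --model_path=yolov6s_ptq_out --dataset_dir==datasets/coco --use_trt=True --precision=int8 --arch=YOLOv6
+python paddle_inference_eval.py --model_path=yolov6s_ptq_out --dataset_dir=datasets/coco --use_trt=True --precision=int8 --arch=YOLOv6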
 ```
-- C++部署
+#### 4.2 MKLDNN Python Deployment
+```shell
+python paddle_inference_eval.py --model_path=yolov6_model/inference_model --dataset_dir=datasets/coco --device=CPU --use_mkldnn=True --precision=fp32 --arch=YOLOv6
+```
+#### 4.3 C++ Deployment
 具体可参考[运行PP-YOLOE-l目标检测模型样例](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/gpu/ppyoloe_crn_l)
 将compile.sh中DEMO_NAME修改为yolov6_test，并且将ppyoloe_crn_l.cc修改为yolov6_test.cc,根据环境修改相关配置库
 运行bash compile.sh编译样例。
@@ -272,5 +279,6 @@ python paddle_inference_eval.py --model_path=yolov6s_ptq_out --dataset_dir==data
 ```shell
 ./build/yolov6_test --model_file yolov6s_infer/model.pdmodel --params_file yolov6s_infer/model.pdiparams --run_mode=trt_int8
 ```
+
 ## 5.FAQ
 - 如果想对模型进行自动压缩，可进入[YOLO系列模型自动压缩示例](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/pytorch_yolo_series)中进行实验。