Offline quantization example for object detection models (#1852)

lizexu123 · web-flow · commit cf494d983d48 · 2024-02-26T10:10:55.000+08:00
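
For a quick read of the Benchmark table that this change adds to the README, the sketch below computes the FP32-to-INT8 latency ratio and the mAP drop from the table's own numbers. The names `BENCHMARK` and `summarize` and the dictionary layout are illustrative choices, not part of PaddleSlim.

```python
# Latency (ms) and mAP values copied from the Benchmark table in the README diff.
BENCHMARK = {
    "yolov3_r50vd_dcn_270e_coco": {
        "fp32_ms": 92.2, "int8_ms": 27.9, "base_map": 40.6, "ptq_map": 40.3,
    },
    "PicoDet-s (after analysis)": {
        "fp32_ms": 82.5, "int8_ms": 64.8, "base_map": 32.5, "ptq_map": 24.9,
    },
}


def summarize(row):
    """Return (INT8 speedup over FP32, mAP drop) for one table row."""
    speedup = row["fp32_ms"] / row["int8_ms"]
    map_drop = row["base_map"] - row["ptq_map"]
    return round(speedup, 2), round(map_drop, 1)


for name, row in BENCHMARK.items():
    speedup, map_drop = summarize(row)
    print(f"{name}: {speedup}x faster in INT8, mAP drop {map_drop}")
```

Read this way, the table shows a much better trade-off for yolov3_r50vd_dcn_270e_coco than for PicoDet-s, whose INT8 latency after analysis stays close to its FP16 latency.
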
diff --git a/example/post_training_quantization/detection/README.md b/example/post_training_quantization/detection/README.md
@@ -16,36 +16,37 @@
 
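 ## 1. Introduction
 This example uses the object detection models PP-YOLOE and PicoDet to show how to take Inference deployment models from PaddleDetection, compress them with offline (post-training) quantization, and use sensitivity analysis to improve the accuracy of the quantized model.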
-
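+Note: [Paddle-Inference-Demo/c++/gpu/yolov3](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/gpu/yolov3) can produce misaligned accuracy when it uses a quantization calibration table; as an alternative, the yolov3_r50vd_dcn_270e_coco model can be quantized offline.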
 
 ## 2.Benchmark
 
 | Model  |  Strategy  | Input size | mAP<sup>val</sup><br>0.5:0.95 | Latency<sup><small>FP32</small></sup><br>(ms) | Latency<sup><small>FP16</small></sup><br>(ms) | Latency<sup><small>INT8</small></sup><br>(ms) |  Config file | Inference model  |
 | :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
-| PP-YOLOE-s |  Base model | 640*640  |  43.1   |   11.2ms  |   7.7ms   |    -    |    -   | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_s_300e_coco.tar) |
-| PP-YOLOE-s |  Offline quantization | 640*640  |  42.6    |     -     |     -     |  6.7ms  |    -   |   [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_s_ptq.tar) |
+| yolov3_r50vd_dcn_270e_coco |  Base model | 608*608  |  40.6   |   92.2ms  |   41.3ms   |    -    |    -   | [Model](https://paddle-inference-dist.bj.bcebos.com/Paddle-Inference-Demo/yolov3_r50vd_dcn_270e_coco.tgz) |
+| yolov3_r50vd_dcn_270e_coco |  Offline quantization | 608*608  |  40.3    |     -     |     -     |  27.9ms  |    -   |  -  |
 |  |  |  |  |  |  |  |  |  |
-| PicoDet-s |  Base model | 416*416  |  32.5   |   -  |   -   |  -  |  - | [Model](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet.tar) |
-| PicoDet-s |  Offline quantization (before analysis) | 416*416  |  0.0   |   - |   -   |  -  |  -  | - |
-| PicoDet-s |  Offline quantization (after analysis) | 416*416  |  24.9   |   - |   -   |  -  |  -  | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_ptq.tar) |
+| PicoDet-s |  Base model | 416*416  |  32.5   |   82.5ms  |   59.7ms   |  -  |  - | [Model](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet.tar) |
+| PicoDet-s |  Offline quantization (before analysis) | 416*416  |  0.0   |   - |   -   |  39.1ms  |  -  | - |
+| PicoDet-s |  Offline quantization (after analysis) | 416*416  |  24.9   |   - |   -   |  64.8ms  |  -  | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_ptq.tar) |
 
+- A low mAP leads to more detected boxes, which makes NMS take longer.
 - All mAP values are measured on the COCO val2017 dataset with IoU=0.5:0.95.
-
+- Benchmark environment: Tesla T4, TensorRT 8.6.1, CUDA 11.2, batch_size=1, cuDNN 8.2.0, Intel(R) Xeon(R) Gold 6271C CPU. Benchmark script: [paddle_inference_eval.py](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/paddle_inference_eval.py)
 
 ## 3. Offline quantization workflow
 
 #### 3.1 Prepare the environment
-- PaddlePaddle >= 2.3 (available from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
-- PaddleSlim >= 2.3
+- PaddlePaddle 2.6 (available from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
+- PaddleSlim 2.6
 - PaddleDet >= 2.4
 - opencv-python
 
 Install paddlepaddle:
 ```shell
 # CPU
-pip install paddlepaddle
-# GPU
-pip install paddlepaddle-gpu
+python -m pip install paddlepaddle==2.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
+# GPU (CUDA 11.2 as an example)
+python -m pip install paddlepaddle-gpu==2.6.0.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
 ```
 
 Install paddleslim:
@@ -103,11 +104,11 @@ python tools/export_model.py -c configs/picodet/picodet_s_416_coco_lcnet.yml \
 
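 The offline quantization example is launched with the post_quant.py script, which quantizes the model through the ```paddleslim.quant.quant_post_static``` interface. Set the model path, data path, and quantization parameters in the config file, then run offline quantization with the following commands: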
 
-- PPYOLOE-s:
+- yolov3_r50vd_dcn_270e_coco:
 
 ```
 export CUDA_VISIBLE_DEVICES=0
-python post_quant.py --config_path=./configs/ppyoloe_s_ptq.yaml --save_dir=./ppyoloe_s_ptq
+python post_quant.py --config_path=./configs/yolov3_r50vd_dcn.yaml --save_dir=./yolov3_r50vd_dcn_270e_coco_ptq
 ```
 
 - PicoDet-s:
@@ -121,16 +122,21 @@ python post_quant.py --config_path=./configs/picodet_s_ptq.yaml --save_dir=./pic
 #### 3.5 Evaluate model accuracy
 
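 Use the eval.py script to get the model's mAP.
+Note that the accuracy of ppyoloe_s cannot be measured this way because the model has no NMS; PicoDet-s and yolov3_r50vd_dcn_270e_coco can be evaluated as follows: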
+```
+export CUDA_VISIBLE_DEVICES=0
+python eval.py --config_path=./configs/picodet_s_ptq.yaml
+```
 ```
 export CUDA_VISIBLE_DEVICES=0
-python eval.py --config_path=./configs/ppyoloe_s_ptq.yaml
+python eval.py --config_path=./configs/yolov3_r50vd_dcn.yaml
 ```
 
 **Note**:
 - The path of the model to evaluate can be changed in the `model_dir` field of the config file.
 
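 #### 3.6 Improve offline quantization accuracy
-This section describes how to use the quantization analysis tool to improve offline quantization accuracy. Offline quantization needs only a small amount of data, is simple to use, and produces a quantized model quickly, but it often causes a sizable accuracy loss. PaddleSlim provides a quantization analysis tool based on the ```paddleslim.quant.AnalysisPTQ``` interface that visualizes the layers unsuited to quantization; skipping those layers improves the accuracy of the offline-quantized model. For details on ```paddleslim.quant.AnalysisPTQ```, see [AnalysisPTQ.md](../../../docs/zh_cn/tutorials/quant/AnalysisPTQ.md).
+This section describes how to use the quantization analysis tool to improve offline quantization accuracy. Offline quantization needs only a small amount of data, is simple to use, and produces a quantized model quickly, but it often causes a sizable accuracy loss. PaddleSlim provides a quantization analysis tool based on the ```paddleslim.quant.AnalysisPTQ``` interface that visualizes the layers unsuited to quantization; skipping those layers improves the accuracy of the offline-quantized model. For details on ```paddleslim.quant.AnalysisPTQ```, see [AnalysisPTQ.md](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/quant/post_training_quantization.md).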
 
 
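 Across multiple experiments, including several activation algorithms (avg, KL, etc.) and weight quantization methods (abs_max, channel_wise_abs_max), the accuracy of PicoDet-s after offline quantization was always 0. Taking PicoDet-s as an example, the quantization analysis tool is used as follows: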
@@ -171,6 +177,141 @@ python post_quant.py --config_path=./configs/picodet_s_analyzed_ptq.yaml --save_
 ## 4.Inference deployment
 For inference deployment, refer to the [Detection model auto-compression example](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/detection)
 
+
+Quantized models can run inference with TensorRT on GPU and with MKLDNN on CPU.
+
+The following fields configure the inference parameters:
+
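+| Parameter | Description |
+|:------:|:------:|
+| model_path | Directory containing the inference model; it must contain both model.pdmodel and model.pdiparams |
+| reader_config | Path to the reader config file used by the model during eval |
+| image_file | Path to an image, used when testing on a single image only |
+| device | Device used for inference, either CPU or GPU   |
+| use_trt | Whether to use the TensorRT inference engine   |
+| use_mkldnn | Whether to enable the ```MKL-DNN``` acceleration library. Note: when ```use_mkldnn``` is ```True``` and ```device``` is ```GPU```, ```use_mkldnn``` is ignored and inference runs on the ```GPU```  |
+| cpu_threads | Number of CPU threads used for CPU inference; default 10  |
+| precision | Inference precision, one of `fp32/fp16/int8`  |
+| include_nms | Whether the model contains NMS: set to True if it does, False otherwise  |
+| use_dynamic_shape | Whether to use dynamic shape: set to True to enable it, False otherwise  |
+| img_shape | Input image size. The default is 640, meaning images are resized to 640*640  |
+| trt_calib_mode | Set to True if the model was produced by TensorRT offline quantization calibration. |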
+
+- Paddle-TensorRT inference examples:
+
+yolov3_r50vd_dcn_270e_coco model
+```shell
+python paddle_inference_eval.py \
+       --model_path=yolov3_r50vd_dcn_270e_coco \
+       --reader_config=configs/yolov3_r50vd_dcn.yaml \
+       --use_trt=True \
+       --precision=fp32 \
+       --include_nms=True \
+       --benchmark=True
+```
+```shell
+python paddle_inference_eval.py \
+       --model_path=yolov3_r50vd_dcn_270e_coco_ptq \
+       --reader_config=configs/yolov3_r50vd_dcn.yaml \
+       --use_trt=True \
+       --precision=int8 \
+       --include_nms=True \
+       --benchmark=True
+```
+picodet_s model
+```shell
+python paddle_inference_eval.py \
+    --model_path=picodet_s_416_coco_lcnet \
+    --reader_config=configs/picodet_reader.yml \
+    --use_trt=True \
+    --precision=fp16 \
+    --include_nms=True \
+    --benchmark=True
+```
+Before quantization analysis
+```shell
+python paddle_inference_eval.py \
+    --model_path=picodet_s_ptq \
+    --reader_config=configs/picodet_reader.yml \
+    --use_trt=True \
+    --precision=int8 \
+    --include_nms=True \
+    --benchmark=True
+```
+After quantization analysis
+```shell
+python paddle_inference_eval.py \
+    --model_path=picodet_s_analyzed_ptq_out \
+    --reader_config=configs/picodet_reader.yml \
+    --use_trt=True \
+    --precision=int8 \
+    --include_nms=True \
+    --benchmark=True
+```
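+#### 4.1 C++ deployment
+See [YOLOv3 inference](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/gpu/yolov3)
+
+Build the sample
+- Rename yolov3_test.cc to picodet-s.cc; it is the sample inference program (its input is a fixed value, so if you need to read data with OpenCV or another method, you will have to modify the program).
+- The compile.sh script holds the configuration for third-party libraries and prebuilt libraries.
+- The run.sh script runs everything in one step.
+Before building, edit the dependency settings in compile.sh to match your environment: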
+
+```shell
+# Name of the demo to build
+DEMO_NAME=picodet-s
+
+# Enable the following three flags according to version.txt in the prebuilt library
+WITH_MKL=ON
+WITH_GPU=ON
+USE_TENSORRT=ON
+
+# Root directory of the inference library
+LIB_DIR=${work_path}/../lib/paddle_inference
+
+# If WITH_GPU or USE_TENSORRT above is ON, set the CUDA, cuDNN, and TensorRT paths.
+CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
+CUDA_LIB=/usr/local/cuda/lib64
+TENSORRT_ROOT=/usr/local/TensorRT-7.1.3.4
+```
+Run bash compile.sh to build the sample
+
+- Run the sample
+Run the sample on native GPU
+```shell
+./build/picodet-s --model_file picodet_s_416_coco_lcnet/model.pdmodel --params_file picodet_s_416_coco_lcnet/model.pdiparams
+```
+Run the sample with TRT FP32
+```shell
+./build/picodet-s --model_file picodet_s_416_coco_lcnet/model.pdmodel --params_file picodet_s_416_coco_lcnet/model.pdiparams --run_mode=trt_fp32
+```
+Run the sample with TRT FP16
+```shell
+./build/picodet-s --model_file picodet_s_416_coco_lcnet/model.pdmodel --params_file picodet_s_416_coco_lcnet/model.pdiparams --run_mode=trt_fp16
+```
+Run the sample with TRT INT8
+When running the sample with TRT INT8, the same command must be executed twice: the first run generates the calibration table and the second run loads it for inference.
+Generate the quantization calibration table
+```shell
+./build/picodet-s --model_file picodet_s_416_coco_lcnet/model.pdmodel --params_file picodet_s_416_coco_lcnet/model.pdiparams --run_mode=trt_int8
+```
+Log from generating the calibration table:
+```shell
+I0623 08:40:49.386909 107053 tensorrt_engine_op.h:159] This process is generating calibration table for Paddle TRT int8...
+I0623 08:40:49.387279 107057 tensorrt_engine_op.h:352] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
+I0623 08:41:13.784473 107053 analysis_predictor.cc:791] Wait for calib threads done.
+I0623 08:41:14.419198 107053 analysis_predictor.cc:793] Generating TRT Calibration table data, this may cost a lot of time...
+```
+Run the sample with TRT dynamic shape (TRT FP32 as an example)
+```shell
+./build/picodet-s --model_file picodet_s_416_coco_lcnet/model.pdmodel --params_file picodet_s_416_coco_lcnet/model.pdiparams --run_mode=trt_fp32 --use_dynamic_shape=1
+```
+| Model | trt-fp32 | trt-fp16 | trt-int8 | paddle_gpu fp32 | trt_fp32(dynamic_shape) |
+|:------:|:------:|:------:|:------:| :------:| :------:|
+| PicoDet-s | 3.05ms | 2.66ms | 2.40ms | 7.51ms | 2.82ms |
+
+- Benchmark environment: Tesla T4, TensorRT 8.6.1, CUDA 11.6, batch_size=1, cuDNN 8.4.0, Intel(R) Xeon(R) Gold 6271C CPU
+
 ## 5.FAQ
 
 - To auto-compress a model, try the [Detection model auto-compression example](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/detection).
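
Section 3.6 of the README mentions the weight quantization methods abs_max and channel_wise_abs_max. As a rough illustration of the difference, the sketch below quantizes a tiny weight matrix to INT8 with one scale for the whole tensor and with one scale per output channel. It is a plain-Python approximation of the idea, not PaddleSlim's implementation; the helper names and the sample weights are made up for this example, and every scale is assumed to be non-zero.

```python
def quantize_dequantize(values, scale, bits=8):
    """Symmetric quantization: map to signed integers, then back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale * qmax)))
        out.append(q * scale / qmax)
    return out


def abs_max(weights):
    """One scale for the whole tensor: the largest absolute weight."""
    scale = max(abs(v) for row in weights for v in row)
    return [quantize_dequantize(row, scale) for row in weights]


def channel_wise_abs_max(weights):
    """One scale per output channel (here: one per row)."""
    return [quantize_dequantize(row, max(abs(v) for v in row)) for row in weights]


def max_error(original, restored):
    """Largest absolute difference between two equally shaped matrices."""
    return max(abs(a - b) for r, s in zip(original, restored) for a, b in zip(r, s))


# Hypothetical weights: channel 0 is large, channel 1 is small.
weights = [[10.0, -7.5, 2.5], [0.02, -0.05, 0.01]]
row1_tensor = max_error([weights[1]], [abs_max(weights)[1]])
row1_channel = max_error([weights[1]], [channel_wise_abs_max(weights)[1]])
print(f"small-channel error, abs_max: {row1_tensor:.5f}")
print(f"small-channel error, channel_wise_abs_max: {row1_channel:.5f}")
```

In this toy case the shared scale rounds most of the small channel to zero, which is the kind of loss per-channel scales are meant to reduce. The README reports that for PicoDet-s neither method alone recovered accuracy, which is why the analysis step is needed.
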
diff --git a/example/post_training_quantization/detection/configs/picodet_s_analysis.yaml b/example/post_training_quantization/detection/configs/picodet_s_analysis.yaml
@@ -1,5 +1,5 @@
 input_list: ['image', 'scale_factor']
-model_dir: ./picodet_s_416_coco_lcnet/
+model_dir: ./picodet_s_416_coco_lcnet
 model_filename: model.pdmodel
 params_filename: model.pdiparams
 save_dir: ./analysis_results
@@ -26,11 +26,11 @@ EvalDataset:
 
 # Small Dataset to accelerate analysis
 # If not exist, delete the dict of FastEvalDataset
-FastEvalDataset:
-  !COCODataSet
-    image_dir: val2017
-    anno_path: annotations/small_instances_val2017.json
-    dataset_dir: /dataset/coco/
+# FastEvalDataset:
+#   !COCODataSet
+#     image_dir: val2017
+#     anno_path: annotations/small_instances_val2017.json
+#     dataset_dir: /dataset/coco/
 
 
 eval_height: &eval_height 416
diff --git a/example/post_training_quantization/detection/configs/ppyoloe_s_ptq.yaml b/example/post_training_quantization/detection/configs/ppyoloe_s_ptq.yaml
@@ -1,4 +1,4 @@
-input_list: ['image']
+input_list: ['image','scale_factor']
 arch: PPYOLOE    # When export exclude_nms=True, need set arch: PPYOLOE
 model_dir: ./ppyoloe_crn_s_300e_coco
 model_filename: model.pdmodel
@@ -29,4 +29,4 @@ EvalReader:
     - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
     - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
     - Permute: {}
-  batch_size: 32
+  batch_size: 16
diff --git a/example/post_training_quantization/detection/configs/yolov3_r50vd_dcn.yaml b/example/post_training_quantization/detection/configs/yolov3_r50vd_dcn.yaml
@@ -0,0 +1,36 @@
+input_list: ['image', 'scale_factor','im_shape']
+model_dir: ./yolov3_r50vd_dcn_270e_coco
+model_filename: model.pdmodel
+params_filename: model.pdiparams
+metric: COCO
+num_classes: 80
+
+# Dataset configuration
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017
+    anno_path: annotations/instances_train2017.json
+    dataset_dir: /work/GETR-Lite-paddle-new/inference/datasets/coco/
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017
+    anno_path: annotations/instances_val2017.json
+    dataset_dir: /work/GETR-Lite-paddle-new/inference/datasets/coco/
+
+eval_height: &eval_height 608
+eval_width: &eval_width 608
+eval_size: &eval_size [*eval_height, *eval_width]
+
+worker_num: 0
+
+# preprocess reader in test
+EvalReader:
+  inputs_def:
+    image_shape: [1, 3, *eval_height, *eval_width]
+  sample_transforms:
+  - Decode: {}
+  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_size: 4
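
The `NormalizeImage` step in the `EvalReader` above sets `is_scale: true` together with a per-channel mean and std. Assuming it behaves like the usual mean/std normalization (scale pixel values to [0, 1], subtract the mean, divide by the std), the sketch below applies it to a single RGB pixel in plain Python. The function name `normalize_pixel` is hypothetical, and the real transform operates on whole image arrays.

```python
MEAN = [0.485, 0.456, 0.406]  # from the EvalReader config
STD = [0.229, 0.224, 0.225]   # from the EvalReader config


def normalize_pixel(rgb, mean=MEAN, std=STD, is_scale=True):
    """Normalize one RGB pixel: optional /255 scaling, then (x - mean) / std."""
    scaled = [c / 255.0 if is_scale else float(c) for c in rgb]
    return [(c - m) / s for c, m, s in zip(scaled, mean, std)]


print(normalize_pixel((255, 255, 255)))  # brightest pixel, all channels positive
print(normalize_pixel((0, 0, 0)))        # darkest pixel, all channels negative
```

Calibration data for offline quantization should go through the same preprocessing as evaluation, since the activation ranges collected during calibration depend on this input scaling.
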