Post-training quantization example for object detection models #1845

wants to merge 7 commits into
base: develop
30 changes: 30 additions & 0 deletions example/auto_compression/detection/configs/yolov3_r50vd_dcn.yml
@@ -0,0 +1,30 @@
metric: COCO
num_classes: 80

# Dataset configuration
TrainDataset:
  !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: /work/GETR-Lite-paddle-new/inference/datasets/coco/
EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: /work/GETR-Lite-paddle-new/inference/datasets/coco/

eval_height: &eval_height 608
eval_width: &eval_width 608
eval_size: &eval_size [*eval_height, *eval_width]

worker_num: 0

EvalReader:
  inputs_def:
    image_shape: [1, 3, *eval_height, *eval_width]
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
    - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
    - Permute: {}
  batch_size: 4
174 changes: 162 additions & 12 deletions example/post_training_quantization/detection/README.md
@@ -17,35 +17,37 @@
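## 1. Introduction
This example uses the object detection models PP-YOLOE and PicoDet to show how to take an Inference deployment model from PaddleDetection, compress it with post-training quantization, and use sensitivity analysis to improve the accuracy of the quantized model.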

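Note: with [Paddle-Inference-demo/c++/gpu/yolov3](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/python/gpu/yolov3), using a quantization calibration table can give accuracy that does not match the original model. In that case, apply post-training quantization to the yolov3_r50vd_dcn_270e_coco model instead.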

## 2.Benchmark

| Model | Strategy | Input size | mAP<sup>val<br>0.5:0.95 | Latency<sup><small>FP32</small><sup><br><sup>(ms) |Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) | Config | Inference model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| PP-YOLOE-s | Base model | 640*640 | 43.1 | 11.2ms | 7.7ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_s_300e_coco.tar) |
| PP-YOLOE-s | Post-training quantization | 640*640 | 42.6 | - | - | 6.7ms | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_s_ptq.tar) |
| yolov3_r50vd_dcn_270e_coco | Base model | 608*608 | 40.6 | 92.2ms | 41.3ms | - | - | [Model](https://paddle-inference-dist.bj.bcebos.com/Paddle-Inference-Demo/yolov3_r50vd_dcn_270e_coco.tgz) |
| yolov3_r50vd_dcn_270e_coco | Post-training quantization | 608*608 | 40.3 | - | - | 27.9ms | - | |
| | | | | | | | | |
| PicoDet-s | Base model | 416*416 | 32.5 | - | - | - | - | [Model](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet.tar) |
| PicoDet-s | Post-training quantization (before analysis) | 416*416 | 0.0 | - | - | - | - | - |
| PicoDet-s | Post-training quantization (after analysis) | 416*416 | 24.9 | - | - | - | - | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_ptq.tar) |
| PicoDet-s | Base model | 416*416 | 32.5 | 82.5ms | 59.7ms | - | - | [Model](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet.tar) |
| PicoDet-s | Post-training quantization (before analysis) | 416*416 | 0.0 | - | - | 39.1ms | - | - |
| PicoDet-s | Post-training quantization (after analysis) | 416*416 | 24.9 | - | - | 64.8ms | - | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_ptq.tar) |

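Because the mAP is low, more candidate boxes are produced, so NMS takes longer.
- All mAP values are measured on the COCO val2017 dataset, with IoU=0.5:0.95.

Benchmark environment: Tesla T4, TensorRT 8.6.1, CUDA 11.2, batch_size=1, cuDNN 8.2.0, Intel(R) Xeon(R) Gold 6271C CPU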
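As a quick reading aid for the benchmark table, the following sketch computes the FP32-to-INT8 speedup and the absolute mAP drop for the two models that report both latencies. The numbers are copied from the table; the dictionary layout and the `summarize` helper are illustrative names, not part of the example scripts.

```python
# Latency (ms) and mAP values copied from the benchmark table.
rows = {
    "PP-YOLOE-s": {
        "fp32_ms": 11.2, "int8_ms": 6.7, "base_map": 43.1, "ptq_map": 42.6,
    },
    "yolov3_r50vd_dcn_270e_coco": {
        "fp32_ms": 92.2, "int8_ms": 27.9, "base_map": 40.6, "ptq_map": 40.3,
    },
}


def summarize(row):
    """Return (FP32-to-INT8 speedup, absolute mAP drop) for one model."""
    speedup = row["fp32_ms"] / row["int8_ms"]
    map_drop = row["base_map"] - row["ptq_map"]
    return round(speedup, 2), round(map_drop, 1)


for name, row in rows.items():
    speedup, map_drop = summarize(row)
    print(f"{name}: {speedup}x faster, mAP drop {map_drop}")
```

On these figures, quantization costs at most 0.5 mAP while cutting latency by a factor of roughly 1.7 to 3.3.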

## 3. Post-training quantization workflow

#### 3.1 Prepare the environment
- PaddlePaddle >= 2.3 (install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim >= 2.3
- PaddlePaddle == 2.6 (install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim 2.6
- PaddleDet >= 2.4
- opencv-python

Install paddlepaddle:
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
python -m pip install paddlepaddle==2.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
# GPU, using CUDA 11.2 as an example
python -m pip install paddlepaddle-gpu==2.6.0.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```

Install paddleslim:
@@ -116,6 +118,12 @@ python post_quant.py --config_path=./configs/ppyoloe_s_ptq.yaml --save_dir=./ppy
export CUDA_VISIBLE_DEVICES=0
python post_quant.py --config_path=./configs/picodet_s_ptq.yaml --save_dir=./picodet_s_ptq
```
- yolov3_r50vd_dcn_270e_coco:

```
export CUDA_VISIBLE_DEVICES=0
python post_quant.py --config_path=./configs/yolov3_r50vd_dcn.yaml --save_dir=./yolov3_r50vd_dcn_270e_coco_ptq
```


#### 3.5 Evaluate model accuracy
@@ -125,12 +133,21 @@ python post_quant.py --config_path=./configs/picodet_s_ptq.yaml --save_dir=./pic
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/ppyoloe_s_ptq.yaml
```
Accuracy cannot be measured for the ppyoloe_s model this way, because the model does not include NMS.
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/picodet_s_ptq.yaml
```
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov3_r50vd_dcn.yaml
```

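**Note**:
- The path of the model to evaluate can be changed in the `model_dir` field of the config file.

#### 3.6 Improve post-training quantization accuracy
This section describes how to use the quantization analysis tool to improve post-training quantization accuracy. Post-training quantization needs only a small amount of data, is simple to use, and produces a quantized model quickly, but it often causes a significant accuracy loss. PaddleSlim provides a quantization analysis tool, built on the ```paddleslim.quant.AnalysisPTQ``` interface, that visualizes the layers that are not suitable for quantization. Skipping these layers improves the accuracy of the quantized model. For details on ```paddleslim.quant.AnalysisPTQ```, see [AnalysisPTQ.md](../../../docs/zh_cn/tutorials/quant/AnalysisPTQ.md).
This section describes how to use the quantization analysis tool to improve post-training quantization accuracy. Post-training quantization needs only a small amount of data, is simple to use, and produces a quantized model quickly, but it often causes a significant accuracy loss. PaddleSlim provides a quantization analysis tool, built on the ```paddleslim.quant.AnalysisPTQ``` interface, that visualizes the layers that are not suitable for quantization. Skipping these layers improves the accuracy of the quantized model. For details on ```paddleslim.quant.AnalysisPTQ```, see [AnalysisPTQ.md](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/quant/post_training_quantization.md).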


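Across multiple experiments, including several activation quantization algorithms (avg, KL, etc.) and weight quantization methods (abs_max, channel_wise_abs_max), PicoDet-s always had an accuracy of 0 after post-training quantization. Using PicoDet-s as an example, the quantization analysis tool is used as follows: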
@@ -171,6 +188,139 @@ python post_quant.py --config_path=./configs/picodet_s_analyzed_ptq.yaml --save_
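For readers unfamiliar with the two weight quantization methods named in section 3.6, here is a minimal pure-Python illustration of the difference between one scale per tensor (abs_max) and one scale per output channel (channel_wise_abs_max). It is a conceptual sketch under simplified assumptions (symmetric int8, made-up weights), not PaddleSlim's implementation, and it does not by itself explain the PicoDet-s result, which the analysis tool traces to specific layers.

```python
def quantize_dequantize(values, scale, bits=8):
    """Symmetric quantization: map floats to signed integers and back."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    restored = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale * qmax)))
        restored.append(q * scale / qmax)
    return restored


def abs_max(weights):
    """Use a single scale, the largest magnitude in the whole tensor."""
    scale = max(abs(v) for channel in weights for v in channel)
    return [quantize_dequantize(channel, scale) for channel in weights]


def channel_wise_abs_max(weights):
    """Use one scale per output channel."""
    return [
        quantize_dequantize(channel, max(abs(v) for v in channel))
        for channel in weights
    ]


def max_error(original, restored):
    """Largest absolute difference between two nested lists."""
    return max(
        abs(a - b)
        for orig_ch, rest_ch in zip(original, restored)
        for a, b in zip(orig_ch, rest_ch)
    )


# Hypothetical weights: channel 0 has a wide range, channel 1 a tiny one.
weights = [[12.7, -6.35, 3.0], [0.02, -0.01, 0.005]]

per_tensor = abs_max(weights)
per_channel = channel_wise_abs_max(weights)

print("per-tensor, channel 1: ", per_tensor[1])
print("per-channel, channel 1:", per_channel[1])
```

With the shared scale, every value in the small channel rounds to zero, while the per-channel scale keeps them. A layer can still be unsuitable for quantization for other reasons, such as its activation range, which is what the analysis tool is meant to surface.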
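## 4.Inference deployment
For inference deployment, see the [Detection model auto-compression example](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/detection).

Quantized models can run inference with TensorRT on GPU and with MKLDNN on CPU.

The following fields configure inference: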

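| Parameter | Meaning |
|:------:|:------:|
| model_path | Directory containing the inference model; it must contain both model.pdmodel and model.pdiparams |
| reader_config | Path to the reader config file used for evaluation |
| image_file | Path to a single image, for testing on one image only |
| device | Device used for inference, CPU or GPU |
| use_trt | Whether to use the TensorRT inference engine |
| use_mkldnn | Whether to enable the ```MKL-DNN``` acceleration library. Note that when ```use_mkldnn``` and ```use_gpu``` are both ```True```, ```enable_mkldnn``` is ignored and ```GPU``` inference is used |
| cpu_threads | Number of CPU threads used for CPU inference; default 10 |
| precision | Inference precision, one of `fp32/fp16/int8` |
| include_nms | Whether the model includes NMS: set True if it does, False if it does not |
| use_dynamic_shape | Whether to use dynamic shape: set True to use it, False otherwise |
| img_shape | Input image size. The default is 640, meaning images are resized to 640*640 |
| trt_calib_mode | Set to True if the model was produced by TensorRT offline quantization calibration. |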
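The commands in this section pass booleans as `--use_trt=True`. As a hedged sketch of how such flags can be parsed, the snippet below builds a parser for a subset of the fields in the table. It is not the actual `paddle_inference_eval.py` source: the `str2bool` helper and every default other than `cpu_threads` (10) and `img_shape` (640) are assumptions for illustration.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' strings; bool('False') would be True in Python."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser covering a subset of the documented fields."""
    parser = argparse.ArgumentParser(description="Inference options (sketch)")
    parser.add_argument("--model_path", type=str, required=True)
    parser.add_argument("--reader_config", type=str, default=None)
    parser.add_argument("--device", type=str, default="GPU",
                        choices=["CPU", "GPU"])           # default assumed
    parser.add_argument("--use_trt", type=str2bool, default=False)
    parser.add_argument("--precision", type=str, default="fp32",
                        choices=["fp32", "fp16", "int8"])  # default assumed
    parser.add_argument("--include_nms", type=str2bool, default=True)
    parser.add_argument("--cpu_threads", type=int, default=10)
    parser.add_argument("--img_shape", type=int, default=640)
    return parser


args = build_parser().parse_args([
    "--model_path=yolov3_r50vd_dcn_270e_coco_ptq",
    "--use_trt=True",
    "--precision=int8",
    "--include_nms=True",
])
print(args.use_trt, args.precision, args.cpu_threads)
```

Restricting `precision` with `choices` makes an empty or misspelled value fail at parse time instead of silently selecting a mode.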

- TensorRT inference examples:

yolov3_r50vd_dcn_270e_coco model
```shell
python paddle_inference_eval.py \
--model_path=yolov3_r50vd_dcn_270e_coco \
--reader_config=configs/yolov3_r50vd_dcn.yml \
--use_trt=True \
--precision=fp32 \
--include_nms=True \
--benchmark=True
```
```shell
python paddle_inference_eval.py \
--model_path=yolov3_r50vd_dcn_270e_coco_ptq \
--reader_config=configs/yolov3_r50vd_dcn.yml \
--use_trt=True \
--precision=int8 \
--include_nms=True \
--benchmark=True
```
picodet_s model
```shell
python paddle_inference_eval.py \
--model_path=picodet_s_416_coco_lcnet \
--reader_config=configs/picodet_reader.yml \
--use_trt=True \
--precision=fp16 \
--include_nms=True \
--benchmark=True
```
Before quantization analysis
```shell
python paddle_inference_eval.py \
--model_path=picodet_s_ptq \
--reader_config=configs/picodet_reader.yml \
--use_trt=True \
--precision= \
--include_nms=True \
--benchmark=True
```
After quantization analysis
```shell
python paddle_inference_eval.py \
--model_path=picodet_s_analyzed_ptq_out \
--reader_config=configs/picodet_reader.yml \
--use_trt=True \
--precision=int8 \
--include_nms=True \
--benchmark=True
```
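#### 4.1 C++ deployment
See [YOLOv3 inference](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/gpu/yolov3).

Compile the example
- Rename yolov3_test.cc to PicoDet-s.cc; this is the inference example program (its input is a fixed value, so if you need to read data with OpenCV or another method, you will have to modify the program).
- The compile.sh script contains the configuration for third-party libraries and the prebuilt library.
- The run.sh script runs the example in one step.
Before compiling, edit the dependency settings in compile.sh to match your environment: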

```shell
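# Name of the demo to compile
DEMO_NAME=picoDet-s

# Decide whether to turn on the following three flags based on version.txt in the prebuilt library
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON

# Root directory of the inference library
LIB_DIR=${work_path}/../lib/paddle_inference

# If WITH_GPU or USE_TENSORRT above is ON, set the corresponding CUDA, CUDNN, and TENSORRT paths.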
CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
CUDA_LIB=/usr/local/cuda/lib64
TENSORRT_ROOT=/usr/local/TensorRT-7.1.3.4
```
Run bash compile.sh to compile the example.

- Run the example
Run the example with native GPU
```shell
./build/picodet-s --model_file picodet_s_416_coco_lcnet/model.pdmodel --params_file picodet_s_416_coco_lcnet/model.pdiparams
```
Run the example with Trt FP32
```shell
./build/picodet-s --model_file picodet_s_416_coco_lcnet/model.pdmodel --params_file picodet_s_416_coco_lcnet/model.pdiparams --run_mode=trt_fp32
```
Run the example with Trt FP16
```shell
./build/picodet-s --model_file picodet_s_416_coco_lcnet/model.pdmodel --params_file picodet_s_416_coco_lcnet/model.pdiparams --run_mode=trt_fp16
```
Run the example with Trt Int8
When running with Trt Int8, the same command must be executed twice.
Generate the quantization calibration table
```shell
./build/picodet-s --model_file picodet_s_416_coco_lcnet/model.pdmodel --params_file picodet_s_416_coco_lcnet/model.pdiparams --run_mode=trt_int8
```
Log from inference with the calibration table:
```shell
I0623 08:40:49.386909 107053 tensorrt_engine_op.h:159] This process is generating calibration table for Paddle TRT int8...
I0623 08:40:49.387279 107057 tensorrt_engine_op.h:352] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0623 08:41:13.784473 107053 analysis_predictor.cc:791] Wait for calib threads done.
I0623 08:41:14.419198 107053 analysis_predictor.cc:793] Generating TRT Calibration table data, this may cost a lot of time...
```
Run the example with Trt dynamic shape (using Trt FP32 as an example)
```shell
./build/picodet-s --model_file picodet_s_416_coco_lcnet/model.pdmodel --params_file picodet_s_416_coco_lcnet/model.pdiparams --run_mode=trt_fp32 --use_dynamic_shape=1
```
| Model | trt-fp32 | trt-fp16 | trt-int8 | paddle_gpu fp32 | trt_fp32(dynamic_shape) |
|:------:|:------:|:------:|:------:| :------:| :------:|
| PicoDet-s | 3.05ms | 2.66ms | 2.40ms | 7.51ms | 2.82ms |
Benchmark environment: Tesla T4, TensorRT 8.6.1, CUDA 11.6, batch_size=1, cuDNN 8.4.0, Intel(R) Xeon(R) Gold 6271C CPU

## 5.FAQ

- To apply auto-compression to a model, try the [Detection model auto-compression example](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/detection).
@@ -1,5 +1,5 @@
input_list: ['image', 'scale_factor']
model_dir: ./picodet_s_416_coco_lcnet/
model_dir: ./picodet_s_416_coco_lcnet
model_filename: model.pdmodel
params_filename: model.pdiparams
save_dir: ./analysis_results
@@ -26,11 +26,11 @@ EvalDataset:

# Small Dataset to accelerate analysis
# If not exist, delete the dict of FastEvalDataset
FastEvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/small_instances_val2017.json
dataset_dir: /dataset/coco/
# FastEvalDataset:
# !COCODataSet
# image_dir: val2017
# anno_path: annotations/small_instances_val2017.json
# dataset_dir: /dataset/coco/


eval_height: &eval_height 416
@@ -1,4 +1,4 @@
input_list: ['image']
input_list: ['image','scale_factor']
arch: PPYOLOE # When export exclude_nms=True, need set arch: PPYOLOE
model_dir: ./ppyoloe_crn_s_300e_coco
model_filename: model.pdmodel
@@ -29,4 +29,4 @@ EvalReader:
- Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- Permute: {}
batch_size: 32
batch_size: 16
@@ -0,0 +1,37 @@
input_list: ['image', 'scale_factor','im_shape']
model_dir: ./yolov3_r50vd_dcn_270e_coco
model_filename: model.pdmodel
params_filename: model.pdiparams
metric: COCO
num_classes: 80

# Dataset configuration
TrainDataset:
  !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: /work/GETR-Lite-paddle-new/inference/datasets/coco/

EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: /work/GETR-Lite-paddle-new/inference/datasets/coco/

eval_height: &eval_height 608
eval_width: &eval_width 608
eval_size: &eval_size [*eval_height, *eval_width]

worker_num: 0

# preprocess reader in test
EvalReader:
  inputs_def:
    image_shape: [1, 3, *eval_height, *eval_width]
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
    - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
    - Permute: {}
  batch_size: 4
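
The `EvalReader` transforms in this config can be read as simple arithmetic. The sketch below shows, for a single pixel and a single image size, what `Resize` with `keep_ratio: False` and `NormalizeImage` with `is_scale: true` are understood to compute. It assumes the usual PaddleDetection behaviour (divide by 255, subtract the mean, divide by the std; `scale_factor` as target size over original size) and uses hypothetical helper names, so treat it as an illustration rather than the library code.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def scale_factor(orig_h, orig_w, target_h=608, target_w=608):
    """With keep_ratio False, height and width are scaled independently."""
    return target_h / orig_h, target_w / orig_w


def normalize_pixel(rgb, is_scale=True):
    """Normalize one RGB pixel: optional /255, then (x - mean) / std."""
    normalized = []
    for value, mean, std in zip(rgb, MEAN, STD):
        x = value / 255.0 if is_scale else float(value)
        normalized.append((x - mean) / std)
    return normalized


# A hypothetical 480x640 (height x width) image resized to 608x608.
print(scale_factor(480, 640))
print(normalize_pixel([255, 0, 128]))
```

The `scale_factor` pair is what the model's `scale_factor` input carries, so that predicted boxes can be mapped back to the original image size.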