English | 简体中文
Paddle Serving aims to help deep learning developers easily deploy online prediction services. It supports one-click deployment of industrial-grade services, high-concurrency and efficient communication between client and server, and client development in multiple programming languages.
This section introduces how to use PaddleServing to deploy the models in PaddleVideo as prediction services. Currently only the Linux platform is supported; the Windows platform is not supported yet.
The Serving official website recommends using Docker to install and deploy the Serving environment. First pull the Docker image and create a Serving container.
```bash
# start GPU docker
docker pull paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash

# start CPU docker
docker pull paddlepaddle/serving:0.7.0-devel
docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-devel bash
docker exec -it test bash
```
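Optionally, you can confirm the container came up before continuing; a minimal sanity check (the container name `test` follows the commands above):

```bash
# Confirm the Serving container is running
docker ps --filter name=test
# For the GPU image, also confirm the driver is visible inside the container
nvidia-docker exec -it test nvidia-smi
```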
After entering the docker container, install the Serving-related Python packages.
```bash
python3.7 -m pip install paddle-serving-client==0.7.0
python3.7 -m pip install paddle-serving-app==0.7.0

# If it is a CPU deployment environment:
python3.7 -m pip install paddle-serving-server==0.7.0 # CPU
python3.7 -m pip install paddlepaddle==2.2.0          # CPU

# If it is a GPU deployment environment:
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post102 # GPU with CUDA10.2 + TensorRT6
python3.7 -m pip install paddlepaddle-gpu==2.2.0                  # GPU with CUDA10.2

# For other GPU environments, confirm the environment first, then choose which one to execute:
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post101 # GPU with CUDA10.1 + TensorRT6
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUDA11.2 + TensorRT8
```
- If the installation is too slow, you can append `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip commands above to install from a mirror and speed up the process.
- For more environments and the corresponding installation packages, see: https://github.com/PaddlePaddle/Serving/blob/v0.9.0/doc/Install_Linux_Env_CN.md
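Once the packages are installed, a quick check inside the container can confirm the environment is usable; a minimal sketch (the expected version assumes the 2.2.0 install above):

```bash
# Confirm paddlepaddle imports and report its version (expect 2.2.0)
python3.7 -c "import paddle; print(paddle.__version__)"
# Confirm the serving server package is present (paddle-serving-server-gpu on GPU)
python3.7 -m pip show paddle-serving-server
```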
When using PaddleServing for service deployment, you first need to convert the saved inference model into a Serving model. The following takes the PP-TSM model as an example to introduce how to deploy an action recognition service.
- Download the PP-TSM inference model and convert it to a Serving model:

```bash
# Enter the PaddleVideo directory
cd PaddleVideo

# Download the inference model and extract it to ./inference
mkdir ./inference
pushd ./inference
wget https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM.zip
unzip ppTSM.zip
popd

# Convert to a Serving model
pushd deploy/cpp_serving
python3.7 -m paddle_serving_client.convert \
    --dirname ../../inference/ppTSM \
    --model_filename ppTSM.pdmodel \
    --params_filename ppTSM.pdiparams \
    --serving_server ./ppTSM_serving_server \
    --serving_client ./ppTSM_serving_client
popd
```
| parameter | type | default value | description |
| --------- | ---- | ------------- | ----------- |
| `dirname` | str | - | The storage path of the model files to be converted. The program structure file and parameter files are saved in this directory. |
| `model_filename` | str | None | The name of the file storing the inference Program structure of the model to be converted. If set to None, `__model__` is used as the default file name. |
| `params_filename` | str | None | The name of the file storing all parameters of the model to be converted. It needs to be specified if and only if all model parameters are stored in a single binary file; if the parameters are stored in separate files, set it to None. |
| `serving_server` | str | "serving_server" | The storage path of the converted server-side model and configuration files. Defaults to serving_server. |
| `serving_client` | str | "serving_client" | The storage path of the converted client-side configuration files. Defaults to serving_client. |
- After the inference model conversion is completed, two folders, `ppTSM_serving_client` and `ppTSM_serving_server`, will be generated under the `deploy/cpp_serving` folder, with the following layout:

```
PaddleVideo/deploy/cpp_serving
├── ppTSM_serving_client
│   ├── serving_client_conf.prototxt
│   └── serving_client_conf.stream.prototxt
└── ppTSM_serving_server
    ├── ppTSM.pdiparams
    ├── ppTSM.pdmodel
    ├── serving_server_conf.prototxt
    └── serving_server_conf.stream.prototxt
```
After getting the model files, you need to modify `serving_client_conf.prototxt` under `ppTSM_serving_client` and `serving_server_conf.prototxt` under `ppTSM_serving_server` respectively, changing the `alias_name` under `fetch_var` in both files to `outputs` (a scripted way to make this edit is sketched after the example below).
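To locate the `fetch_var` block that needs editing, a quick inspection of the generated server config (run from the PaddleVideo root):

```bash
# Print the fetch_var block, including its current alias_name
grep -A 5 "fetch_var" deploy/cpp_serving/ppTSM_serving_server/serving_server_conf.prototxt
```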
Remarks: In order to be compatible with the deployment of different models, Serving provides input/output renaming. This way, when deploying a different model, only the `alias_name` in the configuration files needs to be modified; the inference deployment works without any code changes. The modified `serving_server_conf.prototxt` looks like this:

```
feed_var {
  name: "data_batch_0"
  alias_name: "data_batch_0"
  is_lod_tensor: false
  feed_type: 1
  shape: 8
  shape: 3
  shape: 224
  shape: 224
}
fetch_var {
  name: "linear_2.tmp_1"
  alias_name: "outputs"
  is_lod_tensor: false
  fetch_type: 1
  shape: 400
}
```
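If you prefer to script the edit rather than change the files by hand, here is a minimal `sed` sketch; it assumes `fetch_var` is the last block in each prototxt, as in the example above:

```bash
# Run from the PaddleVideo root; rewrites the fetch_var alias_name to "outputs"
# in both generated config files. Assumes fetch_var is the final block of each file.
for f in deploy/cpp_serving/ppTSM_serving_server/serving_server_conf.prototxt \
         deploy/cpp_serving/ppTSM_serving_client/serving_client_conf.prototxt; do
    sed -i '/^fetch_var/,$ s/alias_name: ".*"/alias_name: "outputs"/' "$f"
done
```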
The `cpp_serving` directory contains the code for starting the C++ serving service and sending prediction requests, including:

```bash
run_cpp_serving.sh    # script that starts the C++ serving server
serving_client.py     # client script that sends data and gets the prediction results
paddle_env_install.sh # script that installs the C++ serving environment
preprocess_ops.py     # file storing the preprocessing functions
```
- Go to the working directory:

```bash
cd deploy/cpp_serving
```
- Start the service:

```bash
# Start in the background; logs printed during the run are redirected to nohup.txt
bash run_cpp_serving.sh
```
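Because the script runs the server in the background, you can follow the redirected log to confirm it started:

```bash
# Follow the server log redirected by run_cpp_serving.sh
tail -f nohup.txt
```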
- Send the request and get the result:

```bash
python3.7 serving_client.py \
    -n PPTSM \
    -c ./ppTSM_serving_client/serving_client_conf.prototxt \
    --input_file=../../data/example.avi
```
After a successful run, the model's prediction results are printed in the terminal, as follows:
```
I0510 04:33:00.110025 37097 naming_service_thread.cpp:202] brpc::policy::ListNamingService("127.0.0.1:9993"): added 1
I0510 04:33:01.904764 37097 general_model.cpp:490] [client]logid=0,client_cost=1640.96ms,server_cost=1623.21ms.
{'class_id': '[5]', 'prob': '[0.9907387495040894]'}
```
If an error is reported during the run saying that `libnvinfer.so.6` cannot be found, you can execute the script `paddle_env_install.sh` to install the relevant environment:

```bash
bash paddle_env_install.sh
```
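After the script finishes, a quick check that the library can now be resolved:

```bash
# List cached shared libraries and look for the TensorRT runtime
ldconfig -p | grep libnvinfer
```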
Q1: No result is returned after the request is sent, or an output decoding error is prompted.

A1: Do not set a proxy when starting the service or sending the request. You can close the proxy before starting the service and sending the request; the commands to close it are:

```bash
unset https_proxy
unset http_proxy
```
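To verify that no proxy variables remain set before retrying:

```bash
# Print any remaining proxy settings; no output means none are set
env | grep -i proxy
```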