C++ FastReID-TensorRT

Implementation of reid models with the TensorRT network definition API, which builds the whole network layer by layer, so no ONNX/UFF/Caffe parser is used here.

How to Run

  1. Generate a '.wts' file from PyTorch using model_best.pth

    See How_to_Generate.md

  2. Configure your model

    See the TensorRT Model Config section

  3. (Optional) Build the third-party libs

    See the Build third party section

  4. Build the fastrt executable

    mkdir build
    cd build
    cmake -DBUILD_FASTRT_ENGINE=ON \
          -DBUILD_DEMO=ON \
          -DUSE_CNUMPY=ON ..
    make
    
  5. Run fastrt

    Put model_best.wts into FastRT/

    ./demo/fastrt -s  // serialize model & save as 'xxx.engine' file
    
    ./demo/fastrt -d  // deserialize 'xxx.engine' file and run inference
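
    What -d does corresponds to the standard TensorRT runtime flow. A minimal sketch of engine deserialization with the TensorRT 7 C++ API (the file name and variable names here are illustrative, not the demo's actual code):

    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <vector>
    #include "NvInfer.h"

    // Minimal logger required by the TensorRT runtime.
    class Logger : public nvinfer1::ILogger {
        void log(Severity severity, const char* msg) override {
            if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
        }
    } gLogger;

    int main() {
        // Read the engine file produced by `./demo/fastrt -s`.
        std::ifstream file("./xxx.engine", std::ios::binary);
        std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                               std::istreambuf_iterator<char>());

        // Deserialize and create an execution context.
        auto* runtime = nvinfer1::createInferRuntime(gLogger);
        auto* engine  = runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
        auto* context = engine->createExecutionContext();

        // ... copy the preprocessed image to device memory, run
        // context->enqueue(batchSize, buffers, stream, nullptr), and copy the
        // OUTPUT_SIZE-dim feature back to the host ...

        context->destroy();
        engine->destroy();
        runtime->destroy();
        return 0;
    }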
    
  6. Verify the output against PyTorch
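
    A simple verification is the cosine similarity between the TensorRT feature and the PyTorch feature for the same input image; values very close to 1.0 indicate a faithful conversion (FP16/INT8 engines will drift slightly). A minimal sketch of the comparison itself, assuming both feature vectors are already on the host (e.g. exchanged as .npy files via the cnpy third-party lib):

    #include <cmath>
    #include <cstddef>

    // Cosine similarity between two feature vectors of length n.
    float cosineSimilarity(const float* a, const float* b, std::size_t n) {
        float dot = 0.f, na = 0.f, nb = 0.f;
        for (std::size_t i = 0; i < n; ++i) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12f);
    }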

  7. (Optional) Once you have verified the result, you can enable FP16 for a speedup

    mkdir build
    cd build
    cmake -DBUILD_FASTRT_ENGINE=ON \
          -DBUILD_DEMO=ON \
          -DBUILD_FP16=ON ..
    make
    

    then go to step 5
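
    In TensorRT 7 terms, BUILD_FP16 typically boils down to setting the FP16 builder flag before the engine is built; a sketch of the usual pattern (not necessarily the repo's exact code):

    // Enable half precision only when the GPU supports it natively.
    if (builder->platformHasFastFp16()) {
        config->setFlag(nvinfer1::BuilderFlag::kFP16);
    }

    FP16 can shift the output features slightly, which is why the FP32 verification in step 6 comes first.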

  8. (Optional) You can use INT8 quantization for a further speedup

    Prepare a calibration dataset and set its path via cmake (the path must end with /)

    mkdir build
    cd build
    cmake -DBUILD_FASTRT_ENGINE=ON \
          -DBUILD_DEMO=ON \
          -DBUILD_INT8=ON \
          -DINT8_CALIBRATE_DATASET_PATH="/data/Market-1501-v15.09.15/bounding_box_test/" ..
    make
    

    then go to step 5
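
    An INT8 build additionally needs a calibrator that feeds preprocessed batches from the calibration dataset to TensorRT, which is why the dataset path is required. In TensorRT 7 this means implementing IInt8EntropyCalibrator2; a skeleton of the interface (the bodies and batch size are illustrative, not the repo's exact code):

    #include <cstddef>
    #include "NvInfer.h"

    class Int8Calibrator : public nvinfer1::IInt8EntropyCalibrator2 {
    public:
        int getBatchSize() const override { return 4; }  // illustrative

        // Copy the next preprocessed batch from INT8_CALIBRATE_DATASET_PATH into
        // the device buffers; return false once the dataset is exhausted.
        bool getBatch(void* bindings[], const char* names[], int nbBindings) override {
            return false;  // real code fills bindings[] and returns true per batch
        }

        // Optional cache so later builds can skip the full calibration pass.
        const void* readCalibrationCache(size_t& length) override { length = 0; return nullptr; }
        void writeCalibrationCache(const void* cache, size_t length) override {}
    };

    The builder side then sets config->setFlag(nvinfer1::BuilderFlag::kINT8) and passes the calibrator via config->setInt8Calibrator(&calibrator).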

  9. (Optional) Build the tensorrt model as a shared library

    mkdir build
    cd build
    cmake -DBUILD_FASTRT_ENGINE=ON \
          -DBUILD_DEMO=OFF \
          -DBUILD_FP16=ON ..
    make
    make install
    

    You should find the libraries in FastRT/libs/FastRTEngine/

    Now build your application executable

    cmake -DBUILD_FASTRT_ENGINE=OFF -DBUILD_DEMO=ON ..
    make
    

    then go to step 5

  10. (Optional) Build the tensorrt model with the Python interface, so that you can use the FastRT model from Python.

    mkdir build
    cd build
    cmake -DBUILD_FASTRT_ENGINE=ON \
        -DBUILD_DEMO=ON \
        -DBUILD_PYTHON_INTERFACE=ON ..
    make
    

    You should get a shared object at FastRT/build/pybind_interface/ReID.cpython-37m-x86_64-linux-gnu.so.

    Then go to step 5 to create the engine file.

    After that you can import the .so file in Python and deserialize the engine file to run inference from Python.

    You can find usage examples in pybind_interface/test.py and pybind_interface/market_benchmark.py.

    import numpy as np
    from PATH_TO_SO_FILE import ReID

    model = ReID(GPU_ID)                                # bind the model to a GPU
    model.build(PATH_TO_YOUR_ENGINEFILE)                # deserialize the .engine file
    numpy_feature = np.array([model.infer(CV2_FRAME)])  # run inference on a cv2 frame
    
    • pybind_interface/test.py uses pybind_interface/docker/trt7cu100/Dockerfile (without PyTorch installed)
    • pybind_interface/market_benchmark.py uses pybind_interface/docker/trt7cu102_torch160/Dockerfile (with PyTorch installed)

TensorRT Model Config

Edit FastRT/demo/inference.cpp according to your model config.

The config corresponds to the settings in How_to_Generate.md.

  • Ex1. sbs_R50-ibn
static const std::string WEIGHTS_PATH = "../sbs_R50-ibn.wts"; 
static const std::string ENGINE_PATH = "./sbs_R50-ibn.engine";

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 2048;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r50; 
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = true; 
static const bool WITH_NL = true;
static const int EMBEDDING_DIM = 0; 
  • Ex2. sbs_R50
static const std::string WEIGHTS_PATH = "../sbs_R50.wts";
static const std::string ENGINE_PATH = "./sbs_R50.engine"; 

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 2048;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r50; 
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false; 
static const bool WITH_NL = true;
static const int EMBEDDING_DIM = 0; 
  • Ex3. sbs_r34_distill
static const std::string WEIGHTS_PATH = "../sbs_r34_distill.wts"; 
static const std::string ENGINE_PATH = "./sbs_r34_distill.engine";

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r34_distill; 
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false; 
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0; 
  • Ex4. kd-r34-r101_ibn
static const std::string WEIGHTS_PATH = "../kd_r34_distill.wts"; 
static const std::string ENGINE_PATH = "./kd_r34_distill.engine"; 

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r34_distill; 
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false; 
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0; 
  • Ex5. kd-r18-r101_ibn
static const std::string WEIGHTS_PATH = "../kd-r18-r101_ibn.wts"; 
static const std::string ENGINE_PATH = "./kd_r18_distill.engine"; 

static const int MAX_BATCH_SIZE = 16;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 1;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r18_distill; 
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = true; 
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0; 

Supported conversions

  • Backbone: resnet50, resnet34, distill-resnet50, distill-resnet34, distill-resnet18
  • Heads: embedding_head
  • Plugin layers: ibn, non-local
  • Pooling layers: maxpool, avgpool, GeneralizedMeanPooling, GeneralizedMeanPoolingP
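
Because the network is built with the definition API instead of a parser, the non-standard layers above are composed from stock TensorRT layers. As an illustration, GeneralizedMeanPooling, y = (mean(x^p))^(1/p), can be expressed as follows with the TensorRT 7 API (a sketch of the technique, not necessarily the repo's exact implementation; gempoolP is the variant where p is learned):

    #include "NvInfer.h"

    using namespace nvinfer1;

    // GeM pooling over the spatial axes of a CHW tensor (implicit batch mode):
    // y = (mean over H,W of x^p)^(1/p). The exponent buffers must outlive the build.
    ITensor* addGeMPool(INetworkDefinition* net, ITensor& x, float* p, float* invP) {
        ITensor* pC = net->addConstant(Dims3{1, 1, 1},
                          Weights{DataType::kFLOAT, p, 1})->getOutput(0);
        ITensor* invPC = net->addConstant(Dims3{1, 1, 1},
                          Weights{DataType::kFLOAT, invP, 1})->getOutput(0);

        // x^p -> average over H and W (axes 1 and 2 of CHW) -> (...)^(1/p)
        ITensor* xp = net->addElementWise(x, *pC, ElementWiseOperation::kPOW)->getOutput(0);
        ITensor* mean = net->addReduce(*xp, ReduceOperation::kAVG,
                                       (1U << 1) | (1U << 2), /*keepDims=*/true)->getOutput(0);
        return net->addElementWise(*mean, *invPC, ElementWiseOperation::kPOW)->getOutput(0);
    }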

Benchmark

Model        Engine             Precision  Batch size  Image size  Embd  Time
Vanilla R34  Python/PyTorch1.6  fp32       1           256x128     512   6.49ms
Vanilla R34  Python/PyTorch1.6  fp32       4           256x128     512   7.16ms
Vanilla R34  C++/trt7           fp32       1           256x128     512   2.34ms
Vanilla R34  C++/trt7           fp32       4           256x128     512   3.99ms
Vanilla R34  C++/trt7           fp16       1           256x128     512   1.83ms
Vanilla R34  C++/trt7           fp16       4           256x128     512   2.38ms
Distill R34  Python/PyTorch1.6  fp32       1           256x128     512   5.68ms
Distill R34  Python/PyTorch1.6  fp32       4           256x128     512   6.26ms
Distill R34  C++/trt7           fp32       1           256x128     512   2.36ms
Distill R34  C++/trt7           fp32       4           256x128     512   4.05ms
Distill R34  C++/trt7           fp16       1           256x128     512   1.86ms
Distill R34  C++/trt7           fp16       4           256x128     512   2.68ms
R50-NL-IBN   Python/PyTorch1.6  fp32       1           256x128     2048  14.86ms
R50-NL-IBN   Python/PyTorch1.6  fp32       4           256x128     2048  15.14ms
R50-NL-IBN   C++/trt7           fp32       1           256x128     2048  4.67ms
R50-NL-IBN   C++/trt7           fp32       4           256x128     2048  6.15ms
R50-NL-IBN   C++/trt7           fp16       1           256x128     2048  2.87ms
R50-NL-IBN   C++/trt7           fp16       4           256x128     2048  3.81ms
  • Time: preprocessing (normalization) + inference, averaged over 100 runs
  • GPU: RTX 2080 Ti

Test Environment

  1. fastreid v1.0.0 / RTX 2080 Ti / Ubuntu 18.04 / NVIDIA driver 435 / CUDA 10.0 / cuDNN 7.6.5 / TensorRT 7.0.0 / nvinfer 7.0.0 / OpenCV 3.2

  2. fastreid v1.0.0 / RTX 2080 Ti / Ubuntu 18.04 / NVIDIA driver 450 / CUDA 10.2 / cuDNN 7.6.5 / TensorRT 7.0.0 / nvinfer 7.0.0 / OpenCV 3.2

Installation

  • Set up with Docker

    For CUDA 10.0:

    cd docker/trt7cu100
    sudo docker build -t trt7:cuda100 .
    sudo docker run --gpus all -it --name fastrt -v /home/YOURID/workspace:/workspace -d trt7:cuda100
    // then put the repo into `/home/YOURID/workspace/` before entering the container
    

    For CUDA 10.2:

    cd docker/trt7cu102
    sudo docker build -t trt7:cuda102 .
    sudo docker run --gpus all -it --name fastrt -v /home/YOURID/workspace:/workspace -d trt7:cuda102 
    // then put the repo into `/home/YOURID/workspace/` before entering the container
    
  • Installation reference

Build third party

  • For reading/writing numpy arrays

    cd third_party/cnpy
    cmake -DCMAKE_INSTALL_PREFIX=../../libs/cnpy -DENABLE_STATIC=OFF . && make -j4 && make install
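
    cnpy is presumably what the USE_CNUMPY option in step 4 toggles: it reads and writes .npy files, which makes the feature comparison against PyTorch in step 6 straightforward. Basic usage of the cnpy API:

    #include <vector>
    #include "cnpy.h"

    int main() {
        // Save a 1x512 feature vector as a .npy file readable by numpy.
        std::vector<float> feat(512, 0.f);
        cnpy::npy_save("feature.npy", feat.data(), {1, 512}, "w");

        // Load a feature (e.g. one dumped from PyTorch) for comparison.
        cnpy::NpyArray arr = cnpy::npy_load("feature.npy");
        const float* data = arr.data<float>();
        size_t numel = arr.num_vals;  // total number of elements
        (void)data; (void)numel;
        return 0;
    }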