C++ FastReID-TensorRT

Implementation of reid models with the TensorRT network definition API, which builds the whole network layer by layer, so no ONNX/UFF/Caffe parser is used here.

How to Run

  1. Generate a '.wts' file from PyTorch using model_best.pth

    See How_to_Generate.md

  2. Configure your model

    See the TensorRT Model Config section

  3. (Optional) Build the third-party libs

    See the Build third party section

  4. Build the fastrt executable

    mkdir build
    cd build
    cmake -DBUILD_FASTRT_ENGINE=ON \
          -DBUILD_DEMO=ON \
          -DUSE_CNUMPY=ON ..
    make
    
  5. Run fastrt

    Put model_best.wts into FastRT/

    ./demo/fastrt -s  // serialize model & save as 'xxx.engine' file
    
    ./demo/fastrt -d  // deserialize 'xxx.engine' file and run inference
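
    What -d does corresponds to the standard TensorRT runtime flow. A minimal sketch of engine deserialization with the TensorRT 7 C++ API (the file name and variable names here are illustrative, not the demo's actual code):

    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <vector>
    #include "NvInfer.h"

    // Minimal logger required by the TensorRT runtime.
    class Logger : public nvinfer1::ILogger {
        void log(Severity severity, const char* msg) override {
            if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
        }
    } gLogger;

    int main() {
        // Read the engine file produced by `./demo/fastrt -s`.
        std::ifstream file("./xxx.engine", std::ios::binary);
        std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                               std::istreambuf_iterator<char>());

        // Deserialize and create an execution context.
        auto* runtime = nvinfer1::createInferRuntime(gLogger);
        auto* engine  = runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
        auto* context = engine->createExecutionContext();

        // ... copy the preprocessed image to device memory, run
        // context->enqueue(batchSize, buffers, stream, nullptr), and copy the
        // OUTPUT_SIZE-dim feature back to the host ...

        context->destroy();
        engine->destroy();
        runtime->destroy();
        return 0;
    }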
    
  6. Verify the output against PyTorch
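
    A simple verification is the cosine similarity between the TensorRT feature and the PyTorch feature for the same input image; values very close to 1.0 indicate a faithful conversion (FP16/INT8 engines will drift slightly). A minimal sketch of the comparison itself, assuming both feature vectors are already on the host (e.g. exchanged as .npy files via the cnpy third-party lib):

    #include <cmath>
    #include <cstddef>

    // Cosine similarity between two feature vectors of length n.
    float cosineSimilarity(const float* a, const float* b, std::size_t n) {
        float dot = 0.f, na = 0.f, nb = 0.f;
        for (std::size_t i = 0; i < n; ++i) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12f);
    }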

  7. (Optional) Once you have verified the result, you can enable FP16 for a speedup

    mkdir build
    cd build
    cmake -DBUILD_FASTRT_ENGINE=ON \
          -DBUILD_DEMO=ON \
          -DBUILD_FP16=ON ..
    make
    

    then go to step 5
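
    In TensorRT 7 terms, BUILD_FP16 typically boils down to setting the FP16 builder flag before the engine is built; a sketch of the usual pattern (not necessarily the repo's exact code):

    // Enable half precision only when the GPU supports it natively.
    if (builder->platformHasFastFp16()) {
        config->setFlag(nvinfer1::BuilderFlag::kFP16);
    }

    FP16 can shift the output features slightly, which is why the FP32 verification in step 6 comes first.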

  8. (Optional) You can use INT8 quantization for a further speedup

    Prepare a calibration dataset and set its path via cmake (the path must end with /)

    mkdir build
    cd build
    cmake -DBUILD_FASTRT_ENGINE=ON \
          -DBUILD_DEMO=ON \
          -DBUILD_INT8=ON \
          -DINT8_CALIBRATE_DATASET_PATH="/data/Market-1501-v15.09.15/bounding_box_test/" ..
    make
    

    then go to step 5
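
    An INT8 build additionally needs a calibrator that feeds preprocessed batches from the calibration dataset to TensorRT, which is why the dataset path is required. In TensorRT 7 this means implementing IInt8EntropyCalibrator2; a skeleton of the interface (the bodies and batch size are illustrative, not the repo's exact code):

    #include <cstddef>
    #include "NvInfer.h"

    class Int8Calibrator : public nvinfer1::IInt8EntropyCalibrator2 {
    public:
        int getBatchSize() const override { return 4; }  // illustrative

        // Copy the next preprocessed batch from INT8_CALIBRATE_DATASET_PATH into
        // the device buffers; return false once the dataset is exhausted.
        bool getBatch(void* bindings[], const char* names[], int nbBindings) override {
            return false;  // real code fills bindings[] and returns true per batch
        }

        // Optional cache so later builds can skip the full calibration pass.
        const void* readCalibrationCache(size_t& length) override { length = 0; return nullptr; }
        void writeCalibrationCache(const void* cache, size_t length) override {}
    };

    The builder side then sets config->setFlag(nvinfer1::BuilderFlag::kINT8) and passes the calibrator via config->setInt8Calibrator(&calibrator).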

  9. (Optional) Build the tensorrt model as a shared library

    mkdir build
    cd build
    cmake -DBUILD_FASTRT_ENGINE=ON \
          -DBUILD_DEMO=OFF \
          -DBUILD_FP16=ON ..
    make
    make install
    

    You should find the libraries in FastRT/libs/FastRTEngine/

    Now build your application executable

    cmake -DBUILD_FASTRT_ENGINE=OFF -DBUILD_DEMO=ON ..
    make
    

    then go to step 5

  10. (Optional) Build the tensorrt model with the Python interface, so that you can use the FastRT model from Python.

    mkdir build
    cd build
    cmake -DBUILD_FASTRT_ENGINE=ON \
        -DBUILD_DEMO=ON \
        -DBUILD_PYTHON_INTERFACE=ON ..
    make
    

    You should get a shared object at FastRT/build/pybind_interface/ReID.cpython-37m-x86_64-linux-gnu.so.

    Then go to step 5 to create the engine file.

    After that you can import the .so file in Python and deserialize the engine file to run inference from Python.

    You can find usage examples in pybind_interface/test.py and pybind_interface/market_benchmark.py.

    import numpy as np
    from PATH_TO_SO_FILE import ReID

    model = ReID(GPU_ID)                                # bind the model to a GPU
    model.build(PATH_TO_YOUR_ENGINEFILE)                # deserialize the .engine file
    numpy_feature = np.array([model.infer(CV2_FRAME)])  # run inference on a cv2 frame
    
    • pybind_interface/test.py uses pybind_interface/docker/trt7cu100/Dockerfile (without PyTorch installed)
    • pybind_interface/market_benchmark.py uses pybind_interface/docker/trt7cu102_torch160/Dockerfile (with PyTorch installed)

TensorRT Model Config

Edit FastRT/demo/inference.cpp according to your model config.

The config corresponds to the settings in How_to_Generate.md.

  • Ex1. sbs_R50-ibn
static const std::string WEIGHTS_PATH = "../sbs_R50-ibn.wts"; 
static const std::string ENGINE_PATH = "./sbs_R50-ibn.engine";

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 2048;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r50; 
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = true; 
static const bool WITH_NL = true;
static const int EMBEDDING_DIM = 0; 
  • Ex2. sbs_R50
static const std::string WEIGHTS_PATH = "../sbs_R50.wts";
static const std::string ENGINE_PATH = "./sbs_R50.engine"; 

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 2048;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r50; 
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false; 
static const bool WITH_NL = true;
static const int EMBEDDING_DIM = 0; 
  • Ex3. sbs_r34_distill
static const std::string WEIGHTS_PATH = "../sbs_r34_distill.wts"; 
static const std::string ENGINE_PATH = "./sbs_r34_distill.engine";

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r34_distill; 
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false; 
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0; 
  • Ex4. kd-r34-r101_ibn
static const std::string WEIGHTS_PATH = "../kd_r34_distill.wts"; 
static const std::string ENGINE_PATH = "./kd_r34_distill.engine"; 

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r34_distill; 
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false; 
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0; 
  • Ex5. kd-r18-r101_ibn
static const std::string WEIGHTS_PATH = "../kd-r18-r101_ibn.wts"; 
static const std::string ENGINE_PATH = "./kd_r18_distill.engine"; 

static const int MAX_BATCH_SIZE = 16;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 1;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r18_distill; 
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = true; 
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0; 

Supported conversions

  • Backbone: resnet50, resnet34, distill-resnet50, distill-resnet34, distill-resnet18
  • Heads: embedding_head
  • Plugin layers: ibn, non-local
  • Pooling layers: maxpool, avgpool, GeneralizedMeanPooling, GeneralizedMeanPoolingP
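
Because the network is built with the definition API instead of a parser, the non-standard layers above are composed from stock TensorRT layers. As an illustration, GeneralizedMeanPooling, y = (mean(x^p))^(1/p), can be expressed as follows with the TensorRT 7 API (a sketch of the technique, not necessarily the repo's exact implementation; gempoolP is the variant where p is learned):

    #include "NvInfer.h"

    using namespace nvinfer1;

    // GeM pooling over the spatial axes of a CHW tensor (implicit batch mode):
    // y = (mean over H,W of x^p)^(1/p). The exponent buffers must outlive the build.
    ITensor* addGeMPool(INetworkDefinition* net, ITensor& x, float* p, float* invP) {
        ITensor* pC = net->addConstant(Dims3{1, 1, 1},
                          Weights{DataType::kFLOAT, p, 1})->getOutput(0);
        ITensor* invPC = net->addConstant(Dims3{1, 1, 1},
                          Weights{DataType::kFLOAT, invP, 1})->getOutput(0);

        // x^p -> average over H and W (axes 1 and 2 of CHW) -> (...)^(1/p)
        ITensor* xp = net->addElementWise(x, *pC, ElementWiseOperation::kPOW)->getOutput(0);
        ITensor* mean = net->addReduce(*xp, ReduceOperation::kAVG,
                                       (1U << 1) | (1U << 2), /*keepDims=*/true)->getOutput(0);
        return net->addElementWise(*mean, *invPC, ElementWiseOperation::kPOW)->getOutput(0);
    }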

Benchmark

Model        Engine             Precision  Batch size  Image size  Embd  Time
Vanilla R34  Python/PyTorch1.6  fp32       1           256x128     512   6.49ms
Vanilla R34  Python/PyTorch1.6  fp32       4           256x128     512   7.16ms
Vanilla R34  C++/trt7           fp32       1           256x128     512   2.34ms
Vanilla R34  C++/trt7           fp32       4           256x128     512   3.99ms
Vanilla R34  C++/trt7           fp16       1           256x128     512   1.83ms
Vanilla R34  C++/trt7           fp16       4           256x128     512   2.38ms
Distill R34  Python/PyTorch1.6  fp32       1           256x128     512   5.68ms
Distill R34  Python/PyTorch1.6  fp32       4           256x128     512   6.26ms
Distill R34  C++/trt7           fp32       1           256x128     512   2.36ms
Distill R34  C++/trt7           fp32       4           256x128     512   4.05ms
Distill R34  C++/trt7           fp16       1           256x128     512   1.86ms
Distill R34  C++/trt7           fp16       4           256x128     512   2.68ms
R50-NL-IBN   Python/PyTorch1.6  fp32       1           256x128     2048  14.86ms
R50-NL-IBN   Python/PyTorch1.6  fp32       4           256x128     2048  15.14ms
R50-NL-IBN   C++/trt7           fp32       1           256x128     2048  4.67ms
R50-NL-IBN   C++/trt7           fp32       4           256x128     2048  6.15ms
R50-NL-IBN   C++/trt7           fp16       1           256x128     2048  2.87ms
R50-NL-IBN   C++/trt7           fp16       4           256x128     2048  3.81ms
  • Time: preprocessing (normalization) + inference, averaged over 100 runs
  • GPU: RTX 2080 Ti

Test Environment

  1. fastreid v1.0.0 / RTX 2080 Ti / Ubuntu 18.04 / NVIDIA driver 435 / CUDA 10.0 / cuDNN 7.6.5 / TensorRT 7.0.0 / nvinfer 7.0.0 / OpenCV 3.2

  2. fastreid v1.0.0 / RTX 2080 Ti / Ubuntu 18.04 / NVIDIA driver 450 / CUDA 10.2 / cuDNN 7.6.5 / TensorRT 7.0.0 / nvinfer 7.0.0 / OpenCV 3.2

Installation

  • Set up with Docker

    For CUDA 10.0:

    cd docker/trt7cu100
    sudo docker build -t trt7:cuda100 .
    sudo docker run --gpus all -it --name fastrt -v /home/YOURID/workspace:/workspace -d trt7:cuda100
    // then put the repo into `/home/YOURID/workspace/` before entering the container
    

    For CUDA 10.2:

    cd docker/trt7cu102
    sudo docker build -t trt7:cuda102 .
    sudo docker run --gpus all -it --name fastrt -v /home/YOURID/workspace:/workspace -d trt7:cuda102 
    // then put the repo into `/home/YOURID/workspace/` before entering the container
    
  • Installation reference

Build third party

  • For reading/writing numpy arrays

    cd third_party/cnpy
    cmake -DCMAKE_INSTALL_PREFIX=../../libs/cnpy -DENABLE_STATIC=OFF . && make -j4 && make install
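
    cnpy is presumably what the USE_CNUMPY option in step 4 toggles: it reads and writes .npy files, which makes the feature comparison against PyTorch in step 6 straightforward. Basic usage of the cnpy API:

    #include <vector>
    #include "cnpy.h"

    int main() {
        // Save a 1x512 feature vector as a .npy file readable by numpy.
        std::vector<float> feat(512, 0.f);
        cnpy::npy_save("feature.npy", feat.data(), {1, 512}, "w");

        // Load a feature (e.g. one dumped from PyTorch) for comparison.
        cnpy::NpyArray arr = cnpy::npy_load("feature.npy");
        const float* data = arr.data<float>();
        size_t numel = arr.num_vals;  // total number of elements
        (void)data; (void)numel;
        return 0;
    }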