**Description**
All of the models (YOLOv5, YOLOv6, YOLOv7) were trained on a custom dataset consisting of the original COCO 2017 dataset with additional face annotations created using the YOLOv5-Face model.
For this purpose, modifications were made to `test_widerface.py`, and the resulting script, `coco_annotate_faces_custom.py`, was saved in the corresponding GitHub repo directory on the local machine (`../yolov5-face/`). The modifications mostly revolved around reading all the annotations currently present in the label txt files and re-writing them back into the txt files, combined with the face annotations.
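As an illustration, the merging step boils down to something like the following hedged sketch. This is not the actual script: the function name, the `face_boxes` input format, and the face class id are assumptions; only the read-combine-rewrite flow comes from the description above.

```python
import os

def merge_face_annotations(labels_dir, image_id, face_boxes, face_class_id=80):
    """Append face detections to an existing YOLO-format label file.

    face_boxes: list of (cx, cy, w, h) tuples, normalized to [0, 1]
    (a hypothetical format; the real script derives these from YOLOv5-Face output).
    """
    label_path = os.path.join(labels_dir, f"{image_id}.txt")
    # Read whatever annotations are already present for this image
    lines = []
    if os.path.exists(label_path):
        with open(label_path) as f:
            lines = [ln.strip() for ln in f if ln.strip()]
    # Append the new face annotations under a dedicated class id
    for cx, cy, w, h in face_boxes:
        lines.append(f"{face_class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    # Re-write the combined annotations back into the same txt file
    with open(label_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```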
To reproduce this:
- Copy the yolov5-face GitHub repo to your local machine
- Follow the environment setup instructions provided
- Download the model weights you intend to use (the highest-complexity model is preferred)
Once done, the script can be run as follows:

```
python coco_annotate_faces_custom.py --weights 'yolov5-face model' \
    --conf-thres {float} \
    --device {either cuda device (i.e. 0 or 1,2,3) or cpu} \
    --augment \
    --save_folder {dir with existing txt annotations} \
    --dataset_folder {COCO dataset path}
```

`--conf-thres 0.3` was used in our case.
When training and evaluating the models, all of the standard procedures outlined in the corresponding GitHub repositories were followed:
- YOLOv5 - Train Custom Model
- YOLOv6
- YOLOv7
The `nano` model version was chosen for each corresponding generation of the YOLO family, since the models were intended for deployment on edge devices.
The NVIDIA TensorRT format allows for optimization of deep learning inference, delivering low latency and high throughput for inference applications when deployed on embedded computing boards from NVIDIA.
Not all of the YOLO repositories contain a direct deployment script for TensorRT format conversion. Fortunately, all models can first be converted to the intermediate ONNX format following the instructions in each corresponding repository. Afterwards, the ONNX-to-TensorRT converter provided by Linaom1214 can be used for every model generation.
Converting to ONNX format:
In contrast, the YOLOv7 repository already utilizes the ONNX-to-TensorRT converter mentioned above and converts a model to the ONNX format, all in one Colab notebook.
The TensorRT-For-YOLO-Series repo created by Linaom1214 includes an `Examples.ipynb` Colab notebook illustrating the conversion for models of each YOLO generation, and should be followed with reference to the YOLOv7 example above.
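For orientation, the conversion step amounts to a command of roughly the following shape. The flag names reflect that converter repo's README at the time of writing and should be verified against the current version; the file names are placeholders:

```
# ONNX -> TensorRT engine, fp16 precision (file names are placeholders)
python export.py -o yolov5n.onnx -e yolov5n.trt -p fp16
```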
- Install the RealSense SDK 2.0 for your operating system
- Install the Python wrapper package by running

```
pip install pyrealsense2
```
There are Jupyter notebooks for the Intel RealSense SDK available in the librealsense GitHub repo that showcase a minimal implementation of camera alignment and depth estimation.
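For convenience, here is a hedged minimal sketch of depth-to-color alignment and per-pixel distance lookup with `pyrealsense2`; the stream resolutions and the probed pixel are arbitrary example values, not this project's settings:

```python
import pyrealsense2 as rs

# Configure depth + color streams (resolutions are arbitrary examples)
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

# Align the depth frame onto the color frame so pixel coordinates correspond
align = rs.align(rs.stream.color)

try:
    frames = align.process(pipeline.wait_for_frames())
    depth_frame = frames.get_depth_frame()
    color_frame = frames.get_color_frame()
    if depth_frame and color_frame:
        # Distance (in meters) at the image center
        dist = depth_frame.get_distance(320, 240)
        print(f"Distance at center: {dist:.2f} m")
finally:
    pipeline.stop()
```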
There is an obvious problem with estimating the distance to an object using only the center point of its bounding box. In the case of a sudden point conversion failure (a depth value of zero), the three-dimensional coordinate transformation produces a distance value with a huge discrepancy. To make the algorithm more robust to such cases, a median filter was implemented: several points with randomly initialized offsets from the center of the bounding box are taken, and the depth value of each of these points is estimated. These points are then sorted by depth value, the median filter selects a limited number of points, and the final distance estimate is obtained by averaging the distances of those selected points.
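A hedged sketch of this sampling scheme, assuming a `pyrealsense2` depth frame; the function name, the point count, the median window size, and the offset range are illustrative, not the project's actual values:

```python
import numpy as np

def robust_depth(depth_frame, cx, cy, n_points=9, window=5, max_offset=10):
    """Estimate object distance from several randomly offset points
    around the bounding-box center (all parameters are illustrative)."""
    rng = np.random.default_rng()
    depths = []
    for _ in range(n_points):
        dx, dy = rng.integers(-max_offset, max_offset + 1, size=2)
        d = depth_frame.get_distance(int(cx + dx), int(cy + dy))
        if d > 0:  # drop failed (zero-depth) readings
            depths.append(d)
    if not depths:
        return 0.0  # every sample failed; the caller should handle this
    depths.sort()
    # Median filter: keep a small window of values around the median
    mid = len(depths) // 2
    lo = max(0, mid - window // 2)
    selected = depths[lo:lo + window]
    # Final estimate is the average of the selected points
    return float(np.mean(selected))
```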
The Norfair Python library was used to perform real-time 2D object tracking. The `Tracker` class imported from the library was leveraged to update the positions of objects based on their bounding boxes. A custom function, `convert_detections_to_norfair_detections`, was written to store YOLO detections with depth (and/or color) information in Norfair's `Detection` structure. Moreover, the standard functions `draw_boxes` and `draw_tracked_objects` were modified to create a stylistically pleasing output.
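A hedged sketch of such a conversion, assuming detections arrive as `(x1, y1, x2, y2, score, class_id)` rows; the input format, the tracker parameters, and the `data` payload are assumptions, not the project's actual code:

```python
import numpy as np
from norfair import Detection, Tracker

def convert_detections_to_norfair_detections(yolo_boxes, depths=None, colors=None):
    """Wrap YOLO detections as Norfair Detection objects, attaching
    depth/color metadata via the `data` field (input format is assumed)."""
    detections = []
    for i, (x1, y1, x2, y2, score, cls) in enumerate(yolo_boxes):
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        detections.append(Detection(
            points=np.array([[cx, cy]]),   # track the box center point
            scores=np.array([score]),
            data={
                "class_id": int(cls),
                "depth": None if depths is None else depths[i],
                "color": None if colors is None else colors[i],
            },
        ))
    return detections

# Example tracker setup (the threshold value is illustrative)
tracker = Tracker(distance_function="euclidean", distance_threshold=30)
# tracked_objects = tracker.update(detections=convert_detections_to_norfair_detections(boxes))
```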
In order to carry out the color detection, the pre-processed initial frame has to be converted to HSV format:

```
hsv_image = cv2.cvtColor(img.transpose((1,2,0)), cv2.COLOR_BGR2HSV)
```

*The image has to be transposed since the pre-processing step shifts the shape of the image, making the number of color channels the first value in the tuple.
A custom `det_cropped_bboxes_colors` function was applied at the inference step to (see the sketch after this list):
- Crop a bounding box out of a frame
- Split the resulting cropped image into h, s, v arrays
- Compute the mean values of the s and v channels using `np.mean()`
- Compute the average hue using the mean of circular quantities
- Run the calculated values through pre-defined thresholds to determine the color name
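A hedged sketch of these steps; the box format, the threshold values, and the color names are illustrative assumptions. Hue is averaged as a circular quantity because OpenCV hue wraps around at 180 (a plain mean of a red object spanning hues 178 and 2 would wrongly land near 90, i.e. green):

```python
import cv2
import numpy as np

def mean_hue(h):
    """Circular mean of OpenCV hue values (range 0-179)."""
    angles = h.astype(np.float64) * (2.0 * np.pi / 180.0)
    mean_angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())
    return (mean_angle % (2.0 * np.pi)) * (180.0 / (2.0 * np.pi))

def det_cropped_bboxes_colors(hsv_image, box):
    """Sketch: box is assumed to be (x1, y1, x2, y2) in pixels;
    all thresholds below are illustrative, not the project's values."""
    x1, y1, x2, y2 = map(int, box)
    h, s, v = cv2.split(hsv_image[y1:y2, x1:x2])  # crop, then split channels
    s_mean, v_mean = np.mean(s), np.mean(v)
    hue = mean_hue(h)
    if v_mean < 50:
        return "black"
    if s_mean < 40:
        return "white" if v_mean > 180 else "gray"
    if hue < 10 or hue >= 170:
        return "red"
    if hue < 35:
        return "yellow"
    if hue < 85:
        return "green"
    if hue < 130:
        return "blue"
    return "violet"
```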
The resulting algorithm for detecting the color of a detected object was found to be computationally sub-optimal, resulting in FPS loss. Thus, the `BaseEngine` class (initialized in `utils_norfair.py`, and from which the `Predictor` class in `trt_norfair.py` inherits everything) was modified to include a `show_color_flag`. This flag can be used to enable the color detection functionality upon request.
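The gating pattern amounts to something like the following hypothetical sketch; apart from `show_color_flag`, `BaseEngine`, and `Predictor`, all names are invented for illustration, and it reuses the `det_cropped_bboxes_colors` sketch above:

```python
class BaseEngine:
    def __init__(self, engine_path, show_color_flag=False):
        self.show_color_flag = show_color_flag  # color detection off by default
        # ... engine loading omitted ...

    def postprocess(self, hsv_image, boxes):
        # Run the (expensive) color detection only when requested
        if self.show_color_flag:
            return [det_cropped_bboxes_colors(hsv_image, b) for b in boxes]
        return [None] * len(boxes)

class Predictor(BaseEngine):
    pass  # inherits everything, including the flag
```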
Activate the conda environment:

```
conda activate yolov6_deepsort
# deepsort is irrelevant to the current implementation
```

Go to the project directory:

```
cd C:/Users/mukal/tensorrt-python/yolov6
```

Run the script:

```
python trt_norfair.py
```