# RetinaFace C++ Reimplement

## Source

Based on the Python implementation of [RetinaFace](https://github.com/deepinsight/insightface/tree/master/RetinaFace) in insightface.

## Model transformation tool

Models can be converted with [MXNet2Caffe](https://github.com/cypw/MXNet2Caffe); you need to add some layers yourself. Caffe has no **upsample** layer, so you can replace it with a **deconvolution**, possibly at a slight accuracy loss (a sketch illustrating the equivalence follows the speed tables below).

The original model comes from [mobilenet25](https://pan.baidu.com/s/1P1ypO7VYUbNAezdvLm2m9w#list/path=%2F), and I have retrained it.

## Demo

```
$ mkdir build
$ cd build/
$ cmake ../
$ make
```

You need to modify the dependency paths in the CMakeLists file.

## Speed test

Hardware: GTX 1080 Ti

Test 1:

| model    | total time | input size | preprocess time | inference time | postprocess time |
| :------: | :--------: | :--------: | :-------------: | :------------: | :--------------: |
| mxnet    | 44.8ms     | 1280x896   | 19.0ms          | 8.0ms          | 16.0ms           |
| caffe    | 46.9ms     | 1280x896   | 5.8ms           | 24.1ms         | 16.0ms           |
| tensorrt | 29.3ms     | 1280x896   | 6.9ms           | 5.4ms          | 15.0ms           |

Test 2:

| model    | total time | input size | preprocess time | inference time | postprocess time |
| :------: | :--------: | :--------: | :-------------: | :------------: | :--------------: |
| mxnet    | 6.4ms      | 320x416    | 1.3ms           | 0.1ms          | 4.2ms            |
| caffe    | 30.8ms     | 320x416    | 1.2ms           | 27ms           | 2.3ms            |
| tensorrt | 4.7ms      | 320x416    | 0.7ms           | 1.9ms          | 1.8ms            |

TensorRT batch test:

| batch size | input size | max batch size | preprocess time | inference time | postprocess time | total   | GPU util |
| :--------: | :--------: | :------------: | :-------------: | :------------: | :--------------: | :-----: | :------: |
| 1          | 448x448    | 8              | 1.0ms           | 2.3ms          | 2.6ms            | 6.7ms   | 35%      |
| 2          | 448x448    | 8              | 2.5ms           | 3.3ms          | 5.2ms            | 11.8ms  | 33%      |
| 4          | 448x448    | 8              | 4.1ms           | 4.6ms          | 10.0ms           | 21.8ms  | 28%      |
| 8          | 448x448    | 8              | 8.7ms           | 7.0ms          | 20.3ms           | 40.7ms  | 23%      |
| 16         | 448x448    | 32             | 28.1ms          | 14.7ms         | 38.7ms           | 92.0ms  | -        |
| 32         | 448x448    | 32             | 36.2ms          | 26.3ms         | 75.7ms           | 163.5ms | -        |

Note: batching gives some advantage in inference, but it cannot speed up preprocessing or postprocessing.
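As mentioned in the model transformation section, Caffe's missing upsample can be emulated with a deconvolution. The sketch below is a minimal, self-contained C++ illustration of why this works (it is not code from this repository, and the function names are illustrative): a transposed convolution with a fixed 2x2 all-ones kernel and stride 2 reproduces 2x nearest-neighbor upsampling exactly on a single-channel map.

```
// upsample_vs_deconv.cpp
// Minimal sketch (not part of this repo): a stride-2 deconvolution
// (transposed convolution) with a fixed 2x2 all-ones kernel reproduces
// 2x nearest-neighbor upsampling on a single-channel feature map.
#include <cassert>
#include <cstdio>
#include <vector>

// 2x nearest-neighbor upsample of an h x w single-channel map.
std::vector<float> upsample2x(const std::vector<float>& src, int h, int w) {
    std::vector<float> dst(4 * h * w);
    for (int y = 0; y < 2 * h; ++y)
        for (int x = 0; x < 2 * w; ++x)
            dst[y * 2 * w + x] = src[(y / 2) * w + (x / 2)];
    return dst;
}

// Transposed convolution: kernel 2x2 (all ones), stride 2, no padding.
// Each input pixel is "stamped" onto a non-overlapping 2x2 output block.
std::vector<float> deconv2x(const std::vector<float>& src, int h, int w) {
    std::vector<float> dst(4 * h * w, 0.0f);
    const float kernel[2][2] = {{1.0f, 1.0f}, {1.0f, 1.0f}};
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int ky = 0; ky < 2; ++ky)
                for (int kx = 0; kx < 2; ++kx)
                    dst[(2 * y + ky) * 2 * w + (2 * x + kx)] +=
                        kernel[ky][kx] * src[y * w + x];
    return dst;
}

int main() {
    const int h = 2, w = 3;
    std::vector<float> feat = {1, 2, 3, 4, 5, 6};
    assert(upsample2x(feat, h, w) == deconv2x(feat, h, w));
    std::puts("fixed-kernel deconvolution == 2x nearest-neighbor upsample");
    return 0;
}
```

In a converted Caffe prototxt, one common recipe for this is a `Deconvolution` layer with `kernel_size: 2`, `stride: 2`, `group` equal to the channel count, no bias, and constant weights of 1. If the deconvolution weights are instead learned during retraining, the output is no longer an exact nearest-neighbor upsample, which may account for the slight accuracy loss noted above.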
Optimized postprocess:

| batch size | input size | max batch size | preprocess time | inference time | postprocess time | total  | GPU util |
| :--------: | :--------: | :------------: | :-------------: | :------------: | :--------------: | :----: | :------: |
| 1          | 448x448    | 8              | 1.0ms           | 2.3ms          | 0.09ms           | 3.5ms  | 70%      |
| 2          | 448x448    | 8              | 2.2ms           | 2.8ms          | 0.2ms            | 5.3ms  | 60%      |
| 4          | 448x448    | 8              | 3.7ms           | 5.0ms          | 0.3ms            | 8.4ms  | 55%      |
| 8          | 448x448    | 8              | 7.5ms           | 6.5ms          | 0.67ms           | 14.9ms | 50%      |
| 16         | 448x448    | 32             | 26ms            | 13ms           | 1.3ms            | 41ms   | 40%      |
| 32         | 448x448    | 32             | 32ms            | 22ms           | 2.7ms            | 56.6ms | 50%      |

Using the NVIDIA NPP library to speed up preprocessing:

| batch size | input size | max batch size | preprocess time | inference time | postprocess time | total  | GPU util |
| :--------: | :--------: | :------------: | :-------------: | :------------: | :--------------: | :----: | :------: |
| 1          | 448x448    | 8              | 0.2ms           | 2.3ms          | 0.1ms            | 2.6ms  | 91%      |
| 2          | 448x448    | 8              | 0.3ms           | 3.0ms          | 0.2ms            | 3.5ms  | 85%      |
| 4          | 448x448    | 8              | 0.5ms           | 4.1ms          | 0.32ms           | 5.0ms  | 82%      |
| 8          | 448x448    | 8              | 1.2ms           | 6.3ms          | 0.77ms           | 8.3ms  | 79%      |
| 16         | 448x448    | 32             | 2.2ms           | 14ms           | 1.3ms            | 16.7ms | 80%      |
| 32         | 448x448    | 32             | 5.0ms           | 22ms           | 2.8ms            | 29.3ms | 77%      |

Sketches of both optimizations (threshold-first postprocessing and NPP-based GPU preprocessing) appear at the end of this README.

### INT8 inference

An INT8 calibration table can be generated with the [INT8-Calibration-Tool](https://github.com/clancylian/retinaface/tree/master/INT8-Calibration-Tool); a skeleton calibrator is also sketched at the end of this README.

### Accuracy

![RetinaFace WIDER FACE test results](https://raw.githubusercontent.com/clancylian/retinaface/master/data/retinaface-widerface%E6%B5%8B%E8%AF%95.png)
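On the postprocess optimization: the large drop in the tables above (roughly 16ms down to well under 1ms) is the kind of gain you get by thresholding scores before decoding, so that box decode and NMS only run on the few anchors that survive. The sketch below illustrates that idea with a RetinaFace-style box decode; the struct and function names are illustrative, not this repository's actual API, and landmark decoding is omitted.

```
// postprocess_sketch.cpp
// Illustrative sketch of a fast detector postprocess: threshold scores
// BEFORE decoding boxes, so decode + NMS touch only a handful of anchors.
#include <algorithm>
#include <cmath>
#include <vector>

struct Box { float x1, y1, x2, y2, score; };

// Decode only anchors whose face score passes the threshold.
// anchors/regress hold 4 floats per anchor: (cx, cy, w, h) and
// (dx, dy, dw, dh); scores holds one face probability per anchor.
std::vector<Box> decodeAboveThreshold(const std::vector<float>& scores,
                                      const std::vector<float>& anchors,
                                      const std::vector<float>& regress,
                                      float scoreThresh) {
    std::vector<Box> out;
    for (size_t i = 0; i < scores.size(); ++i) {
        if (scores[i] < scoreThresh) continue;   // skips the vast majority
        const float* a = &anchors[4 * i];
        const float* r = &regress[4 * i];
        float cx = a[0] + r[0] * a[2];           // standard box regression
        float cy = a[1] + r[1] * a[3];
        float w  = a[2] * std::exp(r[2]);
        float h  = a[3] * std::exp(r[3]);
        out.push_back({cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, scores[i]});
    }
    return out;
}

// Greedy NMS on the (small) surviving set.
std::vector<Box> nms(std::vector<Box> boxes, float iouThresh) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> keep;
    for (const Box& b : boxes) {
        bool suppressed = false;
        for (const Box& k : keep) {
            float ix1 = std::max(b.x1, k.x1), iy1 = std::max(b.y1, k.y1);
            float ix2 = std::min(b.x2, k.x2), iy2 = std::min(b.y2, k.y2);
            float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
            float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
            float areaK = (k.x2 - k.x1) * (k.y2 - k.y1);
            if (inter / (areaB + areaK - inter) > iouThresh) { suppressed = true; break; }
        }
        if (!suppressed) keep.push_back(b);
    }
    return keep;
}

int main() {
    // One anchor passes the threshold, one does not.
    std::vector<float> scores  = {0.9f, 0.1f};
    std::vector<float> anchors = {16, 16, 32, 32,  48, 16, 32, 32};
    std::vector<float> regress = {0, 0, 0, 0,      0, 0, 0, 0};
    auto dets = nms(decodeAboveThreshold(scores, anchors, regress, 0.5f), 0.4f);
    return dets.size() == 1 ? 0 : 1;
}
```

With a typical score threshold, only tens of anchors survive out of tens of thousands, which would be consistent with postprocess time dropping to a fraction of a millisecond per image.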
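On the NPP preprocess optimization: the gain comes from resizing on the GPU instead of the CPU. Below is a minimal sketch of that approach, assuming a CUDA 10+ toolkit with NPP available; it resizes a packed 8-bit BGR image with `nppiResize_8u_C3R`. Error handling, mean subtraction, and float conversion are omitted, and the buffer names are illustrative rather than this repository's.

```
// npp_preprocess_sketch.cpp
// Minimal sketch (CUDA 10+ NPP API assumed, names illustrative):
// resize a packed BGR 8-bit image on the GPU with NPP instead of the CPU.
#include <cuda_runtime.h>
#include <nppi.h>

// Resize a srcW x srcH BGR image (already on the GPU) to dstW x dstH.
bool gpuResizeBGR(const Npp8u* dSrc, int srcW, int srcH,
                  Npp8u* dDst, int dstW, int dstH) {
    NppiSize srcSize = {srcW, srcH};
    NppiRect srcRoi  = {0, 0, srcW, srcH};
    NppiSize dstSize = {dstW, dstH};
    NppiRect dstRoi  = {0, 0, dstW, dstH};
    // Steps are row pitches in bytes: 3 bytes per BGR pixel, no row padding.
    NppStatus st = nppiResize_8u_C3R(dSrc, srcW * 3, srcSize, srcRoi,
                                     dDst, dstW * 3, dstSize, dstRoi,
                                     NPPI_INTER_LINEAR);
    return st == NPP_SUCCESS;
}

int main() {
    const int srcW = 1920, srcH = 1080, dstW = 448, dstH = 448;
    Npp8u *dSrc = nullptr, *dDst = nullptr;
    cudaMalloc(&dSrc, srcW * srcH * 3);   // upload the camera frame here
    cudaMalloc(&dDst, dstW * dstH * 3);   // network input buffer
    bool ok = gpuResizeBGR(dSrc, srcW, srcH, dDst, dstW, dstH);
    cudaFree(dSrc);
    cudaFree(dDst);
    return ok ? 0 : 1;
}
```

Keeping the whole batch on the GPU also avoids a per-image host-to-device copy of the resized data, which likely matters most at batch 16 and 32, where CPU preprocessing previously dominated the total time.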
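On INT8 inference: TensorRT builds the calibration table by pulling representative batches through a user-supplied calibrator. The skeleton below shows the shape of TensorRT's `IInt8EntropyCalibrator2` interface (TensorRT 5+ assumed); the file and member names are illustrative, `loadNextPreprocessedBatch` is a hypothetical stub, and the linked INT8-Calibration-Tool is how this repo actually generates the table.

```
// int8_calibrator_sketch.cpp
// Skeleton of a TensorRT INT8 entropy calibrator (TensorRT 5+ API assumed).
// Names are illustrative; loadNextPreprocessedBatch is a hypothetical stub.
#include <NvInfer.h>
#include <cuda_runtime.h>
#include <cstddef>
#include <fstream>
#include <vector>

class EntropyCalibrator : public nvinfer1::IInt8EntropyCalibrator2 {
public:
    EntropyCalibrator(int batchSize, int inputVolume)
        : mBatchSize(batchSize), mInputVolume(inputVolume) {
        cudaMalloc(&mDeviceInput, batchSize * inputVolume * sizeof(float));
    }
    ~EntropyCalibrator() override { cudaFree(mDeviceInput); }

    int getBatchSize() const noexcept override { return mBatchSize; }

    // Called repeatedly by TensorRT; return false when images run out.
    bool getBatch(void* bindings[], const char* names[], int nbBindings) noexcept override {
        std::vector<float> batch(mBatchSize * mInputVolume);
        if (!loadNextPreprocessedBatch(batch)) return false;  // hypothetical loader
        cudaMemcpy(mDeviceInput, batch.data(), batch.size() * sizeof(float),
                   cudaMemcpyHostToDevice);
        bindings[0] = mDeviceInput;
        return true;
    }

    // Reuse a cached table so calibration only runs once.
    const void* readCalibrationCache(size_t& length) noexcept override {
        std::ifstream in("calibration.table", std::ios::binary);
        mCache.assign(std::istreambuf_iterator<char>(in),
                      std::istreambuf_iterator<char>());
        length = mCache.size();
        return mCache.empty() ? nullptr : mCache.data();
    }

    void writeCalibrationCache(const void* ptr, size_t length) noexcept override {
        std::ofstream("calibration.table", std::ios::binary)
            .write(static_cast<const char*>(ptr), length);
    }

private:
    bool loadNextPreprocessedBatch(std::vector<float>&) { return false; }  // stub
    int mBatchSize, mInputVolume;
    void* mDeviceInput = nullptr;
    std::vector<char> mCache;
};

int main() {
    EntropyCalibrator calib(8, 3 * 448 * 448);
    return calib.getBatchSize() == 8 ? 0 : 1;
}
```

The calibration batches should go through exactly the same preprocessing as at inference time; otherwise the computed dynamic ranges will not match the data the INT8 engine actually sees.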