# simple_pose

**Repository Path**: tx_ma/simple_pose

## Basic Information

- **Project Name**: simple_pose
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-12-27
- **Last Updated**: 2021-12-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# SimplePose
该项目主要包含一系列Top-Down的2D姿态估计算法,如[AlphaPose](https://github.com/MVIG-SJTU/AlphaPose) 中的DUC,[Simple Baselines](https://arxiv.org/abs/1804.06208) 中DCONV,
以及[HRNet-Human-Pose-Estimation](https://github.com/HRNet/HRNet-Human-Pose-Estimation) HRNet.同时也包含了[DarkPose](https://github.com/ilovepose/DarkPose) 所提到的KeyPoints Encoder与Decoder的
一些小的tricks(大约获得1.0左右的mAP增益).


## requirement
```text
tqdm
pyyaml
numpy
opencv-python
pycocotools
torch >= 1.5
torchvision >=0.6.0
```
## result
该项目以192x256的图片(包含人体的图片patch)作为输入,使用了4块显卡进行训练,batch_size=128(32/卡,显存17892MB,约17GB).总epoch=180,
初始学习率为0.001,使用了Adam作为优化器.学习率在第120个epoch与第160个epoch进行衰减,衰减系数0.1,训练时长约为21h(小时).测试所使用的heat_map到
keypoints的decode为具体解码器为 GaussTaylorKeyPointDecoder

**以下结果均使用[COCO_val2017_detections_AP_H_56_person.json](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing) 的检测结果作为测试,数据来自
[HRNet-Human-Pose-Estimation](https://github.com/HRNet/HRNet-Human-Pose-Estimation)**

### DConv(上采样使用转置卷积)的performance
```shell script
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.701
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.883
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.772
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.665
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.772
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.760
Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.928
Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.825
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.715
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.825
```
### DUC(上采样Conv+PixelShuffle)的performance)
```shell script
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.709
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.885
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.781
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.674
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.781
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.768
Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.929
Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.832
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.722
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.833
```
### SELayer+DUC(上采样Conv+PixelShuffle)的performance
**模型初始化时的reduction为True**
```shell script
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.718
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.892
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.790
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.683
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.787
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.775
Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.932
Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.841
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.732
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.836
```

### SELayer+DConv(上采样使用转置卷积)的performance
**模型初始化时的reduction为True**
```shell script
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.717
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.890
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.791
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.685
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.785
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.776
Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.934
Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.841
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.733
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.837
```
### HRNet W32 performance
```shell script
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.741
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.895
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.807
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.703
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.814
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.795
Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.935
Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.856
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.750
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.860
```

### SE_DUC+YOLOv5(目标检测使用重写的YOLOv5)
```shell script
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.723
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.903
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.794
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.689
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.787
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.780
Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.940
Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.845
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.739
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.839
```
[目标检测权重以及姿态估计的权重](https://pan.baidu.com/s/1O4u1wOklZOj-OVYivpRX1w) 云盘密码:e5f9


## training
目前支持coco 关键点数据集.自定义数据集请参考datasets/coco.py中的MSCOCO.__load_in()部分代码.相信这部分代码非常容易改写.

### COCO
* modify main.py (modify config file path)
```python
from processors.dp_pose_resnet_solver import DPProcessor

if __name__ == '__main__':
    ddp_processor = DPProcessor(cfg_path="configs/dp_fast_pose.yaml")
    ddp_processor.run()

```
* custom some parameters in *config.yaml*
```yaml
model_name: fast_pose_dp
data:
  train_ann_path: .../data/annotations/person_keypoints_train2017.json
  val_ann_path: .../data/annotations/person_keypoints_val2017.json
  train_img_root: .../data/train2017
  val_img_root: .../data/val2017
  batch_size: 128
  num_workers: 8
  debug: False

model:
  type: pose_resnet_duc
  name: resnet50
  num_joints: 17
  pretrained: True
  reduction: True

optim:
  lr: 0.001
  amp: False
  milestones: [120,160]
  epochs: 180
  gamma: 0.1
val:
  interval: 1
  weight_path: weights

gpus: [4,5,6,7]

```
* run train scripts
```shell script
nohup python main.py >>train.log 2>&1 &
```

## TODO
- [x] Color Jitter
- [x] Perspective Transform
- [x] Warming UP
- [x] Cosine Lr Decay
- [x] EMA(Exponential Moving Average)[**非常不建议使用,训练过程中会有比较大程度的震荡**]
- [x] Mixed Precision Training (supported by apex)
- [x] SELayer
- [x] Sync Batch Normalize
- [x] Person Detector support(by YOLOv5)
- [x] Test With Person Detector(YOLOv3/v4/v5...)
- [ ] custom data train\test scripts
- [ ] video demo\friendly API