# ESOD: Efficient Small Object Detection on High-Resolution Images

This repository is the official implementation of [**Efficient Small Object Detection on High-Resolution Images**](https://arxiv.org/abs/2407.16424).

## Installation

[**Python>=3.6.0**](https://www.python.org/) is required, with all dependencies in [requirements.txt](requirements.txt) installed, including [**PyTorch>=1.7**](https://pytorch.org/get-started/locally/):

```bash
# (optional) conda install cuda-toolkit=11.8 -c pytorch
pip install -r requirements.txt
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install setuptools==59.5.0
```
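After installing, a quick sanity check can confirm that the expected PyTorch build and a CUDA device are visible. This snippet is a minimal sketch, not part of the repository:

```python
# check_env.py -- minimal install sanity check (illustrative, not part of this repo)
import torch
import torchvision

print(f"torch {torch.__version__}, torchvision {torchvision.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU 0: {torch.cuda.get_device_name(0)}")
```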
## Data Preparation

We currently support the [**VisDrone**](https://github.com/VisDrone/VisDrone-Dataset), [**UAVDT**](https://sites.google.com/view/grli-uavdt/%E9%A6%96%E9%A1%B5), and [**TinyPerson**](https://github.com/ucas-vg/TinyBenchmark) datasets. Follow the instructions below to prepare them.

<details>
<summary>Data Preparation (click to expand)</summary>

* **Darknet Format**: The Darknet framework **locates labels automatically for each image** by replacing the last instance of `/images/` in each image path with `/labels/`. For example:
  ```
  dataset/images/im0.jpg
  dataset/labels/im0.txt
  ```
  The images and labels of VisDrone, UAVDT, and TinyPerson are all organized in this format (a sketch of the path mapping follows this section).
* **Ground-Truth Heatmap**: We recommend leveraging the [segment-anything](https://github.com/facebookresearch/segment-anything) model (SAM) to introduce a precise shape prior into the GT heatmaps used for training (a hedged sketch also follows this section). Install SAM first:
  ```bash
  cd third_party/segment-anything
  pip install -e .
  ```
* **Dataset - VisDrone**: Download the [data](https://github.com/VisDrone/VisDrone-Dataset) and ensure the subsets under the `/path/to/visdrone` directory are as follows:
  ```
  VisDrone2019-DET-train
  VisDrone2019-DET-val
  VisDrone2019-DET-test-dev
  VisDrone2019-DET-test-challenge
  ```
  Then make a soft link to your directory and run the `scripts/data_prepare.py` script to reorganize the images and labels:
  ```bash
  ln -sf /path/to/visdrone VisDrone
  python scripts/data_prepare.py --dataset VisDrone
  ```
* **Dataset - UAVDT**: Download the [data](https://drive.google.com/file/d/1m8KA6oPIRK_Iwt9TYFquC87vBc_8wRVc/view?usp=sharing) and perform similar preprocessing:
  ```bash
  ln -sf /path/to/uavdt UAVDT
  python scripts/data_prepare.py --dataset UAVDT
  ```
* **Dataset - TinyPerson**: Download the [data](https://drive.google.com/open?id=1KrH9uEC9q4RdKJz-k34Q6v5hRewU5HOw) and perform similar preprocessing:
  ```bash
  ln -sf /path/to/tinyperson TinyPerson
  python scripts/data_prepare.py --dataset TinyPerson
  ```

</details>
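For clarity, here is a minimal sketch of the Darknet image-to-label path convention described above. It is illustrative only; the function name `img2label_path` is our own, not taken from the ESOD codebase:

```python
# darknet_paths.py -- sketch of the Darknet label-location convention
# (illustrative; assumes every image path contains an /images/ component)

def img2label_path(img_path: str) -> str:
    """Replace the LAST occurrence of /images/ with /labels/ and swap the suffix to .txt."""
    head, _, tail = img_path.rpartition("/images/")
    stem = tail.rsplit(".", 1)[0]
    return f"{head}/labels/{stem}.txt"

assert img2label_path("dataset/images/im0.jpg") == "dataset/labels/im0.txt"
```

And a hedged sketch of how SAM box prompts could produce the shape-aware GT heatmaps recommended above. The repository's own heatmap construction lives in its preparation scripts; the checkpoint choice, the max-accumulation scheme, and the function name below are assumptions for illustration:

```python
# sam_heatmap.py -- hedged sketch: shape-aware GT heatmaps from SAM box prompts
# (accumulation scheme and checkpoint choice are assumptions, not ESOD's exact code)
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # assumed checkpoint
# sam.to("cuda")  # optional: move SAM to GPU for speed
predictor = SamPredictor(sam)

def boxes_to_heatmap(image_bgr: np.ndarray, boxes_xyxy: np.ndarray) -> np.ndarray:
    """Prompt SAM with each GT box and accumulate the masks into a [0, 1] heatmap."""
    predictor.set_image(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))  # SAM expects RGB
    heatmap = np.zeros(image_bgr.shape[:2], dtype=np.float32)
    for box in boxes_xyxy:  # one (x0, y0, x1, y1) box per annotated object
        masks, _, _ = predictor.predict(box=box, multimask_output=False)
        heatmap = np.maximum(heatmap, masks[0].astype(np.float32))
    return heatmap
```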
## Training

Run the commands below to reproduce results on the supported datasets, *e.g.*, [VisDrone](https://github.com/VisDrone/VisDrone-Dataset). First download the pretrained weights (e.g., [YOLOv5m](https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5m.pt)) and put them in the `weights/pretrained/` directory.

* **Training on a Single GPU**

  Here is the default setting for adapting **YOLOv5m** to **VisDrone** with our ESOD framework:
  ```bash
  DATASET=visdrone MODEL=yolov5m GPUS=0 BATCH_SIZE=8 IMAGE_SIZE=1536 EPOCHS=50 bash ./scripts/train.sh
  ```

* **DDP Training**

  When multiple GPUs are available, [DistributedDataParallel](https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel) mode can be applied to speed up training. Simply set `GPUS` according to your devices, e.g., `GPUS=0,1,2,3`:
  ```bash
  DATASET=visdrone MODEL=yolov5m GPUS=0,1,2,3 BATCH_SIZE=32 IMAGE_SIZE=1536 EPOCHS=50 bash ./scripts/train.sh
  ```

* **Model Card**

  We support the YOLOv5 series ([YOLOv5s](https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5s.pt), [YOLOv5m](https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5m.pt), [YOLOv5l](https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5l.pt), [YOLOv5x](https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5x.pt)). After downloading them to `weights/`, simply change `MODEL` (as well as `BATCH_SIZE` and `IMAGE_SIZE`) to apply a different model:
  ```bash
  DATASET=visdrone MODEL=yolov5s GPUS=0 BATCH_SIZE=8 IMAGE_SIZE=1536 EPOCHS=50 bash ./scripts/train.sh
  # alternative settings:  MODEL=yolov5m BATCH_SIZE=8 IMAGE_SIZE=1536
  #                        MODEL=yolov5l BATCH_SIZE=4 IMAGE_SIZE=1920
  #                        MODEL=yolov5x BATCH_SIZE=2 IMAGE_SIZE=1920
  ```
  Besides, we also support the [RetinaNet](https://pytorch.org/vision/main/models/retinanet.html), [RTMDet](https://github.com/open-mmlab/mmyolo/tree/main/configs/rtmdet), [YOLOv8](https://github.com/ultralytics/ultralytics), and [GPViT](https://github.com/ChenhongyiYang/GPViT) models. Download the pre-trained weights, convert them to the ESOD initialization, and train the models on a specific dataset:
  ```bash
  python scripts/model_convert.py --model retinanet
  DATASET=visdrone MODEL=retinanet GPUS=0 BATCH_SIZE=8 IMAGE_SIZE=1536 EPOCHS=50 bash ./scripts/train.sh
  ```
  Feel free to set `MODEL` to `retinanet`, `rtmdet`, or `gpvit` (`yolov8m` does not require model conversion). Detailed instructions will come soon.

* **Datasets Supported**

  Besides **VisDrone**, we also support model training on the **UAVDT** and **TinyPerson** datasets:
  ```bash
  DATASET=uavdt MODEL=yolov5m GPUS=0 BATCH_SIZE=8 IMAGE_SIZE=1280 EPOCHS=50 bash ./scripts/train.sh
  # alternative setting:  DATASET=tinyperson IMAGE_SIZE=2048
  ```

## Testing

### Vanilla Evaluation

Run the command below to compute evaluation results (AP, AP50, empty rate, missing rate) with the integrated `utils/metrics.py`:

```bash
python test.py --data data/visdrone.yaml --weights weights/yolov5m.pt --batch-size 8 --img-size 1536 --device 0
```

For computational analysis (including GFLOPs and FPS), use the following command:

```bash
python test.py --data data/visdrone.yaml --weights weights/yolov5m.pt --batch-size 1 --img-size 1536 --device 0 --task measure
```
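For reference, GPU FPS measurements of this kind generally require warm-up iterations and explicit CUDA synchronization, since kernel launches are asynchronous. The repository's `--task measure` handles this internally; the sketch below only illustrates the general technique, and the function name, warm-up count, and iteration count are assumptions:

```python
# fps_sketch.py -- hedged sketch of GPU latency/FPS measurement in general
# (illustrative; ESOD's own measurement lives behind `--task measure`)
import time
import torch

@torch.no_grad()
def measure_fps(model, img_size=1536, warmup=10, iters=100, device="cuda:0"):
    model = model.to(device).eval()
    x = torch.zeros(1, 3, img_size, img_size, device=device)  # dummy batch-1 input
    for _ in range(warmup):            # warm up kernels, caches, and cuDNN autotuning
        model(x)
    torch.cuda.synchronize(device)     # flush pending GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize(device)     # wait for the last forward pass to finish
    return iters / (time.perf_counter() - start)
```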
### Official Evaluation

The official evaluation tools organize data differently from Darknet, so an intermediate data conversion is required. First run the command below to produce predictions in Darknet format:

```bash
python test.py --data data/visdrone.yaml --weights weights/yolov5m.pt --batch-size 8 --img-size 1536 --device 0 --task test --save-txt --save-conf
```

Then run the `data_convert.py` script for the corresponding data format and perform the official evaluation.

* **VisDrone** test-dev set:
  ```bash
  cp -r ./evaluate/VisDrone2018-DET-toolkit ./VisDrone/
  python data_convert.py --dataset VisDrone --pred runs/test/exp/labels
  cd ./VisDrone/VisDrone2018-DET-toolkit
  matlab -nodesktop -nosplash -r evalDET
  ```
* **UAVDT**: coming soon.
* **TinyPerson**: coming soon.

## Inference

The `detect.py` script runs inference and saves the results to `runs/detect`:

```bash
python detect.py --weights weights/yolov5m.pt --source data/images/visdrone.txt --img-size 1536 --device 0 --view-cluster --line-thickness 1
```

`--view-cluster` draws the generated patches as green boxes and saves the heatmaps from both the prediction and the ground truth.

## Pretrained Weights

The pre-trained weights are available on [Google Drive](https://drive.google.com/drive/folders/1jrrtG34q6gqdGMx7SflNK1uM6iiVKpin?usp=drive_link).

## Acknowledgment

A large part of the code is borrowed from [YOLO](https://github.com/ultralytics/yolov5). Many thanks for this wonderful work.

## Citation

If you find this work useful in your research, please kindly cite the paper:

```
@article{liu2025esod,
  title={ESOD: Efficient Small Object Detection on High-Resolution Images},
  author={Liu, Kai and Fu, Zhihang and Jin, Sheng and Chen, Ze and Zhou, Fan and Jiang, Rongxin and Chen, Yaowu and Ye, Jieping},
  journal={IEEE Transactions on Image Processing},
  volume={34},
  pages={183--195},
  year={2025}
}
```