# Vision Transformer with Deformable Attention

This repository contains the object detection and instance segmentation code for the paper Vision Transformer with Deformable Attention \[[arXiv](https://arxiv.org/abs/2201.00520)\] and its extended version, DAT++: Spatially Dynamic Vision Transformer with Deformable Attention \[[arXiv](https://arxiv.org/abs/2309.01430)\].

This code is based on [mmdetection](https://github.com/open-mmlab/mmdetection) and [Swin Detection](https://github.com/SwinTransformer/Swin-Transformer-Object-Detection). To get started, you can follow the instructions in [Swin Transformer](https://github.com/SwinTransformer/Swin-Transformer-Object-Detection/blob/master/README.md).

Other links:

- [Classification](https://github.com/LeapLabTHU/DAT)
- [Segmentation](https://github.com/LeapLabTHU/DAT-Segmentation)

## Dependencies

In addition to the dependencies of the [classification](https://github.com/LeapLabTHU/DAT) codebase, the following packages are required:

- mmcv-full == 1.4.0
- mmdetection == 2.26.0

## Evaluating Pretrained Models

### RetinaNet

| Backbone | Schedule | bbox mAP | mask mAP | config | pretrained weights |
| :---: | :---: | :---: | :---: | :---: | :---: |
| DAT-T++ | 1x | 46.8 | - | [config](configs/dat/rtn_tiny_1x_4n_dp00_lr2.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroLKGEtPqZe5vVKgA?e=l61dNf) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/8e49b993adf242829ee1/) |
| DAT-T++ | 3x | 49.2 | - | [config](configs/dat/rtn_tiny_3x_4n_dp02_lr2.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroDwb1rL_Fb3ZFBWg?e=SB0b4R) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/ff4e48c036cb4d9badca/) |
| DAT-S++ | 1x | 48.3 | - | [config](configs/dat/rtn_small_1x_4n_dp01_lr2.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroKRvxYPXTxsaaLUg?e=9Ty1zb) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/c1585651b3ec41fa9eb5/) |
| DAT-S++ | 3x | 50.2 | - | [config](configs/dat/rtn_small_3x_4n_dp05_lr2.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroOnfDaAaodgfU9og?e=DuLtLB) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/3687b77098a647adb54f/) |

### Mask R-CNN

| Backbone | Schedule | bbox mAP | mask mAP | config | pretrained weights |
| :---: | :---: | :---: | :---: | :---: | :---: |
| DAT-T++ | 1x | 48.7 | 43.7 | [config](configs/dat/mrcn_tiny_1x_4n_dp00_lr4.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroIh3FpJJ1JezkAdw?e=ripLe0) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/2b61106279ae488bbb45/) |
| DAT-T++ | 3x | 50.5 | 45.1 | [config](configs/dat/mrcn_tiny_3x_4n_dp03_lr4.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroMWHVicveA4DjkFQ?e=b76PGk) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/a7db396a372449779347/) |
| DAT-S++ | 1x | 49.8 | 44.5 | [config](configs/dat/mrcn_small_1x_4n_dp01_lr4.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroFCpEZ1rCavI2xFg?e=O9w2ff) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/8f3653c027974136a74a/) |
| DAT-S++ | 3x | 51.2 | 45.7 | [config](configs/dat/mrcn_small_3x_4n_dp05_lr4.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroH1ZALwoxhcO37MQ?e=2aZeBj) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/3bde5a98a0ca46d8b2ee/) |

### Cascade Mask R-CNN

| Backbone | Schedule | bbox mAP | mask mAP | config | pretrained weights |
| :---: | :---: | :---: | :---: | :---: | :---: |
| DAT-T++ | 1x | 52.2 | 45.0 | [config](configs/dat/cmrcn_tiny_1x_4n_dp00_lr4.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroGo2oubInWIwrnWg?e=o9aJlK) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/35be35faa96044ee8561/) |
| DAT-T++ | 3x | 53.0 | 46.0 | [config](configs/dat/cmrcn_tiny_3x_4n_dp01_lr4.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroEOC8rgHFWAnO2yw?e=2BLK8g) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/1aa96891dda24451bbf8/) |
| DAT-S++ | 3x | 54.2 | 46.9 | [config](configs/dat/cmrcn_small_3x_4n_dp05_lr4.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroJ0--lDvJw90u2_g?e=293ING) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/17d84d4f5ae2457285af/) |
| DAT-B++ | 3x | 54.5 | 47.0 | [config](configs/dat/cmrcn_base_3x_8n_dp08_lr4.py) | [OneDrive](https://1drv.ms/u/s!ApI0vb6wPqmtgroNdQXStfHGPbzkpw?e=fWJT5O) / [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/9ea06105dd394b8d97ac/) |

To evaluate a pretrained checkpoint, please download the pretrained weights to your local machine and run the mmdetection test scripts as follows:

```
# single-gpu testing
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
bash tools/dist_test.sh <CONFIG_FILE> <CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm
```

**Please notice: Before training or evaluation, please set the `data_root` variable in `configs/_base_/datasets/coco_detection.py` (RetinaNet) and `configs/_base_/datasets/coco_instance.py` (Mask R-CNN & Cascade Mask R-CNN) to the path where the MS-COCO data is stored.**

Since evaluating a detection checkpoint does not require the pretrained backbone weights, you can set `pretrained = None` in the config file.
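For reference, here is a minimal sketch of the settings mentioned above, assuming the standard mmdetection 2.x base-config layout; the exact contents and variable layout in this repository may differ slightly, and the weight paths are hypothetical placeholders.

```python
# configs/_base_/datasets/coco_instance.py (excerpt, sketch)
# Point `data_root` at your local MS-COCO directory; it is assumed to contain
# the usual annotations/, train2017/ and val2017/ sub-directories.
dataset_type = 'CocoDataset'
data_root = '/path/to/coco/'

# In the detector config (e.g. configs/dat/mrcn_tiny_1x_4n_dp00_lr4.py):
pretrained = None  # evaluation only: the pretrained backbone weights are not needed
# pretrained = '/path/to/dat_pp_tiny_backbone.pth'  # training: hypothetical path to the DAT backbone weights
```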
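For a quick sanity check before launching a full COCO evaluation, the mmdetection Python API can also load a config together with a downloaded checkpoint and run single-image inference. This is only an illustrative sketch: the checkpoint and image paths are placeholders, and it assumes the DAT backbone is registered when the config is loaded; otherwise, `tools/test.py` above is the supported route.

```python
from mmdet.apis import init_detector, inference_detector

config_file = 'configs/dat/mrcn_tiny_1x_4n_dp00_lr4.py'
checkpoint_file = '/path/to/dat_pp_tiny_mrcn_1x.pth'  # placeholder for the downloaded weights

# Build the detector and load the downloaded checkpoint.
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on a single image and save the visualized result.
result = inference_detector(model, 'demo.jpg')
model.show_result('demo.jpg', result, score_thr=0.3, out_file='demo_result.jpg')
```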
## Training

To train a detector with pre-trained models, run:

```
# single-gpu training
python tools/train.py <CONFIG_FILE>

# multi-gpu training
bash tools/dist_train.sh <CONFIG_FILE> <GPU_NUM>
```

**Please notice: Make sure the `pretrained` variable in the config file is correctly set to the path of the pretrained DAT model.**

In our experiments, we typically use 4 nodes of NVIDIA A100 (40GB) GPUs to train the models, so the learning rates are scaled to 4 times the default values for each detector.

## Acknowledgements

This code is developed on top of [Swin Transformer](https://github.com/microsoft/Swin-Transformer); we thank the authors for their efficient and neat codebase. The computational resources supporting this work are provided by [Hangzhou High-Flyer AI Fundamental Research Co., Ltd](https://www.high-flyer.cn/).

## Citation

If you find our work useful in your research, please consider citing:

```
@article{xia2023dat,
  title={DAT++: Spatially Dynamic Vision Transformer with Deformable Attention},
  author={Zhuofan Xia and Xuran Pan and Shiji Song and Li Erran Li and Gao Huang},
  year={2023},
  journal={arXiv preprint arXiv:2309.01430},
}

@InProceedings{Xia_2022_CVPR,
  author    = {Xia, Zhuofan and Pan, Xuran and Song, Shiji and Li, Li Erran and Huang, Gao},
  title     = {Vision Transformer With Deformable Attention},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2022},
  pages     = {4794-4803}
}
```

## Contact

If you have any questions or concerns, please send an email to [xzf23@mails.tsinghua.edu.cn](mailto:xzf23@mails.tsinghua.edu.cn).