# DeBiFormer
**Repository Path**: zxnvszyk/DeBiFormer
## Basic Information
- **Project Name**: DeBiFormer
- **License**: Apache-2.0
- **Default Branch**: main
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-02-23
- **Last Updated**: 2025-02-23
## README
# [DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention](https://arxiv.org/pdf/2410.08582)
Official PyTorch implementation of **DeBiFormer**, from the following paper:
[DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention](https://arxiv.org/pdf/2410.08582). ACCV 2024.\
[Nguyen Huu Bao Long](https://github.com/maclong01), [Chenyu Zhang](https://github.com/il1um), Yuzhi Shi, [Tsubasa Hirakawa](https://thirakawa.github.io/), [Takayoshi Yamashita](https://scholar.google.co.jp/citations?user=hkguTPgAAAAJ&hl=en), [Hironobu Fujiyoshi](https://scholar.google.com/citations?user=CIHKZpEAAAAJ&hl=en), and [Tohgoroh Matsui](https://xn--p8ja5bwe1i.jp/profile.html)
---
## News
* 2024-09-21: The paper has been accepted at ACCV 2024!
## Results and Pre-trained Models
### ImageNet-1K trained models
| name | resolution | acc@1 | #params | FLOPs | model | log |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| DeBiFormer-T | 224x224 | 81.9 | 21.4 M | 2.6 G | [model](https://drive.google.com/drive/folders/1K_Zk5Etx2oh3yVccr71m1R3bTqWAI2bg) | [log](https://drive.google.com/drive/folders/1K_Zk5Etx2oh3yVccr71m1R3bTqWAI2bg) |
| DeBiFormer-S | 224x224 | 83.9 | 44 M | 5.4 G | [model](https://drive.google.com/drive/folders/1OmWKob1ECHgVMs5wSvZs3XF665zFJdHg) | [log](https://drive.google.com/drive/folders/1OmWKob1ECHgVMs5wSvZs3XF665zFJdHg) |
| DeBiFormer-B | 224x224 | 84.4 | 77 M | 11.8 G | [model](https://drive.google.com/drive/folders/1Ae3l2Q9nPbpOgSiTtX_HWyvSXIPQ9jce) | [log](https://drive.google.com/drive/folders/1Ae3l2Q9nPbpOgSiTtX_HWyvSXIPQ9jce) |
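For reference, the acc@1 column reports top-1 accuracy: the percentage of validation images whose highest-scoring class matches the ground-truth label. The snippet below is a minimal plain-Python sketch of that metric, not the repository's evaluation code; `top1_accuracy` is a hypothetical helper name introduced only for illustration.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent for a batch of per-class scores.

    Hypothetical helper for illustration; not part of the DeBiFormer code base.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for scores, label in zip(logits, labels):
        # The predicted class is the index of the highest score.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Toy scores for four images over three classes (made-up values).
    scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
    labels = [1, 0, 1, 0]
    print(top1_accuracy(scores, labels))
```

In practice the training script computes this over the full ImageNet-1K validation set; the toy inputs above are only there to make the sketch runnable.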
# Usage
First, clone the repository locally and install the dependencies:
```shell
git clone https://github.com/maclong01/DeBiFormer.git
cd DeBiFormer
pip3 install -r requirements.txt
```
## Data preparation
Download and extract ImageNet train and val images from http://image-net.org/.
The directory structure follows the standard layout for the torchvision [`datasets.ImageFolder`](https://pytorch.org/docs/stable/torchvision/datasets.html#imagefolder): training and validation data are expected in the `train/` and `val/` folders, respectively, with one subfolder per class:
```
/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
```
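Before launching a long training run, it can help to confirm that the extracted dataset matches this layout. The following is a minimal sketch using only the Python standard library; `check_imagefolder_layout` is a hypothetical helper, not part of this repository, and it only inspects folder structure without opening any images.

```python
from pathlib import Path


def check_imagefolder_layout(root: str) -> dict:
    """Summarize class folders and file counts under root/train and root/val.

    Hypothetical helper for illustration; raises if a split is missing or empty.
    """
    summary = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Each immediate subfolder is treated as one class.
        classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        if not classes:
            raise ValueError(f"no class folders found in {split_dir}")
        n_files = sum(
            1 for c in classes for f in (split_dir / c).iterdir() if f.is_file()
        )
        summary[split] = {"classes": classes, "files": n_files}
    # ImageFolder derives label indices from the sorted folder names, so the
    # two splits need identical class folders for the labels to line up.
    if summary["train"]["classes"] != summary["val"]["classes"]:
        raise ValueError("train/ and val/ contain different class folders")
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python check_layout.py /path/to/imagenet
    result = check_imagefolder_layout(sys.argv[1])
    for split, info in result.items():
        print(f"{split}: {len(info['classes'])} classes, {info['files']} files")
```

For the full ImageNet-1K dataset, both splits would be expected to report 1000 classes.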
#### Training
To train DeBiFormer-S on ImageNet using 8 GPUs for 300 epochs, run:
```shell
cd classification/
bash train.sh 8 --model debiformer_small --batch-size 256 --lr 5e-4 --warmup-epochs 20 --weight-decay 0.1 --data-path your_imagenet_path
```
#### Evaluation
To evaluate the performance of DeBiFormer-S on ImageNet using 8 GPUs, run:
```shell
cd classification/
bash train.sh 8 --model debiformer_small --batch-size 256 --lr 5e-4 --warmup-epochs 20 --weight-decay 0.1 --data-path your_imagenet_path --resume ../checkpoints/debiformer_small_in1k_224.pth --eval
```
## Acknowledgement
This repository is built using the [timm](https://github.com/rwightman/pytorch-image-models) library, [DAT](https://github.com/LeapLabTHU/DAT), and [BiFormer](https://github.com/rayleizhu/BiFormer) repositories.
## License
This project is released under the MIT license. Please see the [LICENSE](LICENSE) file for more information.
## Citation
If you find this repository helpful, please consider citing:
```bibtex
@InProceedings{BaoLong_2024_ACCV,
author = {BaoLong, NguyenHuu and Zhang, Chenyu and Shi, Yuzhi and Hirakawa, Tsubasa and Yamashita, Takayoshi and Matsui, Tohgoroh and Fujiyoshi, Hironobu},
title = {DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention},
booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
month = {December},
year = {2024},
pages = {4455-4472}
}
```