# ML_Decoder

**Repository Path**: davidgao7/ML_Decoder

## Basic Information

- **Project Name**: ML_Decoder
- **Description**: Official PyTorch implementation of "ML-Decoder: Scalable and Versatile Classification Head" (2021)
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-02-11
- **Last Updated**: 2022-02-11

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ML-Decoder: Scalable and Versatile Classification Head

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ml-decoder-scalable-and-versatile/multi-label-classification-on-ms-coco)](https://paperswithcode.com/sota/multi-label-classification-on-ms-coco?p=ml-decoder-scalable-and-versatile)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ml-decoder-scalable-and-versatile/multi-label-zero-shot-learning-on-nus-wide)](https://paperswithcode.com/sota/multi-label-zero-shot-learning-on-nus-wide?p=ml-decoder-scalable-and-versatile)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ml-decoder-scalable-and-versatile/fine-grained-image-classification-on-stanford)](https://paperswithcode.com/sota/fine-grained-image-classification-on-stanford?p=ml-decoder-scalable-and-versatile)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ml-decoder-scalable-and-versatile/multi-label-classification-on-openimages-v6)](https://paperswithcode.com/sota/multi-label-classification-on-openimages-v6?p=ml-decoder-scalable-and-versatile)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ml-decoder-scalable-and-versatile/image-classification-on-cifar-100)](https://paperswithcode.com/sota/image-classification-on-cifar-100?p=ml-decoder-scalable-and-versatile)
[Paper](http://arxiv.org/abs/2111.12933) | [Pretrained Models](MODEL_ZOO.md) | [Datasets](Datasets.md)

Official PyTorch Implementation

> Tal Ridnik, Gilad Sharir, Avi Ben-Cohen, Emanuel Ben-Baruch, Asaf Noy
> DAMO Academy, Alibaba Group

**Abstract**

In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-Decoder predicts the existence of class labels via queries, and enables better utilization of spatial data compared to global average pooling. By redesigning the decoder architecture and using a novel group-decoding scheme, ML-Decoder is highly efficient and can scale well to thousands of classes. Compared to using a larger backbone, ML-Decoder consistently provides a better speed-accuracy trade-off. ML-Decoder is also versatile: it can be used as a drop-in replacement for various classification heads, and it generalizes to unseen classes when operated with word queries. Novel query augmentations further improve its generalization ability. Using ML-Decoder, we achieve state-of-the-art results on several classification tasks: on MS-COCO multi-label, we reach 91.4% mAP; on NUS-WIDE zero-shot, we reach 31.1% ZSL mAP; and on ImageNet single-label, with a vanilla ResNet50 backbone we reach a new top score of 80.7%, without extra data or distillation.
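
For intuition, here is a minimal, self-contained sketch of the query-based decoding idea described above. It is *not* the repository's implementation (see the Implementation section below for that); the class name, dimensions, and grouping scheme here are illustrative assumptions.

```
# Minimal conceptual sketch (NOT the official ML-Decoder code): a small
# number of learnable group queries cross-attend to the backbone's spatial
# tokens, and a group fully-connected layer maps each attended query to the
# logits of its own group of classes.
import torch
import torch.nn as nn

class QueryPoolingHead(nn.Module):
    def __init__(self, num_classes, embed_dim=768, num_groups=100, num_heads=8):
        super().__init__()
        self.num_classes = num_classes
        self.classes_per_group = -(-num_classes // num_groups)  # ceil division
        self.queries = nn.Parameter(torch.randn(num_groups, embed_dim))
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # group fully-connected: each query predicts only its group's classes
        self.group_fc = nn.Parameter(
            torch.randn(num_groups, embed_dim, self.classes_per_group))

    def forward(self, spatial_tokens):  # (batch, num_tokens, embed_dim)
        q = self.queries.unsqueeze(0).expand(spatial_tokens.size(0), -1, -1)
        attended, _ = self.cross_attn(q, spatial_tokens, spatial_tokens)
        logits = torch.einsum('bgd,gdc->bgc', attended, self.group_fc)
        return logits.flatten(1)[:, :self.num_classes]  # (batch, num_classes)
```

The point of the grouping: the number of queries stays fixed (e.g. 100) while the group fully-connected layer fans out to the full label space, so the attention cost stays near-constant as the head scales to thousands of classes.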

## ML-Decoder Implementation

The ML-Decoder implementation is available [here](./src_files/ml_decoder/ml_decoder.py). It can be easily integrated into any backbone using this example code (a fuller end-to-end sketch follows the Inference section below):

```
ml_decoder_head = MLDecoder(num_classes)  # initialization
spatial_embeddings = self.backbone(input_image)  # backbone generates spatial embeddings
logits = ml_decoder_head(spatial_embeddings)  # transform spatial embeddings to logits
```

## Inference Code and Pretrained Models

See [Model Zoo](MODEL_ZOO.md).
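
As promised above, here is a hedged end-to-end sketch of dropping `MLDecoder` onto a standard torchvision backbone. It assumes the default constructor fits the 2048-channel feature maps of ResNet-50 and that the head accepts a 4-D feature map, consistent with the snippet above; check `src_files/ml_decoder/ml_decoder.py` for the exact signature before relying on it.

```
# Hedged integration sketch: replace a ResNet-50's global-average-pool + fc
# with an ML-Decoder head. num_classes=80 matches MS-COCO multi-label.
import torch
import torch.nn as nn
import torchvision.models as models
from src_files.ml_decoder.ml_decoder import MLDecoder

resnet = models.resnet50(pretrained=True)
backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool and fc
head = MLDecoder(num_classes=80)  # assumes defaults fit 2048-dim ResNet-50 features

image = torch.randn(2, 3, 448, 448)      # batch of two 448x448 images
spatial_embeddings = backbone(image)     # (2, 2048, 14, 14) feature map
logits = head(spatial_embeddings)        # (2, 80) class logits
probs = torch.sigmoid(logits)            # per-class multi-label probabilities
```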

## Training Code

We share full reproduction code for the article's results.

### Multi-label Training Code
Reproduction code for MS-COCO multi-label training:

```
python train.py \
--data=/home/datasets/coco2014/ \
--model_name=tresnet_l \
--image_size=448
```

### Single-label Training Code

Our single-label training code uses the excellent [timm](https://github.com/rwightman/pytorch-image-models) repo. The reproduction code currently lives in a fork; we will work toward a full merge into the main repo.

```
git clone https://github.com/mrT23/pytorch-image-models.git
```

This is the code for A2-configuration training with ML-Decoder (`--use-ml-decoder-head=1`):

```
python -u -m torch.distributed.launch --nproc_per_node=8 \
--nnodes=1 \
--node_rank=0 \
./train.py \
/data/imagenet/ \
--amp \
-b=256 \
--epochs=300 \
--drop-path=0.05 \
--opt=lamb \
--weight-decay=0.02 \
--sched='cosine' \
--lr=4e-3 \
--warmup-epochs=5 \
--model=resnet50 \
--aa=rand-m7-mstd0.5-inc1 \
--reprob=0.0 \
--remode='pixel' \
--mixup=0.1 \
--cutmix=1.0 \
--aug-repeats 3 \
--bce-target-thresh 0.2 \
--smoothing=0 \
--bce-loss \
--train-interpolation=bicubic \
--use-ml-decoder-head=1
```

### ZSL Training Code
First, download the following files to the root path of the dataset:

- [benchmark_81_v0.json](https://miil-public-eu.oss-eu-central-1.aliyuncs.com/public/NUS_WIDE_ZSL/benchmark_81_v0.json)
- [wordvec_array.pth](https://miil-public-eu.oss-eu-central-1.aliyuncs.com/public/NUS_WIDE_ZSL/wordvec_array.pth)
- [data.csv](https://miil-public-eu.oss-eu-central-1.aliyuncs.com/public/NUS_WIDE_ZSL/data.csv)
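
To connect these files to the mechanism: in the zero-shot setting, the paper replaces learned queries with fixed word vectors of the class names, so classes unseen during training can still be queried. Below is a hedged sketch of that idea, reusing the conceptual `QueryPoolingHead` from earlier; the tensor layout of `wordvec_array.pth` is an assumption here, so verify it against the actual file.

```
import torch

# assumed layout: a (num_classes, embed_dim) float tensor of word embeddings,
# e.g. 300-dim vectors for the NUS-WIDE labels -- verify against the real file
wordvecs = torch.load('/home/datasets/nus_wide/wordvec_array.pth')
num_classes, embed_dim = wordvecs.shape

# one query per class, frozen to the word vectors instead of learned
# (embed_dim must be divisible by num_heads, hence num_heads=4 for 300-dim)
head = QueryPoolingHead(num_classes, embed_dim=embed_dim,
                        num_groups=num_classes, num_heads=4)
with torch.no_grad():
    head.queries.copy_(wordvecs)
head.queries.requires_grad_(False)  # fixed word queries enable zero-shot transfer
```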
Training code for NUS-WIDE ZSL:

```
python train_zsl_nus.py \
--data=/home/datasets/nus_wide/ \
--image_size=224
```

### New Top Results - Stanford-Cars and CIFAR-100

Using the ML-Decoder classification head, we reached a top result of 96.41% on the [Stanford-Cars dataset](https://paperswithcode.com/sota/fine-grained-image-classification-on-stanford), and 95.1% on the [CIFAR-100 dataset](https://paperswithcode.com/sota/image-classification-on-cifar-100). We will add these results to a future version of the paper.

## Citation

```
@misc{ridnik2021mldecoder,
      title={ML-Decoder: Scalable and Versatile Classification Head},
      author={Tal Ridnik and Gilad Sharir and Avi Ben-Cohen and Emanuel Ben-Baruch and Asaf Noy},
      year={2021},
      eprint={2111.12933},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```