# CutLER
**Repository Path**: mlyin/CutLER
## Basic Information
- **Project Name**: CutLER
- **Description**: MaskCut reproduction
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: automated_fixup_code_of_conduct_file_exists
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 1
- **Created**: 2023-06-25
- **Last Updated**: 2023-11-30
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Cut and Learn for Unsupervised Object Detection and Instance Segmentation
**Cut**-and-**LE**a**R**n (**CutLER**) is a simple approach for training object detection and instance segmentation models without human annotations.
It outperforms previous SOTA by **2.7 times** for AP50 and **2.6 times** for AR on **11 benchmarks**.
> [**Cut and Learn for Unsupervised Object Detection and Instance Segmentation**](http://people.eecs.berkeley.edu/~xdwang/projects/CutLER/)
> Xudong Wang, Rohit Girdhar, Stella X. Yu, Ishan Misra
> Tech report
[project page](http://people.eecs.berkeley.edu/~xdwang/projects/CutLER/) | [arxiv](http://arxiv.org/abs/xxxx.yyyyy) | [colab](https://colab.research.google.com/drive/1NgEyFHvOfuA2MZZnfNPWg1w5gSr3HOBb?usp=sharing) | [bibtex](#citation)
## Features
- We propose the MaskCut approach to generate pseudo-masks for multiple objects in an image.
- CutLER can learn unsupervised object detectors and instance segmentors solely on ImageNet-1K.
- CutLER exhibits strong robustness to domain shifts when evaluated on 11 different benchmarks across domains like natural images, video frames, paintings, sketches, etc.
- CutLER can serve as a pretrained model for fully/semi-supervised detection and segmentation tasks.
## Installation
See [installation instructions](INSTALL.md).
## Dataset Preparation
See [Preparing Datasets for CutLER](datasets/README.md).
## Method Overview
Cut-and-Learn has two stages: 1) generating pseudo-masks with MaskCut and 2) learning unsupervised detectors from pseudo-masks of unlabeled data.
### 1. MaskCut
MaskCut can be used to provide segmentation masks for multiple instances of each image.
### MaskCut Demo
Try out the MaskCut demo using Colab (no GPU needed): [Open in Colab](https://colab.research.google.com/drive/1X05lKL_IBRvZB7q6n6pb4w00_tIYjGlf?usp=sharing)
If you want to run MaskCut locally, we provide `demo.py`, which visualizes the pseudo-masks produced by MaskCut.
Run it with:
```
cd maskcut
python demo.py --img-path imgs/demo2.jpg \
--N 3 --tau 0.15 --vit-arch base --patch-size 8 \
[--other-options]
```
We provide a few demo images in `maskcut/imgs/`. To run `demo.py` on CPU, add `--cpu` when running the demo script.
For `imgs/demo4.jpg`, use `--N 6` to segment all six instances in the image.
Below are some visualizations of the pseudo-masks on the demo images.
### Generating Annotations for ImageNet-1K with MaskCut
To generate pseudo-masks for ImageNet-1K with MaskCut, first set up the ImageNet-1K dataset following [datasets/README.md](datasets/README.md), then run the following command:
```
cd maskcut
python maskcut.py \
--vit-arch base --patch-size 8 \
--tau 0.15 --fixed_size 480 --N 3 \
--num-folder 1000 --job-index 0 \
--dataset-path /path/to/dataset/traindir \
--out-dir /path/to/save/annotations
```
Generating pseudo-masks for all 1.3M images stored in 1,000 folders takes a long time, so we recommend splitting the work across multiple runs, each processing fewer image folders, by setting "--num-folder" and "--job-index".
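As a rough illustration of how such a split could be planned, the sketch below divides the 1,000 folders into consecutive ranges and builds one argument list per run. It assumes that job `i` handles folders `[i * num_folder, (i + 1) * num_folder)`; that reading of "--num-folder" and "--job-index" is an assumption, and `plan_jobs` and `maskcut_args` are hypothetical helpers, not part of this repository.

```python
import math


def plan_jobs(total_folders, folders_per_job):
    """Split folder indices 0..total_folders-1 into consecutive per-job ranges."""
    if folders_per_job <= 0:
        raise ValueError("folders_per_job must be positive")
    num_jobs = math.ceil(total_folders / folders_per_job)
    jobs = []
    for job_index in range(num_jobs):
        start = job_index * folders_per_job
        end = min(start + folders_per_job, total_folders)  # exclusive bound
        jobs.append({"job_index": job_index, "start": start, "end": end})
    return jobs


def maskcut_args(job_index, folders_per_job, dataset_path, out_dir):
    """Build the argument list for one run, reusing the flags shown above."""
    return [
        "python", "maskcut.py",
        "--vit-arch", "base", "--patch-size", "8",
        "--tau", "0.15", "--fixed_size", "480", "--N", "3",
        "--num-folder", str(folders_per_job),
        "--job-index", str(job_index),
        "--dataset-path", dataset_path,
        "--out-dir", out_dir,
    ]


# Example: 10 runs of 100 folders each (assumed semantics, see the note above).
jobs = plan_jobs(total_folders=1000, folders_per_job=100)
for job in jobs[:2]:
    args = maskcut_args(job["job_index"], 100,
                        "/path/to/dataset/traindir",
                        "/path/to/save/annotations")
    print(job, " ".join(args))
```

Each argument list can then be handed to a scheduler or to `subprocess.run` on whichever machine executes that job.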
After that, you can merge all these json files using the following command:
```
python merge_jsons.py \
--base-dir /path/to/save/annotations \
--num-folder 2 --fixed-size 480 \
--tau 0.15 --N 3 \
--save-path imagenet_train_fixsize480_tau0.15_N3.json
```
The "--num-folder", "--fixed-size", "--tau" and "--N" of merge_jsons.py should match the ones used to run maskcut.py.
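To make the merging step concrete, here is a minimal sketch of combining several annotation files into one. It assumes each per-run file is a COCO-style dictionary with "images", "annotations" and "categories" keys and that ids must be reassigned to stay unique; it is an illustration of the idea, not the logic of `merge_jsons.py`, and `merge_coco_dicts` and `load_parts` are hypothetical names.

```python
import json


def merge_coco_dicts(parts):
    """Concatenate COCO-style dicts, reassigning image and annotation ids."""
    merged = {"images": [], "annotations": [], "categories": []}
    next_image_id, next_ann_id = 1, 1
    for part in parts:
        if not merged["categories"]:
            merged["categories"] = list(part.get("categories", []))
        id_map = {}  # old image id (local to this part) -> new global id
        for img in part.get("images", []):
            id_map[img["id"]] = next_image_id
            merged["images"].append({**img, "id": next_image_id})
            next_image_id += 1
        for ann in part.get("annotations", []):
            merged["annotations"].append(
                {**ann, "id": next_ann_id, "image_id": id_map[ann["image_id"]]}
            )
            next_ann_id += 1
    return merged


def load_parts(paths):
    """Read each JSON file into memory (assumed COCO-style layout)."""
    parts = []
    for path in paths:
        with open(path) as f:
            parts.append(json.load(f))
    return parts


# Toy example with two single-image parts whose local ids collide.
part_a = {"images": [{"id": 1, "file_name": "a.JPEG"}],
          "annotations": [{"id": 1, "image_id": 1, "bbox": [0, 0, 4, 4]}],
          "categories": [{"id": 1, "name": "fg"}]}
part_b = {"images": [{"id": 1, "file_name": "b.JPEG"}],
          "annotations": [{"id": 1, "image_id": 1, "bbox": [2, 2, 3, 3]}],
          "categories": [{"id": 1, "name": "fg"}]}
merged = merge_coco_dicts([part_a, part_b])
print([img["id"] for img in merged["images"]])
```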
We also provide a submitit script to launch the pseudo-mask generation process with multiple nodes.
```
cd maskcut
bash run_maskcut_with_submitit.sh
```
After that, you can use "merge_jsons.py" to merge all these json files as described above.
### 2. CutLER
### Inference Demo for CutLER with Pre-trained Models
Try out the CutLER demo using Colab (no GPU needed): [Open in Colab](https://colab.research.google.com/drive/1NgEyFHvOfuA2MZZnfNPWg1w5gSr3HOBb?usp=sharing)
If you want to run CutLER demos locally,
1. Pick a model and its config file from [model zoo](#model-zoo),
for example, `model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml`.
2. We provide `demo.py` that is able to demo builtin configs. Run it with:
```
cd cutler
python demo/demo.py --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
--input demo/imgs/*.jpg \
[--other-options] \
--opts MODEL.WEIGHTS /path/to/cutler_w_cascade_checkpoint
```
The configs are made for training, so `MODEL.WEIGHTS` must be set to a model from the model zoo for evaluation.
This command will run the inference and show visualizations in an OpenCV window.
* To run __on cpu__, add `MODEL.DEVICE cpu` after `--opts`.
* To save outputs to a directory (for images) or a file (for webcam or video), use `--output`.
Below are some visualizations of the model predictions on the demo images.
### Unsupervised Model Learning
Before training the detector, it is necessary to use MaskCut to generate pseudo-masks for all ImageNet data.
You can use the pre-generated JSON file directly: download it from [here](http://dl.fbaipublicfiles.com/cutler/maskcut/imagenet_train_fixsize480_tau0.15_N3.json) and put it under "DETECTRON2_DATASETS/imagenet/annotations/".
Alternatively, to generate your own pseudo-masks, follow the instructions in [MaskCut](#maskcut).
We provide a script, `train_net.py`, that trains all the configs provided in CutLER.
To train a model with "train_net.py", first set up the ImageNet-1K dataset following [datasets/README.md](datasets/README.md), then run:
```
cd cutler
export DETECTRON2_DATASETS=/path/to/DETECTRON2_DATASETS/
python train_net.py --num-gpus 8 \
--config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml
```
If you want to train a model with multiple nodes, you may need to change [some model parameters](https://arxiv.org/abs/1706.02677) and some SBATCH command options in "tools/train-1node.sh" and "tools/single-node_run.sh", then run:
```
cd cutler
sbatch tools/train-1node.sh \
--config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
MODEL.WEIGHTS /path/to/dino/d2format/model
```
You can also convert a pre-trained DINO model to detectron2's format by yourself following [this link](https://github.com/facebookresearch/moco/tree/main/detection).
### Self-training
We further improve performance by self-training the model on its predictions.
First, get model predictions on ImageNet by running:
```
python train_net.py --num-gpus 8 \
--config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
--test-dataset imagenet_train \
--eval-only TEST.DETECTIONS_PER_IMAGE 30 \
MODEL.WEIGHTS output/cutler_cascade_r1.pth \
OUTPUT_DIR output/
```
Second, run the following command to generate the JSON file for the first round of self-training:
```
python tools/get_self_training_ann.py \
--new-pred output/inference/coco_instances_results.json \
--prev-ann DETECTRON2_DATASETS/imagenet/annotations/imagenet_train_fixsize480_tau0.15_N3.json \
--save-path DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r1.json \
--threshold 0.7
```
Lastly, place "cutler_imagenet1k_train_r1.json" under "DETECTRON2_DATASETS/imagenet/annotations/", then launch the self-training process:
```
python train_net.py --num-gpus 8 \
--config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_self_train.yaml \
--train-dataset imagenet_train_r1 \
MODEL.WEIGHTS output/cutler_cascade_r1.pth \
OUTPUT_DIR output/self-train-r1/
```
You can repeat the steps above to complete multiple rounds of self-training, changing the following arguments for each round:
- "--threshold": 0.7 for round 1 and 0.65 for round 2.
- "--train-dataset": "imagenet_train_r1" for round 1 and "imagenet_train_r2" for round 2.
- MODEL.WEIGHTS: "output/cutler_cascade_r1.pth" for round 1 and "output/cutler_cascade_r2.pth" for round 2.

Please place all annotation files under DETECTRON2_DATASETS/imagenet/annotations.
Note: Please confirm that "--train-dataset", json file names and json file locations match the ones specified in "cutler/data/datasets/builtin.py".
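The annotation step of a self-training round (`tools/get_self_training_ann.py` with "--new-pred", "--prev-ann" and "--threshold") can be pictured with the simplified sketch below. It keeps predictions whose score reaches the threshold and carries over previous annotations that do not overlap a kept prediction. The overlap rule, the `iou_thresh` value of 0.5, and the function names are assumptions made for illustration; the script in this repository may behave differently.

```python
def box_iou(a, b):
    """IoU of two COCO-style [x, y, width, height] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def build_round_annotations(new_preds, prev_anns, score_thresh=0.7, iou_thresh=0.5):
    """Combine confident predictions with non-overlapping previous annotations."""
    confident = [p for p in new_preds if p["score"] >= score_thresh]
    preds_by_image = {}
    for p in confident:
        preds_by_image.setdefault(p["image_id"], []).append(p)
    # Keep an old annotation only if no confident prediction already covers it.
    kept_prev = [
        a for a in prev_anns
        if all(box_iou(a["bbox"], p["bbox"]) < iou_thresh
               for p in preds_by_image.get(a["image_id"], []))
    ]
    merged = []
    for item in confident + kept_prev:
        ann = {k: v for k, v in item.items() if k != "score"}
        ann["id"] = len(merged) + 1  # fresh, unique annotation ids
        merged.append(ann)
    return merged


# Toy data: one confident and one low-score prediction on image 1.
new_preds = [
    {"image_id": 1, "category_id": 1, "bbox": [0, 0, 10, 10], "score": 0.9},
    {"image_id": 1, "category_id": 1, "bbox": [50, 50, 10, 10], "score": 0.3},
]
prev_anns = [
    {"id": 7, "image_id": 1, "category_id": 1, "bbox": [1, 1, 10, 10]},
    {"id": 8, "image_id": 1, "category_id": 1, "bbox": [50, 50, 10, 10]},
    {"id": 9, "image_id": 2, "category_id": 1, "bbox": [0, 0, 5, 5]},
]
round_anns = build_round_annotations(new_preds, prev_anns, score_thresh=0.7)
print([(a["image_id"], a["bbox"]) for a in round_anns])
```

In this toy case the first previous annotation is dropped because the confident prediction overlaps it heavily, while the other two are carried over unchanged.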
You can also directly download the models and annotations used by each round of self-training:
### Unsupervised Zero-shot Evaluation
To evaluate a model's performance on 11 different datasets, please follow [datasets/README.md](datasets/README.md) to prepare datasets, update "model_weights" and "config_file" in `tools/eval.sh`, then run:
```
bash tools/eval.sh
```
### Model Zoo
We show zero-shot unsupervised object detection performance (AP50 | AR) on 11 different datasets spanning a variety of domains. ^: CutLER using Mask R-CNN as a detector; *: CutLER using Cascade Mask R-CNN as a detector.
| Methods | Models | COCO | COCO20K | VOC | LVIS | UVO | Clipart | Comic | Watercolor | KITTI | Objects365 | OpenImages |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Prev. SOTA | - | 9.6 \| 12.6 | 9.7 \| 12.6 | 15.9 \| 21.3 | 3.8 \| 6.4 | 10.0 \| 14.2 | 7.9 \| 15.1 | 9.9 \| 16.3 | 6.7 \| 16.2 | 7.7 \| 7.1 | 8.1 \| 10.2 | 9.9 \| 14.9 |
| CutLER^ | download | 21.1 \| 29.6 | 21.6 \| 30.0 | 36.6 \| 41.0 | 7.7 \| 18.7 | 29.8 \| 38.4 | 20.9 \| 38.5 | 31.2 \| 37.1 | 37.3 \| 39.9 | 15.3 \| 25.4 | 19.5 \| 30.0 | 17.1 \| 26.4 |
| CutLER* | download | 21.9 \| 32.7 | 22.4 \| 33.1 | 36.9 \| 44.3 | 8.4 \| 21.8 | 31.7 \| 42.8 | 21.1 \| 41.3 | 30.4 \| 38.6 | 37.5 \| 44.6 | 18.4 \| 27.5 | 21.6 \| 34.2 | 17.3 \| 29.6 |
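As a sanity check on the headline claim of roughly 2.7 times (AP50) and 2.6 times (AR) over the previous SOTA, the sketch below averages the table's scores across the 11 datasets and takes the ratio for the Cascade variant (CutLER*). How the headline figures were actually computed is not stated here, so the ratio-of-averages reading is an assumption.

```python
# (AP50, AR) pairs in table order: COCO, COCO20K, VOC, LVIS, UVO, Clipart,
# Comic, Watercolor, KITTI, Objects365, OpenImages.
prev_sota = [(9.6, 12.6), (9.7, 12.6), (15.9, 21.3), (3.8, 6.4), (10.0, 14.2),
             (7.9, 15.1), (9.9, 16.3), (6.7, 16.2), (7.7, 7.1), (8.1, 10.2),
             (9.9, 14.9)]
cutler_cascade = [(21.9, 32.7), (22.4, 33.1), (36.9, 44.3), (8.4, 21.8),
                  (31.7, 42.8), (21.1, 41.3), (30.4, 38.6), (37.5, 44.6),
                  (18.4, 27.5), (21.6, 34.2), (17.3, 29.6)]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)


ap50_gain = mean(ap for ap, _ in cutler_cascade) / mean(ap for ap, _ in prev_sota)
ar_gain = mean(ar for _, ar in cutler_cascade) / mean(ar for _, ar in prev_sota)
print(f"AP50: {ap50_gain:.2f}x, AR: {ar_gain:.2f}x")  # AP50: 2.70x, AR: 2.66x
```

Under this reading the table is consistent with the stated gains, with the AR ratio landing slightly above the quoted 2.6.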
## Semi-supervised and Fully-supervised Learning
CutLER can also serve as a pretrained model for training fully supervised object detection and instance segmentation models and improves performance on COCO, including on few-shot benchmarks.
### Training & Evaluation in Command Line
You can find all the semi-supervised and fully-supervised learning configs provided in CutLER under `model_zoo/configs/COCO-Semisupervised`.
To train a model using K% labels with `train_net.py`, first set up the COCO dataset according to [datasets/README.md](datasets/README.md) and specify K value in the config file, then run:
```
python train_net.py --num-gpus 8 \
--config-file model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_{K}perc.yaml \
MODEL.WEIGHTS /path/to/cutler_pretrained_model
```
You can find all config files used to train supervised models under `model_zoo/configs/COCO-Semisupervised`.
The configs are made for 8-GPU training. To train on 1 GPU, you may need to [change some parameters](https://arxiv.org/abs/1706.02677), e.g., the number of GPUs (`--num-gpus your_num_gpus`), the learning rate (`SOLVER.BASE_LR your_base_lr`) and the batch size (`SOLVER.IMS_PER_BATCH your_batch_size`).
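The linked paper describes the linear scaling rule: when the batch size changes by a factor k, scale the learning rate by the same factor. A minimal sketch of applying it when moving from 8 GPUs to 1 follows. The reference values (`ref_lr=0.02`, `ref_batch=16`) are placeholders rather than values read from the CutLER configs, and `scale_for_gpus` is a hypothetical helper; check your config file for the real numbers.

```python
def scale_for_gpus(ref_lr, ref_batch, ref_gpus, new_gpus):
    """Keep the per-GPU batch size fixed and scale the learning rate linearly."""
    if ref_batch % ref_gpus != 0:
        raise ValueError("ref_batch must be divisible by ref_gpus")
    per_gpu_batch = ref_batch // ref_gpus
    new_batch = per_gpu_batch * new_gpus
    new_lr = ref_lr * new_batch / ref_batch  # linear scaling rule
    return new_lr, new_batch


# Placeholder reference setup: 8 GPUs, total batch 16, base LR 0.02.
new_lr, new_batch = scale_for_gpus(ref_lr=0.02, ref_batch=16, ref_gpus=8, new_gpus=1)
overrides = ["--num-gpus", "1",
             "SOLVER.BASE_LR", str(new_lr),
             "SOLVER.IMS_PER_BATCH", str(new_batch)]
print(" ".join(overrides))
```

The printed overrides can be appended to the `train_net.py` command above. With a much smaller batch it may also be worth lengthening the schedule, which this sketch does not cover.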
### Evaluation
To evaluate a model's performance, use:
```
python train_net.py \
--config-file model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_{K}perc.yaml \
--eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```
For more options, see `python train_net.py -h`.
### Model Zoo
We fine-tune a Cascade R-CNN model initialized with CutLER or MoCo-v2 on varying amounts of labeled COCO data, and show results (Box | Mask AP) on the val2017 split below:
| % of labels | 1% | 2% | 5% | 10% | 20% | 30% | 40% | 50% | 60% | 80% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MoCo-v2 | 11.8 \| 10.0 | 16.2 \| 13.8 | 20.5 \| 17.8 | 26.5 \| 23.0 | 32.5 \| 28.2 | 35.5 \| 30.8 | 37.3 \| 32.3 | 38.7 \| 33.6 | 39.9 \| 34.6 | 41.6 \| 36.0 | 42.8 \| 37.0 |
| CutLER | 16.8 \| 14.6 | 21.6 \| 18.9 | 27.8 \| 24.3 | 32.2 \| 28.1 | 36.6 \| 31.7 | 38.2 \| 33.3 | 39.9 \| 34.7 | 41.5 \| 35.9 | 42.3 \| 36.7 | 43.8 \| 37.9 | 44.7 \| 38.5 |
| Download | model | model | model | model | model | model | model | model | model | model | model |
Both MoCo-v2 and our CutLER are trained for the 1x schedule using Detectron2, except for extremely low-shot settings with 1% or 2% labels. When training with 1% or 2% labels, we train both MoCo-v2 and our model for 3,600 iterations with a batch size of 16.
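To get a feel for what 3,600 iterations at batch size 16 means in the low-shot settings, the sketch below converts it into approximate passes over the labeled subset. It assumes that "K% labels" corresponds to K% of the 118,287 images in COCO train2017; the actual label splits may differ slightly.

```python
COCO_TRAIN2017_IMAGES = 118287  # number of images in COCO train2017
ITERATIONS = 3600               # low-shot schedule stated above
BATCH_SIZE = 16


def approx_epochs(label_fraction):
    """Approximate number of passes over the labeled subset."""
    labeled_images = COCO_TRAIN2017_IMAGES * label_fraction
    return ITERATIONS * BATCH_SIZE / labeled_images


for pct in (1, 2):
    print(f"{pct}% labels: ~{approx_epochs(pct / 100):.1f} passes over the labeled subset")
```

Under these assumptions the model sees 57,600 image samples in total, which is roughly 49 passes over a 1% subset and roughly 24 passes over a 2% subset.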
## License
The majority of CutLER, Detectron2 and DINO are licensed under the [Apache 2.0 license](LICENSE); however, portions of the project are available under separate license terms: TokenCUT, Bilateral Solver and CRF are licensed under the MIT license. If you later add other third-party code, please keep this license info updated, and please let us know if that component is licensed under something other than CC-BY-NC, MIT, or CC0.
## Ethical Considerations
CutLER's wide range of detection capabilities may introduce similar challenges to many other visual recognition methods.
As the image can contain arbitrary instances, it may impact the model output.
## How to get support from us?
If you have any general questions, feel free to email us at [Xudong Wang](mailto:xdwang@eecs.berkeley.edu), [Ishan Misra](mailto:imisra@meta.com) and [Rohit Girdhar](mailto:rgirdhar@meta.com). If you have code or implementation-related questions, please feel free to send emails to us or open an issue in this codebase (We recommend that you open an issue in this codebase, because your questions may help others).
## Citation
If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.
```
@article{wang2022cut,
title={Cut and Learn for Unsupervised Object Detection and Instance Segmentation},
author={Wang, Xudong and Girdhar, Rohit and Yu, Stella X and Misra, Ishan},
journal={arXiv preprint arXiv:xxxx.xxxxx},
year={2022}
}
```