RockeyCoss committed on
Commit
c7e1959
•
1 Parent(s): 083fa07
Files changed (1)
  1. README.md +13 -237
README.md CHANGED
@@ -1,237 +1,13 @@
- # Prompt-Segment-Anything
-
- This is an implementation of zero-shot instance segmentation using [Segment Anything](https://github.com/facebookresearch/segment-anything). Thanks to the authors of Segment Anything for their wonderful work!
-
- This repository is based on [MMDetection](https://github.com/open-mmlab/mmdetection) and includes some code from [H-Deformable-DETR](https://github.com/HDETR/H-Deformable-DETR) and [FocalNet-DINO](https://github.com/FocalNet/FocalNet-DINO).
-
- ![example1](assets/example1.jpg)
-
- ## News
-
- **2023.04.12** Multimask output mode and cascade prompt mode are now available.
-
- **2023.04.11** Our [demo](https://huggingface.co/spaces/rockeycoss/Prompt-Segment-Anything-Demo) is now available. Please feel free to check it out.
-
- **2023.04.11** [Swin-L+H-Deformable-DETR + SAM](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py)/[FocalNet-L+DINO + SAM](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py) achieve strong COCO instance segmentation results: mask AP = 46.8/49.1, obtained by simply prompting SAM with boxes predicted by Swin-L+H-Deformable-DETR/FocalNet-L+DINO (compared with mask AP = 46.5 for ViTDet). 🍺
-
- ## Catalog
-
- - [x] Support Swin-L+H-Deformable-DETR+SAM
- - [x] Support FocalNet-L+DINO+SAM
- - [x] Support R50+H-Deformable-DETR+SAM/Swin-T+H-Deformable-DETR+SAM
- - [x] Support HuggingFace gradio demo
- - [x] Support cascade prompts (box prompt + mask prompt)
-
- ## Box-as-Prompt Results
-
- | Detector | SAM | multimask output | Detector's Box AP | Mask AP | Config |
- | :---: | :---: | :---: | :---: | :---: | :---: |
- | R50+H-Deformable-DETR | sam-vit-b | :x: | 50.0 | 38.2 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b.py) |
- | R50+H-Deformable-DETR | sam-vit-b | :heavy_check_mark: | 50.0 | 39.9 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b_best-in-multi.py) |
- | R50+H-Deformable-DETR | sam-vit-l | :x: | 50.0 | 41.5 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-l.py) |
- | Swin-T+H-Deformable-DETR | sam-vit-b | :x: | 53.2 | 40.0 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-t-hdetr_sam-vit-b.py) |
- | Swin-T+H-Deformable-DETR | sam-vit-l | :x: | 53.2 | 43.5 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-t-hdetr_sam-vit-l.py) |
- | Swin-L+H-Deformable-DETR | sam-vit-b | :x: | 58.0 | 42.5 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-b.py) |
- | Swin-L+H-Deformable-DETR | sam-vit-l | :x: | 58.0 | 46.3 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-l.py) |
- | Swin-L+H-Deformable-DETR | sam-vit-h | :x: | 58.0 | 46.8 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py) |
- | FocalNet-L+DINO | sam-vit-b | :x: | 63.2 | 44.5 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-b.py) |
- | FocalNet-L+DINO | sam-vit-l | :x: | 63.2 | 48.6 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-l.py) |
- | FocalNet-L+DINO | sam-vit-h | :x: | 63.2 | 49.1 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py) |
-
- ## Cascade-Prompt Results
-
- | Detector | SAM | multimask output | Detector's Box AP | Mask AP | Config |
- | :---: | :---: | :---: | :---: | :---: | :---: |
- | R50+H-Deformable-DETR | sam-vit-b | :x: | 50.0 | 38.8 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b_cascade.py) |
- | R50+H-Deformable-DETR | sam-vit-b | :heavy_check_mark: | 50.0 | 40.5 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b_best-in-multi_cascade.py) |
-
- ***Note***
-
- **multimask output**: When multimask output is :heavy_check_mark:, SAM predicts three masks for each prompt, and the mask with the highest predicted IoU is taken as the segmentation result. When multimask output is :x:, SAM returns a single mask per prompt, which is used directly as the segmentation result.
-
- **cascade-prompt**: In the cascade-prompt setting, segmentation proceeds in two stages. The first stage predicts a coarse mask from a bounding-box prompt. The second stage then uses both the bounding box and the coarse mask as prompts to predict the final segmentation result. Note that if multimask output is :heavy_check_mark:, the first stage predicts three coarse masks, and the second stage uses the one with the highest predicted IoU as the mask prompt.
-
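The best-of-three selection and the two-stage cascade described above can be sketched in a few lines of Python. This is an illustrative sketch, not this repository's actual code: `predict` is a hypothetical stand-in for SAM's prediction call, assumed to return a list of masks together with their predicted IoU scores.

```python
def select_best_mask(masks, iou_predictions):
    """Multimask output mode: keep the mask with the highest predicted IoU."""
    best = max(range(len(iou_predictions)), key=lambda i: iou_predictions[i])
    return masks[best]


def cascade_segment(predict, box):
    """Cascade-prompt mode: box -> coarse mask, then box + coarse mask -> final mask.

    `predict` is a hypothetical callable returning (masks, iou_predictions).
    """
    # Stage 1: prompt SAM with the detector's box only.
    coarse_masks, coarse_ious = predict(box=box, mask=None)
    coarse = select_best_mask(coarse_masks, coarse_ious)
    # Stage 2: prompt with the box plus the best coarse mask.
    final_masks, final_ious = predict(box=box, mask=coarse)
    return select_best_mask(final_masks, final_ious)
```

With multimask output disabled, `select_best_mask` degenerates to returning the single predicted mask, matching the :x: rows in the tables.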
- ## Installation
-
- 🍺🍺🍺 A Docker Hub environment is now available:
-
- ```bash
- docker pull kxqt/prompt-sam-torch1.12-cuda11.6:20230410
- nvidia-docker run -it --shm-size=4096m -v {your_path}:{path_in_docker} kxqt/prompt-sam-torch1.12-cuda11.6:20230410
- ```
-
- We tested the models with `python=3.7.10`, `pytorch=1.10.2`, and `cuda=10.2`. Other versions may work as well.
-
- 1. Clone this repository
-
- ```bash
- git clone https://github.com/RockeyCoss/Instance-Segment-Anything
- cd Instance-Segment-Anything
- ```
-
- 2. Install PyTorch
-
- ```bash
- # an example
- pip install torch torchvision
- ```
-
- 3. Install MMCV
-
- ```bash
- pip install -U openmim
- mim install "mmcv>=2.0.0"
- ```
-
- 4. Install MMDetection's requirements
-
- ```bash
- pip install -r requirements.txt
- ```
-
- 5. Compile the CUDA operators
-
- ```bash
- cd projects/instance_segment_anything/ops
- python setup.py build install
- cd ../../..
- ```
-
- ## Prepare COCO Dataset
-
- Please refer to [data preparation](https://mmdetection.readthedocs.io/en/latest/user_guides/dataset_prepare.html).
-
- ## Prepare Checkpoints
-
- 1. Install wget
-
- ```bash
- pip install wget
- ```
-
- 2. Download the SAM checkpoints
-
- ```bash
- mkdir ckpt
- cd ckpt
- python -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
- python -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
- python -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
- cd ..
- ```
-
- 3. Download the checkpoints of the detection models. You only need to download the checkpoints you intend to use.
-
- ```bash
- # R50+H-Deformable-DETR
- cd ckpt
- python -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/r50_hybrid_branch_lambda1_group6_t1500_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o r50_hdetr.pth
- cd ..
- python tools/convert_ckpt.py ckpt/r50_hdetr.pth ckpt/r50_hdetr.pth
-
- # Swin-T+H-Deformable-DETR
- cd ckpt
- python -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/swin_tiny_hybrid_branch_lambda1_group6_t1500_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o swin_t_hdetr.pth
- cd ..
- python tools/convert_ckpt.py ckpt/swin_t_hdetr.pth ckpt/swin_t_hdetr.pth
-
- # Swin-L+H-Deformable-DETR
- cd ckpt
- python -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/decay0.05_drop_path0.5_swin_large_hybrid_branch_lambda1_group6_t1500_n900_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o swin_l_hdetr.pth
- cd ..
- python tools/convert_ckpt.py ckpt/swin_l_hdetr.pth ckpt/swin_l_hdetr.pth
-
- # FocalNet-L+DINO
- cd ckpt
- python -m wget https://projects4jw.blob.core.windows.net/focalnet/release/detection/focalnet_large_fl4_o365_finetuned_on_coco.pth -o focalnet_l_dino.pth
- cd ..
- python tools/convert_ckpt.py ckpt/focalnet_l_dino.pth ckpt/focalnet_l_dino.pth
- ```
-
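The `tools/convert_ckpt.py` step rewrites each downloaded checkpoint into the layout this repository expects. As a rough, generic illustration of that kind of conversion (assuming it amounts to renaming `state_dict` keys; the actual script's mapping is not shown here and may differ), key remapping looks like:

```python
def remap_state_dict(state_dict, prefix_map):
    """Rename checkpoint keys by prefix, e.g. 'backbone.0.' -> 'backbone.'.

    Hypothetical helper for illustration only; the repository's
    tools/convert_ckpt.py may apply a different transformation.
    """
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break  # apply at most one prefix rule per key
        out[new_key] = value
    return out
```

Keys that match no prefix rule pass through unchanged, so a partial mapping is safe to apply to a full checkpoint.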
- ## Run Evaluation
-
- 1. Evaluate metrics
-
- ```bash
- # single GPU
- python tools/test.py path/to/the/config/file --eval segm
- # multiple GPUs
- bash tools/dist_test.sh path/to/the/config/file num_gpus --eval segm
- ```
-
- 2. Visualize segmentation results
-
- ```bash
- python tools/test.py path/to/the/config/file --show-dir path/to/the/visualization/results
- ```
-
- ## Gradio Demo
-
- We also provide a gradio-based UI for displaying the segmentation results. To launch the demo, simply run the following commands in a terminal:
-
- ```bash
- pip install gradio
- python app.py
- ```
-
- This demo is also hosted on HuggingFace [here](https://huggingface.co/spaces/rockeycoss/Prompt-Segment-Anything-Demo).
-
- ## More Segmentation Examples
-
- ![example2](assets/example2.jpg)
- ![example3](assets/example3.jpg)
- ![example4](assets/example4.jpg)
- ![example5](assets/example5.jpg)
-
- ## Citation
-
- **Segment Anything**
-
- ```latex
- @article{kirillov2023segany,
-   title={Segment Anything},
-   author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
-   journal={arXiv:2304.02643},
-   year={2023}
- }
- ```
-
- **H-Deformable-DETR**
-
- ```latex
- @article{jia2022detrs,
-   title={DETRs with Hybrid Matching},
-   author={Jia, Ding and Yuan, Yuhui and He, Haodi and Wu, Xiaopei and Yu, Haojun and Lin, Weihong and Sun, Lei and Zhang, Chao and Hu, Han},
-   journal={arXiv preprint arXiv:2207.13080},
-   year={2022}
- }
- ```
-
- **Swin Transformer**
-
- ```latex
- @inproceedings{liu2021Swin,
-   title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
-   author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
-   booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
-   year={2021}
- }
- ```
-
- **DINO**
-
- ```latex
- @misc{zhang2022dino,
-   title={DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection},
-   author={Hao Zhang and Feng Li and Shilong Liu and Lei Zhang and Hang Su and Jun Zhu and Lionel M. Ni and Heung-Yeung Shum},
-   year={2022},
-   eprint={2203.03605},
-   archivePrefix={arXiv},
-   primaryClass={cs.CV}
- }
- ```
-
- **FocalNet**
-
- ```latex
- @misc{yang2022focalnet,
-   author={Yang, Jianwei and Li, Chunyuan and Dai, Xiyang and Yuan, Lu and Gao, Jianfeng},
-   title={Focal Modulation Networks},
-   publisher={arXiv},
-   year={2022}
- }
- ```
 
+ ---
+ title: Prompt Segment Anything
+ emoji: 🚀
+ colorFrom: pink
+ colorTo: yellow
+ sdk: gradio
+ sdk_version: 3.24.1
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference