shivi committed on
Commit
e78c866
1 Parent(s): f5fba60

Create README.md

  1. README.md +68 -0
README.md ADDED
@@ -0,0 +1,68 @@
1
+ ---
2
+ license: other
3
+ tags:
4
+ - vision
5
+ - image-segmentation
6
+ datasets:
7
+ - coco
8
+ widget:
9
+ - src: http://images.cocodataset.org/val2017/000000039769.jpg
10
+ example_title: Cats
11
+ - src: http://images.cocodataset.org/val2017/000000039770.jpg
12
+ example_title: Castle
13
+ ---
14
+
15
+ # Mask2Former
16
+
17
+ Mask2Former model trained on Cityscapes semantic segmentation (large-sized version, Swin backbone). It was introduced in the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) and first released in [this repository](https://github.com/facebookresearch/Mask2Former/).
19
+
20
+ Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card was written by the Hugging Face team.
21
+
22
+ ## Model description
23
+
24
+ Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, [MaskFormer](https://arxiv.org/abs/2107.06278), in terms of both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
27
+
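As a rough illustration of the masked attention mentioned in (ii), the sketch below restricts each query's cross-attention to the pixels that its previous mask prediction marks as foreground. It is a simplified, single-head version written for this card, not the model's actual implementation: the function name `masked_cross_attention` and its arguments are hypothetical, and learned projections, multiple heads and residual connections are omitted.

```python
import torch


def masked_cross_attention(queries, keys, values, mask_logits):
    """Single-head cross-attention restricted to each query's predicted mask.

    queries:     (num_queries, dim)
    keys/values: (num_pixels, dim)
    mask_logits: (num_queries, num_pixels), mask prediction from the previous layer
    """
    scale = queries.shape[-1] ** -0.5
    scores = (queries @ keys.T) * scale  # (num_queries, num_pixels)

    # Binarize the previous mask prediction at probability 0.5.
    foreground = mask_logits.sigmoid() >= 0.5
    # A query with an empty mask falls back to attending everywhere,
    # which avoids a softmax over a row made only of -inf.
    foreground = foreground | ~foreground.any(dim=-1, keepdim=True)

    # Background pixels get -inf so they receive zero attention weight.
    scores = scores.masked_fill(~foreground, float("-inf"))
    weights = scores.softmax(dim=-1)
    return weights @ values, weights
```

The mask only changes which pixels a query may look at; the tensor shapes are the same as in ordinary cross-attention.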
28
+ ![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/mask2former_architecture.png)
29
+
30
+ ## Intended uses & limitations
31
+
32
+ You can use this particular checkpoint for semantic segmentation. See the [model hub](https://huggingface.co/models?search=mask2former) to look for other fine-tuned versions on a task that interests you.
34
+
35
+ ### How to use
36
+
37
+ Here is how to use this model:
38
+
39
+ ```python
+ import requests
+ import torch
+ from PIL import Image
+ from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation
+
+
+ # load Mask2Former fine-tuned on Cityscapes semantic segmentation
+ processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-cityscapes-semantic")
+ model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-cityscapes-semantic")
+
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+ inputs = processor(images=image, return_tensors="pt")
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
+ # and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
+ class_queries_logits = outputs.class_queries_logits
+ masks_queries_logits = outputs.masks_queries_logits
+
+ # you can pass them to processor for postprocessing;
+ # the result is one `(height, width)` tensor of class ids per input image
+ predicted_semantic_map = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
+ # we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
+ ```
67
+
68
+ For more code examples, see the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mask2former).
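To make the post-processing step in the snippet above less of a black box, here is a minimal sketch of how per-query class and mask logits can be combined into a semantic map. It assumes the last class index is the "no object" class and skips the resizing to `target_sizes`. The helper name `semantic_map_from_queries` is hypothetical, and the code illustrates the idea rather than reproducing the library's exact implementation.

```python
import torch


def semantic_map_from_queries(class_queries_logits, masks_queries_logits):
    """Combine per-query predictions into one class id per pixel.

    class_queries_logits: (batch_size, num_queries, num_labels + 1)
    masks_queries_logits: (batch_size, num_queries, height, width)
    returns:              (batch_size, height, width) tensor of class ids
    """
    # Class probabilities per query, dropping the trailing "no object" class.
    class_probs = class_queries_logits.softmax(dim=-1)[..., :-1]
    # Per-pixel mask probabilities per query.
    mask_probs = masks_queries_logits.sigmoid()
    # Weight each query's mask by its class probabilities and sum over queries.
    per_class_scores = torch.einsum("bqc,bqhw->bchw", class_probs, mask_probs)
    # Each pixel takes the class with the highest accumulated score.
    return per_class_scores.argmax(dim=1)
```

With the variables from the snippet above, `semantic_map_from_queries(class_queries_logits, masks_queries_logits)` would give a map at the model's output resolution rather than at the original image size.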