sapiens
English
rawalkhirodkar committed on
Commit 117918a
1 Parent(s): b93299a

Update README.md

  1. README.md +24 -17
README.md CHANGED
@@ -4,18 +4,10 @@ language:
  - en
  ---

- # Model Card for Sapiens
+ # Model Details

  <!-- Provide a quick summary of what the model is/does. -->

- Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution.\
- The pretrained models when finetuned for human-centric vision tasks generalize to in-the-wild conditions.
-
- ## Model Details
-
-
- ### Model Description
-
  Sapiens, a family of models for four fundamental human-centric vision tasks - 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction.
  Our models natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 million in-the-wild human images.
  The resulting models exhibit remarkable generalization to in-the-wild data, even when labeled data is scarce or entirely synthetic.
@@ -23,21 +15,36 @@ Our simple model design also brings scalability - model performance across tasks
  Sapiens consistently surpasses existing baselines across various human-centric benchmarks.


+ ### Model Description
  - **Developed by:** Meta
  - **Model type:** Vision Transformers
  - **License:** Creative Commons Attribution-NonCommercial 4.0


- ### Model Sources
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** https://github.com/facebookresearch/sapiens
- - **Paper:** https://arxiv.org/abs/2408.12569
- <!-- - **Demo [optional]:** [More Information Needed] -->
+ ### More Resources
+ - **Repository:** [https://github.com/facebookresearch/sapiens](https://github.com/facebookresearch/sapiens)
+ - **Paper:** [https://arxiv.org/abs/2408.12569](https://arxiv.org/abs/2408.12569)
+ - **Demos:** [Sapiens Gradio Spaces](https://huggingface.co/collections/facebook/sapiens-66d22047daa6402d565cb2fc)
+ - **Project Page:** [https://about.meta.com/realitylabs/codecavatars/sapiens](https://about.meta.com/realitylabs/codecavatars/sapiens/)
+ - **Additional Results:** [https://rawalkhirodkar.github.io/sapiens](https://rawalkhirodkar.github.io/sapiens/)
+ - **HuggingFace Collection:** [https://huggingface.co/collections/facebook/sapiens-66d22047daa6402d565cb2fc](https://huggingface.co/collections/facebook/sapiens-66d22047daa6402d565cb2fc)

  ## Uses
  - pose estimation (keypoints 17, keypoints 133, keypoints 308)
  - body-part segmentation (28 classes)
  - depth estimation
- - surface normal estimation
+ - surface normal estimation
+
+ ## Model Zoo
+ This repository does not host any checkpoint but contains pointers to all the model repositories.
+
+
+ ## Model Zoo
+
+ | Model Name | Original | TorchScript | BFloat16 |
+ |:-----------|:--------:|:-----------:|:--------:|
+ | sapiens-pretrain-0.3b | [link](https://huggingface.co/facebook/sapiens-pretrain-0.3b) | [link](https://huggingface.co/facebook/sapiens-pretrain-0.3b-torchscript) | [link](https://huggingface.co/facebook/sapiens-pretrain-0.3b-bfloat16) |
+ | sapiens-pretrain-0.6b | [link](https://huggingface.co/facebook/sapiens-pretrain-0.6b) | [link](https://huggingface.co/facebook/sapiens-pretrain-0.6b-torchscript) | [link](https://huggingface.co/facebook/sapiens-pretrain-0.6b-bfloat16) |
+ | sapiens-pretrain-1b | [link](https://huggingface.co/facebook/sapiens-pretrain-1b) | [link](https://huggingface.co/facebook/sapiens-pretrain-1b-torchscript) | [link](https://huggingface.co/facebook/sapiens-pretrain-1b-bfloat16) |
+ | sapiens-pretrain-2b | [link](https://huggingface.co/facebook/sapiens-pretrain-2b) | [link](https://huggingface.co/facebook/sapiens-pretrain-2b-torchscript) | [link](https://huggingface.co/facebook/sapiens-pretrain-2b-bfloat16) |
+
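The Model Zoo table added in this commit follows a regular naming pattern: each pretrained size (0.3b, 0.6b, 1b, 2b) has an original repository plus `-torchscript` and `-bfloat16` variants under the `facebook` organization. The sketch below rebuilds those repository IDs and URLs from that pattern. The function and constant names are hypothetical, chosen for illustration only and not part of the Sapiens codebase. The code only formats strings and does not check that a repository exists or download anything.

```python
# Hypothetical helper: rebuilds the Model Zoo links from the naming pattern
# visible in the table (sizes and suffixes are copied from the table rows).
HF_BASE_URL = "https://huggingface.co"
ORG = "facebook"
SIZES = ("0.3b", "0.6b", "1b", "2b")
VARIANT_SUFFIXES = {
    "Original": "",
    "TorchScript": "-torchscript",
    "BFloat16": "-bfloat16",
}


def pretrain_repo_id(size: str, variant: str = "Original") -> str:
    """Return the repository ID for one table cell, e.g. 'facebook/sapiens-pretrain-1b'."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    if variant not in VARIANT_SUFFIXES:
        raise ValueError(f"unknown variant {variant!r}; expected one of {tuple(VARIANT_SUFFIXES)}")
    return f"{ORG}/sapiens-pretrain-{size}{VARIANT_SUFFIXES[variant]}"


def pretrain_repo_url(size: str, variant: str = "Original") -> str:
    """Return the full URL that the corresponding table cell links to."""
    return f"{HF_BASE_URL}/{pretrain_repo_id(size, variant)}"


def model_zoo_rows() -> list[tuple[str, dict[str, str]]]:
    """Return one (model name, {variant: url}) pair per table row, in table order."""
    return [
        (
            f"sapiens-pretrain-{size}",
            {variant: pretrain_repo_url(size, variant) for variant in VARIANT_SUFFIXES},
        )
        for size in SIZES
    ]


if __name__ == "__main__":
    for name, links in model_zoo_rows():
        print(name)
        for variant, url in links.items():
            print(f"  {variant}: {url}")
```

Keeping the sizes and suffixes in two small tables means a new variant column would need only one new dictionary entry, assuming it follows the same suffix convention.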