facebook/sapiens

rawalkhirodkar committed on Sep 11

Commit 117918a • 1 Parent(s): b93299a

Update README.md


Files changed (1)

README.md +24 -17


@@ -4,18 +4,10 @@ language:
 - en
 ---
-# Model Card for Sapiens
 <!-- Provide a quick summary of what the model is/does. -->
-Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution.\
-The pretrained models when finetuned for human-centric vision tasks generalize to in-the-wild conditions.
-## Model Details
-### Model Description
 Sapiens, a family of models for four fundamental human-centric vision tasks - 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction.
 Our models natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 million in-the-wild human images.
 The resulting models exhibit remarkable generalization to in-the-wild data, even when labeled data is scarce or entirely synthetic.
@@ -23,21 +15,36 @@ Our simple model design also brings scalability - model performance across tasks
 Sapiens consistently surpasses existing baselines across various human-centric benchmarks.
 - **Developed by:** Meta
 - **Model type:** Vision Transformers
 - **License:** Creative Commons Attribution-NonCommercial 4.0
-### Model Sources
-<!-- Provide the basic links for the model. -->
-- **Repository:** https://github.com/facebookresearch/sapiens
-- **Paper:** https://arxiv.org/abs/2408.12569
-<!-- - **Demo [optional]:** [More Information Needed] -->
 ## Uses
 - pose estimation (keypoints 17, keypoints 133, keypoints 308)
 - body-part segmentation (28 classes)
 - depth estimation
-- surface normal estimation

 - en
 ---
+# Model Details
 <!-- Provide a quick summary of what the model is/does. -->
 Sapiens, a family of models for four fundamental human-centric vision tasks - 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction.
 Our models natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 million in-the-wild human images.
 The resulting models exhibit remarkable generalization to in-the-wild data, even when labeled data is scarce or entirely synthetic.
 Sapiens consistently surpasses existing baselines across various human-centric benchmarks.
+### Model Description
 - **Developed by:** Meta
 - **Model type:** Vision Transformers
 - **License:** Creative Commons Attribution-NonCommercial 4.0
+### More Resources
+- **Repository:** [https://github.com/facebookresearch/sapiens](https://github.com/facebookresearch/sapiens)
+- **Paper:** [https://arxiv.org/abs/2408.12569](https://arxiv.org/abs/2408.12569)
+- **Demos:** [Sapiens Gradio Spaces](https://huggingface.co/collections/facebook/sapiens-66d22047daa6402d565cb2fc)
+- **Project Page:** [https://about.meta.com/realitylabs/codecavatars/sapiens](https://about.meta.com/realitylabs/codecavatars/sapiens/)
+- **Additional Results:** [https://rawalkhirodkar.github.io/sapiens](https://rawalkhirodkar.github.io/sapiens/)
+- **HuggingFace Collection:** [https://huggingface.co/collections/facebook/sapiens-66d22047daa6402d565cb2fc](https://huggingface.co/collections/facebook/sapiens-66d22047daa6402d565cb2fc)
 ## Uses
 - pose estimation (keypoints 17, keypoints 133, keypoints 308)
 - body-part segmentation (28 classes)
 - depth estimation
+- surface normal estimation
+## Model Zoo
+This repository does not host any checkpoint but contains pointers to all the model repositories.
+## Model Zoo
+| Model Name | Original | TorchScript | BFloat16 |
+|:-----------|:--------:|:-----------:|:--------:|
+| sapiens-pretrain-0.3b | [link](https://huggingface.co/facebook/sapiens-pretrain-0.3b) | [link](https://huggingface.co/facebook/sapiens-pretrain-0.3b-torchscript) | [link](https://huggingface.co/facebook/sapiens-pretrain-0.3b-bfloat16) |
+| sapiens-pretrain-0.6b | [link](https://huggingface.co/facebook/sapiens-pretrain-0.6b) | [link](https://huggingface.co/facebook/sapiens-pretrain-0.6b-torchscript) | [link](https://huggingface.co/facebook/sapiens-pretrain-0.6b-bfloat16) |
+| sapiens-pretrain-1b | [link](https://huggingface.co/facebook/sapiens-pretrain-1b) | [link](https://huggingface.co/facebook/sapiens-pretrain-1b-torchscript) | [link](https://huggingface.co/facebook/sapiens-pretrain-1b-bfloat16) |
+| sapiens-pretrain-2b | [link](https://huggingface.co/facebook/sapiens-pretrain-2b) | [link](https://huggingface.co/facebook/sapiens-pretrain-2b-torchscript) | [link](https://huggingface.co/facebook/sapiens-pretrain-2b-bfloat16) |
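
The Model Zoo rows added in this commit appear to follow a regular naming pattern: `sapiens-pretrain-<size>` with an optional `-torchscript` or `-bfloat16` suffix. The minimal sketch below rebuilds the table's URLs from that pattern. The helper name `pretrain_repo_url` and the constants are hypothetical, introduced only for illustration; the pattern itself is inferred from the twelve links above and is not a documented guarantee.

```python
# Hypothetical helper: the naming pattern is inferred from the Model Zoo table
# in this diff, not from any official API.
BASE_URL = "https://huggingface.co/facebook"

# Model sizes listed in the table.
SIZES = ("0.3b", "0.6b", "1b", "2b")

# Table column -> repository name suffix.
VARIANT_SUFFIX = {
    "original": "",
    "torchscript": "-torchscript",
    "bfloat16": "-bfloat16",
}


def pretrain_repo_url(size: str, variant: str = "original") -> str:
    """Return the repository URL for one cell of the Model Zoo table."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    if variant not in VARIANT_SUFFIX:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {tuple(VARIANT_SUFFIX)}"
        )
    return f"{BASE_URL}/sapiens-pretrain-{size}{VARIANT_SUFFIX[variant]}"


if __name__ == "__main__":
    # Print every URL in the table, row by row.
    for size in SIZES:
        for variant in VARIANT_SUFFIX:
            print(pretrain_repo_url(size, variant))
```

Keeping the sizes and suffixes in two small lookups means a new size or format would need only one added entry, assuming the naming pattern continues to hold.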