facebook/sapiens-pretrain-2b-torchscript

Image Feature Extraction · sapiens · English


rawalkhirodkar committed on Sep 9

Commit 1f64c65 · 1 parent: 857e99f

Update model card for Sapiens with architecture details


Files changed (1)

README.md CHANGED +12 -13

@@ -3,32 +3,31 @@ language: en
 license: cc-by-nc-4.0
 ---
-# Sapiens-2b-torchscript
-## Model Card for Sapiens
-Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution. The pretrained models, when finetuned for human-centric vision tasks, generalize to in-the-wild conditions.
+# Sapiens-2B-torchscript
+## Model Card
+- **Embedding Dimensions:** N/A
+- **Num Layers:** N/A
+- **Num Heads:** N/A
+- **Feedforward Channels:** N/A
+- **Num Parameters:** 2B
+- **Input Image Size:** 1024 x 1024
+- **Patch Size:** 16 x 16
 ## Model Details
-### Model Description
-Sapiens-2b natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 million in-the-wild human images. The resulting models exhibit remarkable generalization to in-the-wild data, even when labeled data is scarce or entirely synthetic. Our simple model design also brings scalability - model performance across tasks improves as we scale the parameters from 0.3 to 2 billion. Sapiens consistently surpasses existing baselines across various human-centric benchmarks.
+Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution. The pretrained models, when finetuned for human-centric vision tasks, generalize to in-the-wild conditions.
+Sapiens-2B natively support 1K high-resolution inference. The resulting models exhibit remarkable generalization to in-the-wild data, even when labeled data is scarce or entirely synthetic.
 - **Developed by:** Meta
 - **Model type:** Vision Transformer
 - **License:** Creative Commons Attribution-NonCommercial 4.0
-- **Model Size:** 2b
 - **Task:** pretrain
 - **Format:** torchscript
 - **File:** sapiens_2b_epoch_660_torchscript.pt2
 ### Model Sources
 - **Repository:** [https://github.com/facebookresearch/sapiens](https://github.com/facebookresearch/sapiens)
 - **Paper:** [https://arxiv.org/abs/2408.12569](https://arxiv.org/abs/2408.12569)
 ## Uses
-Pretrained 2b model can be used for feature extraction, fine-tuning, or as a starting point for training new models.
+Pretrained 2B model can be used for feature extraction, fine-tuning, or as a starting point for training new models.
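The updated card lists a 1024 x 1024 input and a 16 x 16 patch size. Assuming a standard non-overlapping ViT patch embedding (stride equal to the patch size, no padding; the card itself does not describe the patchifier), each input image becomes a 64 x 64 grid, or 4,096 patch tokens. The sketch below works through that arithmetic; `patch_grid` is a hypothetical helper, not part of the Sapiens repository.

```python
from __future__ import annotations


def patch_grid(height: int, width: int, patch: int = 16) -> tuple[int, int, int]:
    """Return (rows, cols, num_tokens) for a non-overlapping patch embedding.

    Hypothetical helper for illustration only: assumes a standard ViT-style
    patchifier whose stride equals the patch size and which applies no padding.
    """
    # A non-overlapping patchifier needs both sides to be exact multiples
    # of the patch size; otherwise pixels would be dropped or padded.
    if height % patch or width % patch:
        raise ValueError(
            f"{height}x{width} is not divisible by patch size {patch}"
        )
    rows, cols = height // patch, width // patch
    return rows, cols, rows * cols


if __name__ == "__main__":
    # Input size and patch size as listed in the updated model card.
    rows, cols, tokens = patch_grid(1024, 1024, 16)
    print(f"{rows} x {cols} grid -> {tokens} tokens")  # 64 x 64 grid -> 4096 tokens
```

For scale, a global self-attention layer over 4,096 tokens forms a 4,096 x 4,096 score matrix (about 16.8 million entries) per head.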