
Seg-Sapiens-0.3B-Bfloat16

Model Details

Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution. Sapiens-0.3B natively supports 1K high-resolution inference. When finetuned for human-centric vision tasks, the pretrained models generalize well to in-the-wild data, even when labeled data is scarce or entirely synthetic.

  • Developed by: Meta
  • Model type: Vision Transformer
  • License: Creative Commons Attribution-NonCommercial 4.0
  • Task: seg
  • Format: bfloat16
  • File: sapiens_0.3b_goliath_best_goliath_mIoU_7673_epoch_194_bfloat16.pt2

Model Card

  • Image Size: 1024 x 768 (H x W)
  • Num Parameters: 0.336 B
  • FLOPs: 1.242 TFLOPs
  • Patch Size: 16 x 16
  • Embedding Dimensions: 1024
  • Num Layers: 24
  • Num Heads: 16
  • Feedforward Channels: 4096
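
The figures above can be cross-checked with a little arithmetic. The sketch below assumes a standard ViT layout (non-overlapping square patches, four attention projection matrices and a two-layer MLP per block); the helper names are hypothetical, and the weight count is only a rough estimate that ignores biases, layer norms, embeddings and the segmentation head.

```python
# Values copied from the model card above.
IMAGE_H, IMAGE_W = 1024, 768
PATCH = 16
EMBED_DIM = 1024
NUM_LAYERS = 24
NUM_HEADS = 16
FFN_DIM = 4096


def token_grid(height, width, patch):
    """Patches along each axis, assuming non-overlapping square patches."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch, width // patch


def encoder_weight_count(dim, ffn, layers):
    """Rough weight count for the transformer blocks only.

    Assumes Q, K, V and output projections (4 * dim^2) plus a two-layer
    MLP (2 * dim * ffn) per block; everything else is ignored.
    """
    attention = 4 * dim * dim
    mlp = 2 * dim * ffn
    return layers * (attention + mlp)


rows, cols = token_grid(IMAGE_H, IMAGE_W, PATCH)
print(rows, cols, rows * cols)   # 64 48 3072
print(EMBED_DIM // NUM_HEADS)    # 64 channels per attention head
print(encoder_weight_count(EMBED_DIM, FFN_DIM, NUM_LAYERS))  # 301989888
```

Under these assumptions the blocks account for roughly 0.30 B of the listed 0.336 B parameters; the remainder would plausibly come from the terms the estimate leaves out.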

More Resources

Uses

The Seg 0.3B model can be used to perform 28-class body-part segmentation on human images.
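
As a minimal sketch of how the output might be consumed, the helpers below turn per-class scores into a label map. Several things here are assumptions rather than documented behavior: that the `.pt2` file was written with `torch.export.save` (so `torch.export.load` can read it), that the model yields one score map per class with shape (28, H, W), and all function names, which are hypothetical. Input preprocessing is not covered.

```python
import numpy as np

NUM_CLASSES = 28  # from the card: 28-class body-part segmentation


def load_exported_model(path):
    """Load the checkpoint, ASSUMING it was saved with torch.export.save."""
    import torch  # imported lazily so the helpers below work without PyTorch
    return torch.export.load(path).module()


def logits_to_labels(logits):
    """Convert scores of shape (NUM_CLASSES, H, W) into an (H, W) label map."""
    logits = np.asarray(logits)
    if logits.ndim != 3 or logits.shape[0] != NUM_CLASSES:
        raise ValueError(f"expected ({NUM_CLASSES}, H, W), got {logits.shape}")
    # Each pixel gets the index of its highest-scoring class.
    return logits.argmax(axis=0).astype(np.uint8)


def class_pixel_counts(labels):
    """Count how many pixels were assigned to each class index."""
    return np.bincount(labels.ravel(), minlength=NUM_CLASSES)
```

For example, passing a (28, 1024, 768) score array to `logits_to_labels` gives a 1024 x 768 array of class indices in the range 0 to 27, and `class_pixel_counts` then reports how much of the image each class covers.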
