---
license: cc-by-nc-4.0
---
|
# Hiera (hiera_base_224)
|
|
Hiera is a hierarchical vision transformer that is a much more efficient alternative to previous hierarchical architectures such as ConvNeXt and Swin.
|
Vanilla transformer architectures (Dosovitskiy et al., 2020) are popular, simple, and scalable, and they enable pretraining strategies such as MAE (He et al., 2022).
|
However, because they use the same spatial resolution and number of channels throughout the network, ViTs make inefficient use of their parameters. This is in contrast to prior "hierarchical" or "multi-scale" models (e.g., Krizhevsky et al., 2012; He et al., 2016), which use fewer channels but higher spatial resolution in early stages with simpler features, and more channels but lower spatial resolution later in the model with more complex features. A toy sketch of this multi-stage trade-off is shown below.
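The following is a minimal, purely illustrative sketch of that stage-wise schedule, not Hiera's actual implementation: the stage widths and the stride-2 downsampling below are made-up placeholders chosen only to show how resolution shrinks while the channel count grows.

```python
import torch
import torch.nn as nn

class ToyHierarchicalBackbone(nn.Module):
    """Illustrative multi-stage backbone: each stage halves the spatial
    resolution and widens the channel dimension (widths are placeholders)."""

    def __init__(self, in_chans=3, widths=(96, 192, 384, 768)):
        super().__init__()
        stages, prev = [], in_chans
        for width in widths:
            stages.append(
                nn.Sequential(
                    # Stride-2 conv stands in for patch merging / pooling.
                    nn.Conv2d(prev, width, kernel_size=2, stride=2),
                    nn.GELU(),
                )
            )
            prev = width
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # one feature map per stage: coarser and wider each time
        return feats

feats = ToyHierarchicalBackbone()(torch.randn(1, 3, 224, 224))
print([tuple(f.shape) for f in feats])
# [(1, 96, 112, 112), (1, 192, 56, 56), (1, 384, 28, 28), (1, 768, 14, 14)]
```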
|
These hierarchical models, however, add complex specialized operations to reach state-of-the-art accuracy on ImageNet-1k, and this overhead makes them slower.
|
Hiera addresses this issue by teaching the model spatial biases through MAE pretraining rather than through extra architectural complexity.
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6141a88b3a0ec78603c9e784/ogkud4qc564bPX3f0bGXO.png)
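Below is a hedged usage sketch. It assumes the facebookresearch/hiera repository exposes a torch.hub entry point named `hiera_base_224` and accepts a `checkpoint` keyword for selecting pretrained weights; the checkpoint tag `"mae_in1k_ft_in1k"` is an assumption, so check the repository for the exact names.

```python
import torch

# Assumed torch.hub entry point and checkpoint tag; verify against the
# facebookresearch/hiera repository before use.
model = torch.hub.load(
    "facebookresearch/hiera",
    model="hiera_base_224",
    pretrained=True,
    checkpoint="mae_in1k_ft_in1k",
)
model.eval()

# Dummy 224x224 RGB input; real images should be resized and normalized
# with the standard ImageNet statistics.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # expected: torch.Size([1, 1000]) for an ImageNet-1k classifier head
```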
|