---
license: cc-by-nc-4.0
---
|
# Hiera (hiera_base_224)
|
|
Hiera is a hierarchical vision transformer that is a much more efficient alternative to previous hierarchical architectures such as ConvNeXt and Swin.
|
Vanilla transformer architectures (Dosovitskiy et al., 2020) are popular, simple, and scalable, and they enable pretraining strategies such as MAE (He et al., 2022).
|
However, because they use the same spatial resolution and number of channels throughout the network, ViTs make inefficient use of their parameters. This is in contrast to prior "hierarchical" or "multi-scale" models (e.g., Krizhevsky et al., 2012; He et al., 2016), which use fewer channels but higher spatial resolution in early stages with simpler features, and more channels but lower spatial resolution later in the model with more complex features. A toy sketch of this multi-stage trade-off is shown below.
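The following is a minimal, purely illustrative sketch of that stage-wise schedule, not Hiera's actual implementation: the stage widths and the stride-2 downsampling below are made-up placeholders chosen only to show how resolution shrinks while the channel count grows.

```python
import torch
import torch.nn as nn

class ToyHierarchicalBackbone(nn.Module):
    """Illustrative multi-stage backbone: each stage halves the spatial
    resolution and widens the channel dimension (widths are placeholders)."""

    def __init__(self, in_chans=3, widths=(96, 192, 384, 768)):
        super().__init__()
        stages, prev = [], in_chans
        for width in widths:
            stages.append(
                nn.Sequential(
                    # Stride-2 conv stands in for patch merging / pooling.
                    nn.Conv2d(prev, width, kernel_size=2, stride=2),
                    nn.GELU(),
                )
            )
            prev = width
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # one feature map per stage: coarser and wider each time
        return feats

feats = ToyHierarchicalBackbone()(torch.randn(1, 3, 224, 224))
print([tuple(f.shape) for f in feats])
# [(1, 96, 112, 112), (1, 192, 56, 56), (1, 384, 28, 28), (1, 768, 14, 14)]
```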
|
These hierarchical models, however, add complex specialized operations to reach state-of-the-art accuracy on ImageNet-1k, and this overhead makes them slower.
|
Hiera addresses this issue by teaching the model spatial biases through MAE pretraining rather than through extra architectural complexity.
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6141a88b3a0ec78603c9e784/ogkud4qc564bPX3f0bGXO.png)
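Below is a hedged usage sketch. It assumes the facebookresearch/hiera repository exposes a torch.hub entry point named `hiera_base_224` and accepts a `checkpoint` keyword for selecting pretrained weights; the checkpoint tag `"mae_in1k_ft_in1k"` is an assumption, so check the repository for the exact names.

```python
import torch

# Assumed torch.hub entry point and checkpoint tag; verify against the
# facebookresearch/hiera repository before use.
model = torch.hub.load(
    "facebookresearch/hiera",
    model="hiera_base_224",
    pretrained=True,
    checkpoint="mae_in1k_ft_in1k",
)
model.eval()

# Dummy 224x224 RGB input; real images should be resized and normalized
# with the standard ImageNet statistics.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # expected: torch.Size([1, 1000]) for an ImageNet-1k classifier head
```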
|