---
language:
- ru
- en
library_name: transformers
pipeline_tag: feature-extraction
---

# ruclip-vit-base-patch32-384

**RuCLIP** (**Ru**ssian **C**ontrastive **L**anguage–**I**mage **P**retraining) is a multimodal model
for computing similarity between images and texts and for ranking captions and images.
RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing and
multimodal learning.

The model was trained by the [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.

- Task: `text ranking`; `image ranking`; `zero-shot image classification`
- Type: `encoder`
- Num Parameters: `150M`
- Training Data Volume: `240 million text-image pairs`
- Language: `Russian`
- Context Length: `77`
- Transformer Layers: `12`
- Transformer Width: `512`
- Transformer Heads: `8`
- Image Size: `384`
- Vision Layers: `12`
- Vision Width: `768`
- Vision Patch Size: `32`
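
The image size and patch size listed above determine how many patches the vision encoder processes per image. The following is a minimal sketch of that arithmetic; the helper name `vit_sequence_length` is hypothetical, and the extra class token is an assumption based on the standard ViT design rather than something this card states.

```python
# Values taken from the specification list above.
IMAGE_SIZE = 384
PATCH_SIZE = 32


def vit_sequence_length(image_size: int, patch_size: int, class_token: bool = True) -> int:
    """Number of tokens a ViT-style encoder sees for one square image."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by patch size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    # Assumption: one class token is prepended, as in the standard ViT.
    return num_patches + (1 if class_token else 0)


patches = vit_sequence_length(IMAGE_SIZE, PATCH_SIZE, class_token=False)
tokens = vit_sequence_length(IMAGE_SIZE, PATCH_SIZE)
print(patches, tokens)  # 144 145
```

A 384-pixel input with 32-pixel patches therefore yields a 12 × 12 grid of patches.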

## Usage [Github](https://github.com/sberbank-ai/ru-clip)

```
pip install ruclip
```

```python
import ruclip

clip, processor = ruclip.load("ruclip-vit-base-patch32-384", device="cuda")
```
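
Once a CLIP-style model has produced one embedding for an image and one for each candidate caption, ranking typically reduces to comparing cosine similarities. The sketch below illustrates only that final step, using hand-written toy vectors in place of real model outputs. The function names are hypothetical and are not part of the `ruclip` API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity: dot product of the two unit-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_captions(image_embedding, caption_embeddings):
    """Return caption indices sorted from most to least similar, plus the scores."""
    scores = [cosine_similarity(image_embedding, c) for c in caption_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy 3-dimensional vectors standing in for real embeddings (assumption).
image_embedding = [0.9, 0.1, 0.0]
caption_embeddings = [
    [0.0, 1.0, 0.0],  # unrelated caption
    [1.0, 0.0, 0.0],  # closely matching caption
    [0.5, 0.5, 0.0],  # partially matching caption
]

order, scores = rank_captions(image_embedding, caption_embeddings)
print(order)  # [1, 2, 0]
```

The same comparison works in the other direction: ranking several images against a single caption only swaps which side holds the list of embeddings.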

## Performance

The model was evaluated on the following datasets:

| Dataset       | Metric Name    | Metric Result |
| :------------ | :------------- | :------------ |
| Food101       | acc            | 0.642         |
| CIFAR10       | acc            | 0.862         |
| CIFAR100      | acc            | 0.529         |
| Birdsnap      | acc            | 0.161         |
| SUN397        | acc            | 0.510         |
| Stanford Cars | acc            | 0.572         |
| DTD           | acc            | 0.390         |
| MNIST         | acc            | 0.404         |
| STL10         | acc            | 0.946         |
| PCam          | acc            | 0.506         |
| CLEVR         | acc            | 0.188         |
| Rendered SST2 | acc            | 0.508         |
| ImageNet      | acc            | 0.451         |
| FGVC Aircraft | mean-per-class | 0.053         |
| Oxford Pets   | mean-per-class | 0.587         |
| Caltech101    | mean-per-class | 0.834         |
| Flowers102    | mean-per-class | 0.449         |
| HatefulMemes  | roc-auc        | 0.537         |
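
The table reports two accuracy-style metrics, `acc` and `mean-per-class`. The card does not define them, so the sketch below assumes their usual meanings: `acc` as the fraction of correctly classified samples, and `mean-per-class` as the per-class accuracy averaged over classes. The labels are toy data, not evaluation results, and the function names are hypothetical.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately per true class, then averaged over classes."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, imbalanced labels: four "cat" samples and one "dog" sample.
y_true = ["cat", "cat", "cat", "cat", "dog"]
y_pred = ["cat", "cat", "cat", "cat", "cat"]

print(accuracy(y_true, y_pred))                 # 0.8
print(mean_per_class_accuracy(y_true, y_pred))  # 0.5
```

Under these definitions the two metrics diverge on imbalanced data, which is the usual reason for reporting the per-class average on datasets with uneven class sizes.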

## Authors

- Alex Shonenkov: [Github](https://github.com/shonenkov), [Kaggle GM](https://www.kaggle.com/shonenkov)
- Daniil Chesakov: [Github](https://github.com/Danyache)
- Denis Dimitrov: [Github](https://github.com/denndimitrov)
- Igor Pavlov: [Github](https://github.com/boomb0om)