
Model Details

This is an unofficial implementation of ALIGN. The official ALIGN was trained on Google's own dataset of 1.8B samples, which has not been released to the public, so we trained our implementation on COYO-700M instead.

It was developed by Kakao Brain to validate the performance of the COYO-700M dataset on a large-scale model.

Training took about 8 days on a TPU v3-512.

Model Date

April 2022

Model Type

This is a dual-encoder model in which:

  • the image encoder uses the EfficientNet-B7 architecture
  • the text encoder uses the BERT-base architecture
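
As a rough illustration of how a dual-encoder model is typically used, the sketch below scores images against texts by cosine similarity between their embeddings. It does not load this model: the toy vectors stand in for encoder outputs, and the helper names (`l2_normalize`, `similarity_matrix`) are hypothetical, not part of any released code.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_matrix(image_embs, text_embs):
    """Return sims[i][j] = cosine similarity between image i and text j."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(img, txt)) for txt in txts] for img in imgs]

# Toy stand-ins for the outputs of the image and text encoders (assumed values).
image_embs = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
text_embs = [[0.0, 3.0, 0.0], [4.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

sims = similarity_matrix(image_embs, text_embs)

# For each image, pick the index of the highest-scoring text.
best_text_per_image = [max(range(len(row)), key=row.__getitem__) for row in sims]
```

Because the two encoders are independent, embeddings for a gallery of images or captions can be computed once and reused for every query, which is what makes retrieval with this kind of model cheap.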

Training data

This model was trained on the COYO-700M dataset.

Evaluation results

Model                           | Dataset    | ImageNet KNN | Flickr30k I2T R@1 | Flickr30k T2I R@1 | MsCOCO I2T R@1 | MsCOCO T2I R@1
ALIGN-L2-Large(Google)          | ALIGN 1.8B | 76.4         | 88.6              | 75.7              | 58.6           | 45.6
ALIGN-B7-Base(Google)           | ALIGN 1.8B | 69.3         | -                 | -                 | 55.4           | 41.7
COYO-ALIGN-B7-Base(Kakao Brain) | COYO-700M  | 68.6         | 88.1              | 73.2              | 61.2           | 43.1
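
For readers unfamiliar with the R@1 columns, the sketch below shows one common way to compute recall@1 for image-to-text (I2T) and text-to-image (T2I) retrieval from a similarity matrix. It is a simplified illustration, not the evaluation code behind the numbers above: it assumes exactly one ground-truth caption per image with matching indices, and the similarity values are made up.

```python
def recall_at_1(sims):
    """Percentage of queries whose top-scoring candidate is the ground truth.

    sims[i][j] is the score of query i against candidate j; the ground truth
    for query i is assumed to be candidate i.
    """
    hits = sum(
        1
        for i, row in enumerate(sims)
        if max(range(len(row)), key=row.__getitem__) == i
    )
    return 100.0 * hits / len(sims)

def transpose(matrix):
    """Swap rows and columns so texts become the queries."""
    return [list(col) for col in zip(*matrix)]

# Made-up image-by-text similarity scores (rows: images, columns: texts).
sims = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.5],
    [0.1, 0.2, 0.7],
]

i2t = recall_at_1(sims)             # images query texts
t2i = recall_at_1(transpose(sims))  # texts query images
```

In this toy case image 1 ranks the wrong text first, so I2T R@1 is two out of three, while every text retrieves its own image, so T2I R@1 is 100. This shows why the two directions are reported separately.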
