model architecture is not aligned with "google/flan-t5-small"

#3
by rrison - opened

In the original paper: "Small. We consider a smaller model, which scales the baseline down by using dmodel = 512, dff = 2,048, 8-headed attention, and only 6 layers each in the encoder and decoder"

But Google's actual T5X implementation uses:
network.T5Config:
emb_dim = 512
num_heads = 6
num_encoder_layers = 8
num_decoder_layers = 8
head_dim = 64
mlp_dim = 1024

The number of heads, number of layers, and feed-forward dimension all differ from the paper's "Small" configuration, which is confusing.
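For anyone who wants to verify, here is a minimal sketch (assuming the `transformers` library is installed) that downloads the checkpoint's config and prints the architecture fields in question:

```python
from transformers import AutoConfig

# Inspect the config that ships with the checkpoint on the Hub.
cfg = AutoConfig.from_pretrained("google/flan-t5-small")

# Paper "Small": d_model=512, d_ff=2048, 8 heads, 6+6 layers.
# The values printed here should instead follow the T5X gin file above.
print("d_model           :", cfg.d_model)
print("d_ff (mlp_dim)    :", cfg.d_ff)
print("num_heads         :", cfg.num_heads)
print("d_kv (head_dim)   :", cfg.d_kv)
print("num_layers (enc)  :", cfg.num_layers)
print("num_decoder_layers:", cfg.num_decoder_layers)
```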

References:
https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin
https://huggingface.co/google/flan-t5-small
