Update config.json for flan-t5-small
I believe the num_heads and num_layers values are swapped in the config for google/flan-t5-small. Compare with the config for t5-small (link below), which flan-t5-small is based on. With the current values, the hidden size of the model isn't divisible by the number of attention heads (512 % 6 = 2), as the quick check below shows.
https://huggingface.co/t5-small/blob/df1b051c49625cf57a3d0d8d3863ed4d13564fe4/config.json#L16
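For what it's worth, the discrepancy is easy to reproduce from the published configs. A minimal sketch using transformers (the printed values reflect the configs as currently published on the Hub):

```python
from transformers import AutoConfig

# Pull both published configs straight from the Hub.
flan = AutoConfig.from_pretrained("google/flan-t5-small")
t5 = AutoConfig.from_pretrained("t5-small")

print(flan.num_heads, flan.num_layers)  # 6, 8 in the current flan-t5-small config
print(t5.num_heads, t5.num_layers)      # 8, 6 in the t5-small config linked above
print(flan.d_model % flan.num_heads)    # 512 % 6 == 2
```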
The t5-small implementation is not aligned with the original paper, which describes the small variant as: "Small. We consider a smaller model, which scales the baseline down by using d_model = 512, d_ff = 2,048, 8-headed attention, and only 6 layers each in the encoder and decoder."
The actual config is:
```gin
network.T5Config:
  emb_dim = 512
  num_heads = 6
  num_encoder_layers = 8
  num_decoder_layers = 8
  head_dim = 64
  mlp_dim = 1024
```
https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin
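If I'm reading the t5x code right, head_dim is an explicit parameter there: the attention projections map emb_dim to num_heads * head_dim and back, so emb_dim doesn't have to be divisible by num_heads. A rough sketch of that arithmetic, using the values from the gin file above:

```python
# Values from t5_1_1/small.gin above.
emb_dim = 512
num_heads = 6
head_dim = 64

# The q/k/v projections map emb_dim -> num_heads * head_dim, and the output
# projection maps back, so there's no "emb_dim % num_heads == 0" constraint.
inner_dim = num_heads * head_dim
print(inner_dim)            # 384 -- intentionally smaller than emb_dim
print(emb_dim % num_heads)  # 2 -- harmless given the explicit head_dim
```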
Any clues on why the config was changed?