About the model architecture

#203
by allenxiao - opened

"Depth was chosen on the basis of the maximum depth for which there were sufficient data to pretrain as it has been established that this approach yields the greatest predictive potential in other informational fields including natural language understanding, computer vision and mathematical problem-solving. "

Does the depth parameter mean the number of layers, for example 6 layers or 12 layers?
Thank you for the great work.

Yes. In deep learning models, depth refers to the number of layers, and width refers to the number of embedding dimensions. This reference may be helpful on training models with enough data for their parameter count:
https://arxiv.org/pdf/2010.14701.pdf
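
As a rough illustration of how depth and width set the number of parameters that the data must support, here is a minimal sketch. It assumes a standard transformer encoder layer with a 4x feed-forward expansion and ignores biases, layer norms, and embedding tables. The `EncoderShape` class and the example sizes (6 or 12 layers, width 256) are hypothetical and are not taken from this model's configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderShape:
    """Hypothetical container for the two sizes discussed above."""

    depth: int  # number of transformer layers
    width: int  # number of embedding dimensions

    def approx_layer_params(self) -> int:
        # Attention: Q, K, V, and output projections, each width x width.
        attention = 4 * self.width * self.width
        # Feed-forward: width -> 4*width -> width (assumed 4x expansion).
        feed_forward = 8 * self.width * self.width
        return attention + feed_forward

    def approx_params(self) -> int:
        # Layers are stacked, so the count grows linearly with depth.
        return self.depth * self.approx_layer_params()


shallow = EncoderShape(depth=6, width=256)
deep = EncoderShape(depth=12, width=256)

print(f"6 layers:  ~{shallow.approx_params():,} parameters")
print(f"12 layers: ~{deep.approx_params():,} parameters")
```

Under these assumptions, doubling the depth doubles the approximate parameter count, which is why a deeper model needs correspondingly more pretraining data.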

ctheodoris changed discussion status to closed
