About the model architecture
#203
by
allenxiao
- opened
"Depth was chosen on the basis of the maximum depth for which there were sufficient data to pretrain as it has been established that this approach yields the greatest predictive potential in other informational fields including natural language understanding, computer vision and mathematical problem-solving. "
Does the depth parameter refer to the number of layers, for example 6 layers or 12 layers?
Thank you for the great work.
Yes, in deep learning models, depth refers to the number of layers and width refers to the number of embedding dimensions (the hidden size of each layer). You may find this reference helpful for understanding how much data is needed to train a given number of parameters:
https://arxiv.org/pdf/2010.14701.pdf
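To make the depth/width distinction concrete, here is a minimal sketch of how the two settings drive the parameter count of a generic transformer encoder. The function name `approx_encoder_params`, the width of 256, and the 4x feed-forward multiplier are illustrative assumptions, not values taken from this model's configuration. The estimate ignores biases, layer norms, and embedding tables, so treat it as a rough guide only.

```python
def approx_encoder_params(depth: int, width: int, ffn_mult: int = 4) -> int:
    """Rough weight count for a generic transformer encoder (hypothetical helper).

    depth    -- number of stacked layers
    width    -- embedding (hidden) dimension of each layer
    ffn_mult -- feed-forward inner size as a multiple of width (assumed 4)
    """
    # Self-attention: query, key, value, and output projections, each width x width.
    attention = 4 * width * width
    # Feed-forward block: one projection up to ffn_mult * width and one back down.
    feed_forward = 2 * width * (ffn_mult * width)
    # Every layer has the same shape, so the total grows linearly with depth.
    return depth * (attention + feed_forward)


if __name__ == "__main__":
    # Width of 256 is an assumed value chosen only for illustration.
    for depth in (6, 12):
        total = approx_encoder_params(depth=depth, width=256)
        print(f"depth={depth:2d}, width=256 -> ~{total:,} weights")
```

Under these assumptions, doubling the depth doubles the number of weights, while doubling the width roughly quadruples it. That is why a deeper model generally needs more pretraining data to fit its parameters.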
ctheodoris
changed discussion status to
closed