Dropout (#116), opened by Muennighoff
Shouldn't the dropout values in the config be 0.1, since the model was pre-trained with dropout? @TimeRobber @ybelkada
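For context, the fields in question can be inspected like this. A minimal sketch, assuming the `hidden_dropout` and `attention_dropout` fields of the `transformers` `BloomConfig` are the ones at issue:

```python
from transformers import AutoConfig

# Inspect the dropout values currently stored in the Hub config.
# Field names assume BloomConfig (hidden_dropout / attention_dropout).
config = AutoConfig.from_pretrained("bigscience/bloom")
print(config.hidden_dropout, config.attention_dropout)
```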
I don't know about this. I think it depends on what we want those configs to reflect:
1. The actual training procedure? In that sense, yes, we did use dropout 0.1, so we could update the values.
2. The best training procedure? My strong intuition is that we shouldn't have used dropout; PaLM didn't set it, for example.
3. The best config for fine-tuning? In that case we've seen that dropout has a substantial impact on downstream tasks: https://arxiv.org/abs/2204.05832 (see the sketch below)

I think it's either 1) or 3), so we should change the config, no?
2) could be the default in transformers, but imo not for a model on the Hub when it was trained differently.
No strong opinion here, but I feel this must already have been answered somewhere. cc @patrickvonplaten
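Whatever default ends up in the Hub config, a fine-tuner can override it at load time. A minimal sketch, again assuming the same `BloomConfig` field names (unused `from_pretrained` kwargs are forwarded to the config):

```python
from transformers import AutoModelForCausalLM

# Re-enable the dropout used during pre-training when loading for
# fine-tuning; extra kwargs are forwarded to the model config.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    hidden_dropout=0.1,
    attention_dropout=0.1,
)
```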
I second what @TimeRobber said; I don't have a strong opinion on this either. But it would be nice to update it with the parameter used for training, i.e., 0.1, so that the config file reflects the actual training setup.
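The change itself would be small. A sketch, assuming the same field names (the output path is illustrative):

```python
from transformers import AutoConfig

# Set the dropout fields to the pre-training value and save the
# updated config locally (path is illustrative).
config = AutoConfig.from_pretrained("bigscience/bloom")
config.hidden_dropout = 0.1
config.attention_dropout = 0.1
config.save_pretrained("./bloom-config-updated")
```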