What learning rate should I use at first to fine-tune bart-large?
A good start is 1e-5 or 2e-5, with lr warm-up and decay. In the paper, we grid search the lr in [5e-6, 1e-5, 2e-5, 5e-5].
· Sign up or log in to comment