Recommended hyperparameters?

#27

by zhilinw6 - opened Aug 7

Aug 7

Any recommendations or insights on effective SFT hyperparameter settings, such as learning rate, batch size, epochs, and weight decay?
Any advice on processing training data?

thenlper

Alibaba-NLP org Aug 14

You can refer to the training parameter settings described in the MGTE paper. MGTE primarily focuses on training encoder-only models, while the GTE-QWEN series models are trained with LoRA. Apart from this difference, the other training hyperparameters and data strategies are similar.

https://arxiv.org/abs/2407.19669
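
For readers unfamiliar with why LoRA changes the training setup, here is a rough sketch of how many parameters it trains for a single weight matrix compared with full fine-tuning. The matrix size and rank below are illustrative assumptions only; they are not taken from the MGTE paper or from the GTE-QWEN training configuration, and the helper names are hypothetical.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one (d_out x d_in) weight matrix.

    LoRA keeps the original weight W frozen and learns a low-rank update
    B @ A, where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


def full_trainable_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


# Illustrative values only (assumptions, not the GTE-QWEN settings).
d_in = d_out = 4096
rank = 16

lora = lora_trainable_params(d_in, d_out, rank)
full = full_trainable_params(d_in, d_out)
print(f"full: {full:,}  LoRA: {lora:,}  fraction trained: {lora / full:.4%}")
```

Because only a small fraction of the weights receives gradients, settings that depend on optimizer memory or update size may not carry over unchanged between the two setups, so it is worth checking them against the paper.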
