Recommended hyperparameters?
#27, opened by zhilinw6
Do you have any recommendations or insights on effective SFT hyperparameter settings, such as learning rate, batch size, number of epochs, weight decay, etc.?
Any advice on processing the training data?
You can refer to the training hyperparameters described in the MGTE paper. MGTE focuses on encoder-only training, whereas the GTE-QWEN series models are trained with LoRA. Apart from this difference, the other training hyperparameters and data strategies are similar.
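For readers less familiar with the LoRA setup mentioned above, here is a minimal, illustrative sketch of what LoRA changes compared with full fine-tuning. It is not the GTE-QWEN training code. The pretrained weight W stays frozen, and only a low-rank update B·A is trained. The names `LoRAShape` and `lora_forward` are hypothetical, chosen for illustration. The 4096 × 4096 shape and rank 16 used below are arbitrary examples, not values taken from the MGTE paper or the GTE-QWEN models.

```python
from dataclasses import dataclass


@dataclass
class LoRAShape:
    """Hypothetical helper: one weight matrix W (d_out x d_in) adapted with rank-r LoRA."""
    d_out: int
    d_in: int
    rank: int

    def full_params(self) -> int:
        # Full fine-tuning updates every entry of W.
        return self.d_out * self.d_in

    def lora_params(self) -> int:
        # LoRA freezes W and trains B (d_out x r) and A (r x d_in),
        # so the learned update delta_W = B @ A has rank at most r.
        return self.rank * (self.d_out + self.d_in)


def lora_forward(W, A, B, x, scale=1.0):
    """Hypothetical sketch: y = W x + scale * B (A x), using plain nested lists.

    `scale` plays the role of alpha / r in the original LoRA formulation.
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)                  # frozen pretrained path
    low_rank = matvec(B, matvec(A, x))   # trainable low-rank path
    return [b + scale * l for b, l in zip(base, low_rank)]


if __name__ == "__main__":
    shape = LoRAShape(d_out=4096, d_in=4096, rank=16)
    print(shape.full_params(), shape.lora_params())
```

With a 4096 × 4096 projection and rank 16, full fine-tuning would update 16,777,216 entries, while LoRA trains 131,072 (about 0.78%). This illustrates why LoRA is a common choice for fine-tuning larger decoder-based models.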