SFT notebooks
#1 opened by silvacarl
Can we assume the parameters set in the SFT notebooks are set for optimal fine-tuning?
The hyperparameters set for SFT are good defaults and should work well for most use cases, but they are not optimal. You may improve performance by tuning them for the specific Llama model you want to fine-tune and the dataset used for fine-tuning.
This is especially the case for the learning rate. For the batch size, increasing it as much as you can (until you trigger out-of-memory errors) is a good rule of thumb. You may also want to increase the maximum sequence length if your training samples are longer than what I set.
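For reference, here is a minimal sketch of where these knobs typically live when fine-tuning with TRL's SFTTrainer. It is not taken from the notebooks: the model name, dataset, and values are placeholders, and the exact argument name for the sequence length (e.g. `max_seq_length`) can vary with the TRL version.

```python
# Minimal SFT sketch with TRL (placeholder model, dataset, and hyperparameter values).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset; replace with your own fine-tuning data.
train_dataset = load_dataset("your_dataset", split="train")

config = SFTConfig(
    output_dir="./sft-output",
    learning_rate=1e-5,             # the knob most worth tuning per model/dataset
    per_device_train_batch_size=8,  # raise until you hit out-of-memory errors
    gradient_accumulation_steps=4,  # keeps the effective batch size large
    num_train_epochs=1,
    max_seq_length=1024,            # increase if your samples are longer
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B",  # placeholder Llama checkpoint
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```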
Excellent. We will try it out and give feedback.