Fine-tuning and DPO
#2
by
agershun
Could you share your thoughts on these questions:
- Is it possible to do minor fine-tuning and DPO training on this model?
- Which packages are best to use?
- How much GPU memory do I need for LoRA with this model? Is an A100 40 GB enough?
Thank you!
Yes, it will work great to fine-tune. I recommend using Hugging Face TRL.
It trains fine, thank you!
I adapted the code from this article and then modified the prompts to the Starling-LM format.
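The prompt adaptation can be sketched like this. The template below is my reading of the Starling-LM (OpenChat-style) chat format; verify it against the official model card before training:

```python
# Sketch of wrapping a user message in the Starling-LM chat template.
# The template string is an assumption based on the model card.

def to_starling_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the Starling-LM chat template."""
    return (
        f"GPT4 Correct User: {user_message}<|end_of_turn|>"
        "GPT4 Correct Assistant:"
    )

print(to_starling_prompt("What is DPO?"))
# GPT4 Correct User: What is DPO?<|end_of_turn|>GPT4 Correct Assistant:
```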
Nice work, I am glad you figured it out, let me know if you have any questions. Thanks for your support!
I finished with DPO. It also works fine with this model.