Pythia models fine-tuned with supervised fine-tuning (SFT) and with DPO on the full Anthropic-hh-rlhf dataset for 1 epoch.
Laura O'Mahony
lomahony
AI & ML interests
PhD student
Organizations
None yet
Collections
4
Papers
1
Models
42
lomahony/pythia-1b-helpful-dpo • Text Generation • 41
lomahony/pythia-70m-helpful-dpo • Text Generation • 36
lomahony/pythia-160m-helpful-dpo • Text Generation • 37
lomahony/pythia-1.4b-helpful-dpo • Text Generation • 35
lomahony/pythia-2.8b-helpful-dpo • Text Generation • 37
lomahony/pythia-410m-helpful-sft • Text Generation • 45
lomahony/pythia-1b-helpful-sft • Text Generation • 40
lomahony/pythia-1.4b-helpful-sft • Text Generation • 37
lomahony/pythia-70m-helpful-sft • Text Generation • 40
lomahony/pythia-160m-helpful-sft • Text Generation • 84
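The listed repositories follow a common naming pattern (model size, then "helpful", then "sft" or "dpo"), so a checkpoint can be selected programmatically. The following is a minimal sketch, not the author's documented usage: it assumes each repository holds full causal-LM weights and a tokenizer loadable through the `transformers` Auto classes, and it assumes a "Human: ... Assistant:" prompt layout like the one used in the hh-rlhf data. The helper names `repo_id` and `generate` are hypothetical.

```python
def repo_id(size: str, stage: str) -> str:
    """Build a repository name following the naming pattern of the listed models.

    `size` is a Pythia size string such as "70m" or "1.4b";
    `stage` is "sft" or "dpo". Both helpers here are illustrative only.
    """
    if stage not in ("sft", "dpo"):
        raise ValueError(f"stage must be 'sft' or 'dpo', got {stage!r}")
    return f"lomahony/pythia-{size}-helpful-{stage}"


def generate(size: str = "70m", stage: str = "sft",
             question: str = "How do I boil an egg?",
             max_new_tokens: int = 32) -> str:
    """Download a checkpoint and generate a short continuation.

    Assumption: the repository contains full model weights and a tokenizer.
    Assumption: the prompt layout below mirrors the hh-rlhf dialogue format.
    """
    # Imported lazily so repo_id() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(size, stage)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    prompt = f"\n\nHuman: {question}\n\nAssistant:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("70m", "dpo")` would fetch the smallest DPO checkpoint from the list; the smaller sizes are the cheapest way to check that loading works before trying the 1.4b or 2.8b models.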
Datasets
None public yet