Zhaolin Gao
GitBag
AI & ML interests: Reinforcement Learning from Human Feedback
Organizations
Collections: 1
models: 226
GitBag/reasoning_rebel_eta_1e2_lr_3e-7_1731036923 • Text Generation • 5
GitBag/reasoning_rebel_eta_1e4_lr_3e-7_1731046941 • Text Generation • 3
GitBag/reasoning_rebel_eta_1e3_lr_3e-7_1731041913 • Text Generation • 4
GitBag/rloo_ultrainteract_pair_lr_3e-8_555134_1729995637 • Text Generation • 10
GitBag/rloo_ultrainteract_pair_lr_1e-8_555134_1729977727 • Text Generation • 12
GitBag/rloo_6_lr_2e-7_555134_1730042202 • Text Generation • 8
GitBag/rloo_5_lr_2e-7_555134_1730031306 • Text Generation • 10
GitBag/rloo_1_2_h_lr_2e-7_555134_1730036742 • Text Generation • 9
GitBag/rloo_ultrainteract_pair_lr_3e-7_555134_1729824395 • Text Generation • 6
GitBag/rloo_ultrainteract_pair_lr_3e-6_555134_1729859614 • Text Generation • 6
datasets: 233
GitBag/llama3-ultrafeedback-reasoning-iter_2-1731046941-ckp_1
GitBag/llama3-ultrafeedback-reasoning-iter_2-1731046941-ckp_0
GitBag/llama3-ultrafeedback-reasoning-iter_2-1731041913
GitBag/llama3-ultrafeedback-reasoning-armo-tokenized_harvard • 53.9k • 11
GitBag/llama3-ultrafeedback-reasoning-armo-tokenized • 53.9k • 9
GitBag/llama-3_1-8b-it-gsm8k • 7.47k • 1
GitBag/llama-3-70b-it-gsm8k • 7.47k • 1
GitBag/gemma-2-27b-it-gsm8k • 7.47k • 1
GitBag/llama-3_1-70b-it-gsm8k • 7.47k • 2
GitBag/gemma-2-9b-it-gsm8k • 7.47k • 1