arxiv:2405.07863
Wei Xiong
weqweasdas
AI & ML interests
Machine learning, RLHF
models (23)
weqweasdas/zephyr-7b-dpo-full • Text Generation • 4
weqweasdas/zephyr-7b-gemma-dpo
weqweasdas/zephyr-7b-sft-full
weqweasdas/zephyr-7b-dpo-qlora
weqweasdas/gpt2-cpt-dutch • Text Generation • 11
weqweasdas/zephyr-7b-gemma-sft
weqweasdas/raft_baseline_zephyr_packing_model6_1_4_e6_weight085 • Text Generation • 6
weqweasdas/raft_baseline_zephyr_packing_model6_1_4_e6 • Text Generation • 6
weqweasdas/raft_baseline_zephyr_packing_model6 • Text Generation • 4
weqweasdas/raft_baseline_openchat_llama13b_model1 • Text Generation • 6
datasets (51)
weqweasdas/gemma2_9b_math_iter2_prompt • Viewer • 627k
weqweasdas/gemma2_9b_gsm8k_iter2_prompt • Viewer • 427k
weqweasdas/gemma2b_sft_gen_intern20b_label_rm_0_25noise • Viewer • 50.7k • 2
weqweasdas/gemma2b_sft_gen_intern20b_label_rm • Viewer • 50.7k • 18
weqweasdas/llama3_8b_it_uf_internlm20blabel_without_lenpanalty • Viewer • 111k • 18
weqweasdas/llama3_8b_it_uf_internlm20blabel_with_000015_lenpanalty • Viewer • 111k • 10
weqweasdas/llama3_8b_sft_uf_internlm20blabel_0_00015_lenpanalty • Viewer • 102k • 10
weqweasdas/llama3_8b_sft_uf_internlm20blabel • Viewer • 102k • 27
weqweasdas/xxxx_test • Viewer • 100 • 2
weqweasdas/xxxx.json • Viewer • 100 • 2