trl-lib (TRL)

Collections 2

spaces 2

⚒️ TextEnvironments

Sleeping • 10

🦙 StackLLaMa

Runtime error • 213

models 80

datasets 16

trl-lib/hh-rlhf-helpful-base

Viewer • Updated 6 days ago • 46.2k • 11

trl-lib/prm800k

Viewer • Updated 7 days ago • 41.2k • 66

trl-lib/rlaif-v

Viewer • Updated Sep 27 • 83.1k • 91 • 1

trl-lib/Capybara-Preferences

Viewer • Updated Sep 19 • 15.4k • 51

trl-lib/Capybara

Viewer • Updated Sep 19 • 16k • 1.05k

trl-lib/ultrafeedback-prompt

Viewer • Updated Sep 16 • 39.8k • 1.11k • 2

trl-lib/tldr

Viewer • Updated Sep 12 • 130k • 1.42k

trl-lib/ultrafeedback_binarized

Viewer • Updated Sep 12 • 63.1k • 4.54k • 3

trl-lib/lm-human-preferences-sentiment

Viewer • Updated Sep 10 • 6.26k • 56

trl-lib/lm-human-preferences-descriptiveness

Viewer • Updated Sep 10 • 6.26k • 39

AI & ML interests

Collections 2

teknium/OpenHermes-2.5-Mistral-7B

Intel/orca_dpo_pairs

trl-lib/OpenHermes-2-Mistral-7B-ipo-beta-0.1-steps-200

trl-lib/OpenHermes-2-Mistral-7B-ipo-beta-0.2-steps-200

trl-lib/pythia-1b-deduped-tldr-online-dpo

trl-lib/pythia-1b-deduped-tldr-sft

trl-lib/pythia-6.9b-deduped-tldr-online-dpo

trl-lib/pythia-2.8b-deduped-tldr-sft

models 80

trl-lib/Qwen2-0.5B-XPO

trl-lib/Qwen2-0.5B-OnlineDPO

trl-lib/Qwen2-0.5B-KTO

trl-lib/Qwen2-0.5B-ORPO

trl-lib/Qwen2-0.5B-DPO

trl-lib/Qwen2-0.5B-Reward

trl-lib/pythia-1b-deduped-tldr-rm

trl-lib/pythia-2.8b-deduped-tldr-online-dpo

trl-lib/pythia-6.9b-deduped-tldr-offline-dpo

trl-lib/pythia-2.8b-deduped-tldr-offline-dpo

Team members 8
