Hugging Face
Sahand Rezaei-Shoshtari
sahandrez
https://sahandrez.github.io/
AI & ML interests: Reinforcement Learning
Recent Activity
Updated a model 3 days ago: sahandrez/rloo-paired-Qwen2.5-1.5B-ultrafeedback-binarized-20241125-125438
Organizations
None yet
Models (6)
sahandrez/rloo-paired-Qwen2.5-1.5B-ultrafeedback-binarized-20241125-125438 • Updated 3 days ago • 36
sahandrez/sft-Qwen2.5-1.5B-ultrafeedback • Text Generation • Updated 8 days ago • 8
sahandrez/pairwise-reward-Qwen2.5-1.5B-ultrafeedback • Text Classification • Updated 10 days ago • 9
sahandrez/pairwise-reward-sft-zephyr-7b-sft-qlora-ultrafeedback • Updated Oct 14
sahandrez/pairwise-reward-zephyr-7b-sft-qlora-ultrafeedback • Updated Oct 13 • 1
sahandrez/sft-zephyr-7b-sft-qlora-ultrafeedback • Updated Oct 12 • 1
Datasets (2)
sahandrez/ultrafeedback_kto • Viewer • Updated Sep 23 • 126k • 33
sahandrez/ultrafeedback_unpaired • Viewer • Updated Sep 20 • 126k • 34