Trained for one epoch on `ultrafeedback_binarized` using cDPO. Full evaluation is still pending.
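cDPO ("conservative" DPO) modifies the DPO objective to assume that each preference label may be flipped with some probability ε. It does this by mixing the loss for the labeled ordering with the loss for the reversed ordering. The sketch below is a minimal, dependency-free illustration of that per-example loss and is not the training code used for this model. The function names are hypothetical, the inputs are assumed to be summed sequence log-probabilities under the policy and the frozen reference model, and the defaults `beta=0.1` and `eps=0.1` are assumptions, since the card does not state the hyperparameters used.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; not stated in the card
    eps: float = 0.1,   # assumed label-flip probability; not stated in the card
) -> float:
    """Hypothetical per-example cDPO loss (DPO with label smoothing eps)."""
    # Implicit reward margin: how much more the policy (relative to the
    # reference model) prefers the chosen response over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    logits = beta * margin
    # With probability (1 - eps) the label is correct; with probability eps
    # it is flipped, so the reversed ordering contributes with weight eps.
    return -(1 - eps) * log_sigmoid(logits) - eps * log_sigmoid(-logits)
```

With `eps=0` this reduces to the standard sigmoid DPO loss. With `eps > 0`, the loss grows again once the margin becomes very large, which discourages the policy from becoming overconfident on possibly mislabeled pairs.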
Some initial benchmark results:
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| hellaswag | 0 | acc | 0.6621 | ± | 0.0047 |
| | | acc_norm | 0.8525 | ± | 0.0035 |
| arc_challenge | 0 | acc | 0.6348 | ± | 0.0141 |
| | | acc_norm | 0.6698 | ± | 0.0137 |
| winogrande | 0 | acc | 0.7861 | ± | 0.0115 |
| gsm8k | 0 | acc | 0.5694 | ± | 0.0136 |