Inference Scaling Laws Llemma Models
Collection
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
This is a reward model fine-tuned from Llemma-34b. To score the steps of a solution, concatenate the question and the solution, encode the result, and pass the encoded text to the model as input.
rewards = model(text).mean(dim=-1).sigmoid()[index]
Here, index holds the positions of the special end tokens of each step, so rewards contains one score per step.
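The arithmetic behind that one-liner can be sketched in plain Python on toy data. This is only an illustration under stated assumptions: the model output is treated as an unbatched list of per-token rows, the helper name `step_rewards` is hypothetical, and the numbers in `toy_logits` and `toy_index` are made up rather than taken from the actual model or tokenizer.

```python
import math


def step_rewards(logits, index):
    """Mean over the last dimension, sigmoid, then select end-of-step positions.

    logits: list of per-token rows (seq_len x d); a stand-in for model(text).
    index:  positions of the special end tokens of each step.
    """
    per_token = [sum(row) / len(row) for row in logits]      # .mean(dim=-1)
    probs = [1.0 / (1.0 + math.exp(-x)) for x in per_token]  # .sigmoid()
    return [probs[i] for i in index]                         # [index]


# Toy stand-in for the model output: 5 tokens, 2 values per token (assumed).
toy_logits = [[0.0, 0.0], [1.0, 3.0], [-2.0, 0.0], [4.0, 4.0], [-1.0, 1.0]]
# Hypothetical end-of-step token positions for a two-step solution.
toy_index = [1, 4]

rewards = step_rewards(toy_logits, toy_index)
print(rewards)  # one score in (0, 1) per step
```

With the real model, the same three operations would run on tensors; only the selection at the end-of-step positions turns per-token scores into per-step rewards.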