This is a reward model fine-tuned from Llemma-34b. To score the steps of a solution, pass the encoded text (the question followed by the solution) as input.

rewards = model(text).mean(dim=-1).sigmoid()[index]

Here index holds the positions of the special end tokens, one per step, so rewards contains one score per step.
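To make the arithmetic in the expression above concrete, here is a minimal sketch that mirrors it with plain Python lists instead of tensors. It assumes the model output for a single sequence has one vector of values per token position; the toy numbers, the positions in toy_index, and the helper name step_rewards are all hypothetical and are not part of the model's API.

```python
import math

def step_rewards(outputs, index):
    """Mirror of model(text).mean(dim=-1).sigmoid()[index] on nested lists.

    outputs: one list of floats per token position (assumed output layout).
    index:   positions of the special end token of each step.
    """
    # Average over the last dimension, like .mean(dim=-1).
    per_token = [sum(row) / len(row) for row in outputs]
    # Squash each averaged value into (0, 1), like .sigmoid().
    probs = [1.0 / (1.0 + math.exp(-x)) for x in per_token]
    # Keep only the scores at the end-of-step positions, like [index].
    return [probs[i] for i in index]

# Made-up stand-in for model(text): 5 token positions, 2 values each.
toy_outputs = [[0.0, 0.0], [2.0, 4.0], [-1.0, 1.0], [-3.0, -1.0], [0.5, 1.5]]
# Hypothetical positions of the end-of-step tokens (two steps).
toy_index = [1, 3]

rewards = step_rewards(toy_outputs, toy_index)
print(rewards)
```

Each entry of rewards lies strictly between 0 and 1 and corresponds to one step, in the order the steps appear in the solution.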

Model size: 33.7B params (Safetensors)

Tensor type: BF16
