Reward model returns 0 scores for all cases
Thanks for your wonderful model!
Could you please help with this issue? When I run the Skywork reward model on multiple GPUs (4x A6000), all reward scores come back as 0, unlike the non-zero scores in the official single-GPU example.
Environment: transformers 4.44.2
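For context, the single-GPU setup that does return non-zero scores looks roughly like this (a minimal sketch: the model ID `Skywork/Skywork-Reward-Llama-3.1-8B` and the sample conversation are my own illustration, not the exact official snippet):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Minimal single-GPU sketch; model ID and conversation are illustrative.
model_name = "Skywork/Skywork-Reward-Llama-3.1-8B"
device = "cuda:0"

rm = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    num_labels=1,
).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

conv = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
input_ids = tokenizer.apply_chat_template(
    conv, tokenize=True, return_tensors="pt"
).to(device)

with torch.no_grad():
    score = rm(input_ids).logits[0][0].item()
print(score)  # non-zero on a single GPU
```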
Hi,
Are you using the model across multiple GPUs in a pipeline-parallel or data-parallel configuration? Can you share the code that reproduces the error?
Just the provided code example, with `device_map="auto"` set when loading the model (so likely pipeline-parallel).
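Concretely, the only change from the single-GPU sketch above is the loading call (model ID again assumed for illustration):

```python
import torch
from transformers import AutoModelForSequenceClassification

# Same loading call as the single-GPU sketch, but sharded across the 4 GPUs
# instead of pinned to one device; this is the only change.
model_name = "Skywork/Skywork-Reward-Llama-3.1-8B"  # assumed model ID, as above
rm = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    num_labels=1,
    device_map="auto",  # shards layers across GPUs (pipeline-style)
)
# Inputs then go to the device of the first shard, e.g. input_ids.to(rm.device).
```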
I ran the code on 2x, 4x, and 8x A800 but couldn't reproduce the problem.
We suggest installing transformers from source and upgrading flash-attention to the latest version. Additionally, you could try setting `attn_implementation` to `eager` to see if that resolves the issue.
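For example (a sketch with an assumed model ID; only the `attn_implementation` argument matters here):

```python
import torch
from transformers import AutoModelForSequenceClassification

# Debugging fallback: use the plain eager attention implementation to rule
# out a flash-attention/sdpa kernel issue when the model is sharded.
rm = AutoModelForSequenceClassification.from_pretrained(
    "Skywork/Skywork-Reward-Llama-3.1-8B",  # assumed model ID
    torch_dtype=torch.bfloat16,
    num_labels=1,
    device_map="auto",
    attn_implementation="eager",
)
```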
I am running into the same issue on 8x H100s: all scores come back as 0. I added `device_map="auto"` and am unable to reproduce the 94+ scores from the example.
We used the following packages with the corresponding versions:
transformers==4.45.2
flash-attn==2.6.3
torch==2.5.0
Additionally, our CUDA and CUDA driver versions were 12.3 and 535.54.03, respectively.
Please make sure to enable `bfloat16` and `flash_attention_2` (not the default `sdpa` or `eager`) when loading the model.
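After loading, you can verify which backend and dtype are actually in effect on the loaded model (`rm` from the snippets above; note that `_attn_implementation` is a private transformers attribute, so treat this purely as a debugging aid):

```python
# Sanity check after from_pretrained: confirm the attention backend and dtype.
print(rm.config._attn_implementation)  # expect "flash_attention_2"
print(next(rm.parameters()).dtype)     # expect torch.bfloat16
```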