Reward model returns 0 scores for all cases
Thanks for your wonderful model!
Could you please help with this issue? When I run the Skywork reward model on multiple GPUs (4x A6000), all reward scores come back as 0, unlike the non-zero scores in the official single-GPU example.
Environment: transformers 4.44.2
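For context, the single-GPU setup that does return non-zero scores looks roughly like this (a minimal sketch: the model ID `Skywork/Skywork-Reward-Llama-3.1-8B` and the sample conversation are my own illustration, not the exact official snippet):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Minimal single-GPU sketch; model ID and conversation are illustrative.
model_name = "Skywork/Skywork-Reward-Llama-3.1-8B"
device = "cuda:0"

rm = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    num_labels=1,
).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

conv = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
input_ids = tokenizer.apply_chat_template(
    conv, tokenize=True, return_tensors="pt"
).to(device)

with torch.no_grad():
    score = rm(input_ids).logits[0][0].item()
print(score)  # non-zero on a single GPU
```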
Hi,
Are you using the model across multiple GPUs in a pipeline-parallel or data-parallel configuration? Can you share the code that reproduces the error?
Just the provided code example, with `device_map="auto"` set when loading the model (so likely pipeline-parallel).
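Concretely, the only change from the single-GPU sketch above is the loading call (model ID again assumed for illustration):

```python
import torch
from transformers import AutoModelForSequenceClassification

# Same loading call as the single-GPU sketch, but sharded across the 4 GPUs
# instead of pinned to one device; this is the only change.
model_name = "Skywork/Skywork-Reward-Llama-3.1-8B"  # assumed model ID, as above
rm = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    num_labels=1,
    device_map="auto",  # shards layers across GPUs (pipeline-style)
)
# Inputs then go to the device of the first shard, e.g. input_ids.to(rm.device).
```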
I ran the code on 2x, 4x, and 8x A800 but couldn't reproduce the problem.
We suggest installing transformers from source and upgrading flash-attention to the latest version. Additionally, you could try setting `attn_implementation` to `eager` to see if that resolves the issue.
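For example (a sketch with an assumed model ID; only the `attn_implementation` argument matters here):

```python
import torch
from transformers import AutoModelForSequenceClassification

# Debugging fallback: use the plain eager attention implementation to rule
# out a flash-attention/sdpa kernel issue when the model is sharded.
rm = AutoModelForSequenceClassification.from_pretrained(
    "Skywork/Skywork-Reward-Llama-3.1-8B",  # assumed model ID
    torch_dtype=torch.bfloat16,
    num_labels=1,
    device_map="auto",
    attn_implementation="eager",
)
```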
I am running into the same issue on 8x H100s: all scores come back as 0. I added `device_map="auto"` and am unable to reproduce the 94+ scores from the example.
We used the following packages with the corresponding versions:
transformers==4.45.2
flash-attn==2.6.3
torch==2.5.0
Additionally, our CUDA and CUDA driver versions were 12.3 and 535.54.03, respectively.
Please make sure to enable `bfloat16` and `flash_attention_2` (not the default `sdpa` or `eager`) when loading the model.
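After loading, you can verify which backend and dtype are actually in effect on the loaded model (`rm` from the snippets above; note that `_attn_implementation` is a private transformers attribute, so treat this purely as a debugging aid):

```python
# Sanity check after from_pretrained: confirm the attention backend and dtype.
print(rm.config._attn_implementation)  # expect "flash_attention_2"
print(next(rm.parameters()).dtype)     # expect torch.bfloat16
```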