Could you let me know when the bfloat16 model will be uploaded? I can't run the float32 model!

#5
by Cach - opened

Yes, we would like to build a bfloat16 compatible version. In the meantime you can run this model with torch.autocast to save some memory:

with torch.autocast("cuda", enabled=True, dtype=autocast_precision):  # e.g. autocast_precision = torch.bfloat16

We did our evaluations in that setting (float32 weights with autocast enabled).

The current code does not support bfloat16 inference directly, but you can try with torch.autocast.

import torch
from transformers import GenerationConfig

with torch.autocast("cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer
    )

Note that the weights will still be in float32.
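
For reference, here is a minimal end-to-end sketch of how model, processor, and inputs in the snippet above would typically be prepared, following the model card's usage example; the image URL and prompt are just placeholders:

import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

# load the processor and the (float32) model, placing weights automatically
processor = AutoProcessor.from_pretrained(
    'allenai/Molmo-7B-D-0924',
    trust_remote_code=True, torch_dtype='auto', device_map='auto'
)
model = AutoModelForCausalLM.from_pretrained(
    'allenai/Molmo-7B-D-0924',
    trust_remote_code=True, torch_dtype='auto', device_map='auto'
)

# process an image and a prompt, then batch (size 1) and move to the model's device
inputs = processor.process(
    images=[Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)],
    text="Describe this image."
)
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# run generation under autocast: weights stay float32, activations use bfloat16
with torch.autocast("cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer
    )

# decode only the newly generated tokens
generated_text = processor.tokenizer.decode(
    output[0, inputs['input_ids'].size(1):], skip_special_tokens=True
)
print(generated_text)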

Will a bfloat16 version be released at some point in the future, though?

The problem still exists when using this; it still reports OOM!

You can now also convert the model to bfloat16 (see the updated README). Note, though, that I have seen the model produce slightly different outputs when the weights are bfloat16 instead of float32.
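
Roughly, the conversion looks like this, assuming model and inputs are already set up as above. Casting the floating-point inputs (e.g. the image tensor) to match the weights is my assumption here, not something the README spells out:

import torch

# convert the float32 weights to bfloat16 in place
model.to(dtype=torch.bfloat16)

# assumption: floating-point inputs (e.g. image tensors) should match the weight
# dtype; integer tensors such as input_ids are left untouched
inputs = {k: v.to(torch.bfloat16) if torch.is_floating_point(v) else v
          for k, v in inputs.items()}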

I've tried to do it the way it is written in the README, but it didn't work for me (on a 24 GB 4090). The float32 model didn't fit on my GPU, and I got the following warning:

[2024-10-01 07:39:57 +0000] [100] [WARNING] Some parameters are on the meta device because they were offloaded to the cpu.

And when I tried to convert the model to bfloat16, I got this error:

RuntimeError: You can't move a model that has some modules offloaded to cpu or disk.

The way to fix it is to load the model in bfloat16 from the beginning:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'allenai/Molmo-7B-D-0924',
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='auto'
)

Just change torch_dtype='auto' to torch_dtype=torch.bfloat16 and remove the line:

model.to(dtype=torch.bfloat16)
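
To confirm the bfloat16 load actually worked (and to see the resulting footprint), a quick sanity check after from_pretrained:

import torch

print(next(model.parameters()).dtype)                      # expect torch.bfloat16
print(f"{torch.cuda.memory_allocated() / 1e9:.1f} GB allocated on the current GPU")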

There is a 4-bit quantized version which seems to work well:
https://huggingface.co/cyan2k/molmo-7B-O-bnb-4bit
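
If you would rather quantize on the fly instead of using that pre-quantized repo, something along these lines should work with transformers plus bitsandbytes; I haven't tested it against Molmo's custom remote code, so treat it as a sketch:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    'allenai/Molmo-7B-D-0924',
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map='auto',
)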
