Molmo-7B-D-0924 OOM on A100 80GB using Quick Start code

#1
by sasawq21 - opened

Using the quick start code from https://huggingface.co/allenai/Molmo-7B-O-0924 with the same input image, I got an OOM on an A100 80GB GPU. Can you provide test code that runs on an A100 80GB? Runnable on 40GB would be even better, thanks.

Wrap the generation call in with torch.autocast("cuda", enabled=True, dtype=torch.bfloat16):
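
For anyone unsure where that goes, a minimal sketch, assuming model, processor, and inputs are already set up as in the quick start (the generate_from_batch call here follows the model card):

import torch
from transformers import GenerationConfig

# wrap only the generation call in autocast; model loading stays unchanged
with torch.autocast("cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer,
    )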

Thanks for the tip! This let me get the model running on a 4090 (24GB VRAM) on Windows. I wanted to share my solution for anyone else running into this issue.

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

# load the processor and model with bf16 weights so the 7B model
# fits comfortably in 24GB of VRAM
processor = AutoProcessor.from_pretrained(
    'allenai/Molmo-7B-D-0924',
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='auto'
)

model = AutoModelForCausalLM.from_pretrained(
    'allenai/Molmo-7B-D-0924',
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='auto'
)

This loads the full model into VRAM and still leaves plenty of headroom for inference.
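
To sanity-check that the bf16 weights actually fit, you can print CUDA memory stats right after loading (standard torch.cuda calls, nothing Molmo-specific):

import torch

# ~7B parameters at 2 bytes each should come to roughly 14-15GB of weights
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.1f} GiB")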

Prior to calling processor.process, I added:

with torch.no_grad():
    with torch.cuda.amp.autocast(dtype=torch.bfloat16):
        ...  # processor.process(...) and the generation call go inside this block

(The no_grad was a suggestion from o1-preview for memory savings; I'm not sure if it's needed, but it seems to work!)
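
Put together, the full inference path looks roughly like this (the process / generate_from_batch calls follow the model card's quick start; the image URL and prompt are just placeholders):

import torch
import requests
from PIL import Image
from transformers import GenerationConfig

# model and processor loaded in bf16 with device_map='auto' as shown above
url = "https://picsum.photos/id/237/536/354"  # any test image works
image = Image.open(requests.get(url, stream=True).raw)

with torch.no_grad():
    with torch.cuda.amp.autocast(dtype=torch.bfloat16):
        inputs = processor.process(images=[image], text="Describe this image.")
        # move everything to the model's device and add a batch dimension
        inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}
        output = model.generate_from_batch(
            inputs,
            GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
            tokenizer=processor.tokenizer,
        )
        # decode only the newly generated tokens, not the prompt
        generated = output[0, inputs["input_ids"].size(1):]
        print(processor.tokenizer.decode(generated, skip_special_tokens=True))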

@mw44 I forgot to mention the bfloat16 weight loading, thanks for your comment :) no_grad is always nice to have; it has saved me a ton of VRAM with other transformers models.
(In this case, generate_from_batch already runs under no_grad internally, so you can leave it out here, but it's good practice.)

Any idea how much VRAM the 72B model would need in bf16?

@mw44 I don't know exact numbers, but given that Llama 70B takes about 148GB, you could test Molmo 72B on a 2x A100 cloud node.
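
For a back-of-the-envelope estimate (weights only, 2 bytes per parameter in bf16; KV cache and activations come on top):

# rough bf16 weight footprint for a 72B model; actual usage is higher
params = 72e9
print(f"{params * 2 / 1024**3:.0f} GiB")  # ~134 GiB of weights alone

So 2x A100 80GB (160GB total) covers the weights with roughly 26GB left across the two cards for activations and KV cache, which matches the suggestion above.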
