
How to perform inference on a multi-GPU setup

#2
by fcakyon - opened

As given in the README of https://huggingface.co/THUDM/cogvlm-chat-hf:

device_map = infer_auto_device_map(model, max_memory={0: '20GiB', 1: '20GiB', 'cpu': '16GiB'}, no_split_module_classes=['CogVLMDecoderLayer', 'TransformerLayer'])

How can I dispatch the THUDM/cogagent-vqa-hf model across multiple GPUs?

cc: @qingsonglv @chenkq

I also managed to perform inference on multiple GPUs by following the example from https://huggingface.co/THUDM/cogvlm-chat-hf and replacing device_map with:

device_map = infer_auto_device_map(model, max_memory={0: '18GiB', 1: '18GiB', 'cpu': '16GiB'}, no_split_module_classes=['CogAgentDecoderLayer'])
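
For completeness, here is a minimal end-to-end sketch of how that line fits into the full loading code, following the cogvlm-chat-hf README pattern. The snapshot_download path handling, the vicuna tokenizer, and the 18GiB/16GiB memory limits are assumptions; adjust them for your own hardware and checkpoint location.

```python
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch
from huggingface_hub import snapshot_download

MODEL_PATH = "THUDM/cogagent-vqa-hf"

# Fetch (or reuse) the local snapshot so load_checkpoint_and_dispatch can read the weight shards.
# Assumption: you let huggingface_hub manage the local checkpoint path.
checkpoint_path = snapshot_download(MODEL_PATH)

# Tokenizer as used in the cogvlm-chat-hf README example.
tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

# Build the model structure on the meta device, without allocating real weights yet.
with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    )

# Plan the split across two GPUs (with CPU RAM as a fallback),
# never cutting inside a decoder layer.
device_map = infer_auto_device_map(
    model,
    max_memory={0: "18GiB", 1: "18GiB", "cpu": "16GiB"},
    no_split_module_classes=["CogAgentDecoderLayer"],
)

# Load the real weights and place each submodule on its assigned device.
model = load_checkpoint_and_dispatch(model, checkpoint_path, device_map=device_map).eval()
```

The key detail is no_split_module_classes=['CogAgentDecoderLayer']: it tells accelerate it may only cut the model between decoder layers, so no single layer ends up straddling two devices.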

@teo96 thanks a lot!
