How to perform inference over a multi-GPU setup
#2 · opened by fcakyon
As given in the README of https://huggingface.co/THUDM/cogvlm-chat-hf:
```python
device_map = infer_auto_device_map(model, max_memory={0: '20GiB', 1: '20GiB', 'cpu': '16GiB'}, no_split_module_classes=['CogVLMDecoderLayer', 'TransformerLayer'])
```
How can I dispatch the THUDM/cogagent-vqa-hf model across multiple GPUs?
cc: @qingsonglv @chenkq
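For context, the quoted `infer_auto_device_map` line only computes a placement plan; in the linked README it sits inside roughly the following loading flow. This is a minimal sketch, with `"/path/to/cogvlm-chat-hf"` as a placeholder for a local copy of the weights rather than a real path:

```python
import torch
from transformers import AutoModelForCausalLM
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch

# Build only the model skeleton (weights stay on the "meta" device), so that
# infer_auto_device_map can plan the placement without filling GPU memory.
with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        "THUDM/cogvlm-chat-hf",
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    )

# The quoted line: split across two GPUs (20GiB each) plus CPU offload, while
# keeping each decoder/transformer layer whole on a single device.
device_map = infer_auto_device_map(
    model,
    max_memory={0: "20GiB", 1: "20GiB", "cpu": "16GiB"},
    no_split_module_classes=["CogVLMDecoderLayer", "TransformerLayer"],
)

# "/path/to/cogvlm-chat-hf" is a placeholder for a local folder containing the
# downloaded weights (e.g. from huggingface_hub.snapshot_download).
model = load_checkpoint_and_dispatch(
    model, "/path/to/cogvlm-chat-hf", device_map=device_map
).eval()
```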
I managed to perform inference on multiple GPUs as well, by following the example from https://huggingface.co/THUDM/cogvlm-chat-hf and replacing device_map with:
```python
device_map = infer_auto_device_map(model, max_memory={0: '18GiB', 1: '18GiB', 'cpu': '16GiB'}, no_split_module_classes=['CogAgentDecoderLayer'])
```
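Put together, a self-contained sketch of that working setup for THUDM/cogagent-vqa-hf could look like this. The memory budget and no-split class are the ones reported above; the use of `snapshot_download` and the final `print` are my own additions for illustration, not part of the original post:

```python
import torch
from transformers import AutoModelForCausalLM
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch
from huggingface_hub import snapshot_download

MODEL_ID = "THUDM/cogagent-vqa-hf"

# Local copy of the weights, so load_checkpoint_and_dispatch can stream the
# shards directly onto their target devices.
checkpoint_dir = snapshot_download(MODEL_ID)

# Architecture only (meta device), so the device map can be computed cheaply.
with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
    )

# CogAgent's decoder blocks must stay whole on a single device; the memory
# limits match the values that worked in the snippet above.
device_map = infer_auto_device_map(
    model,
    max_memory={0: "18GiB", 1: "18GiB", "cpu": "16GiB"},
    no_split_module_classes=["CogAgentDecoderLayer"],
)

# Materialize the weights and place every submodule on its assigned device.
model = load_checkpoint_and_dispatch(model, checkpoint_dir, device_map=device_map).eval()

print(device_map)  # inspect which blocks ended up on cuda:0, cuda:1, or cpu
```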