
How to perform inference on a multi-GPU setup

#2
by fcakyon - opened

As given in the README of https://huggingface.co/THUDM/cogvlm-chat-hf:

device_map = infer_auto_device_map(model, max_memory={0: '20GiB', 1: '20GiB', 'cpu': '16GiB'}, no_split_module_classes=['CogVLMDecoderLayer', 'TransformerLayer'])

How can I dispatch the THUDM/cogagent-vqa-hf model across multiple GPUs?

cc: @qingsonglv @chenkq

I also managed to perform inference on multiple GPUs by following the example from https://huggingface.co/THUDM/cogvlm-chat-hf and replacing device_map with:

device_map = infer_auto_device_map(model, max_memory={0: '18GiB', 1: '18GiB', 'cpu': '16GiB'}, no_split_module_classes=['CogAgentDecoderLayer'])
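
For completeness, here is a minimal end-to-end sketch of how that line fits into the full loading code, following the cogvlm-chat-hf README pattern. The snapshot_download path handling, the vicuna tokenizer, and the 18GiB/16GiB memory limits are assumptions; adjust them for your own hardware and checkpoint location.

```python
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch
from huggingface_hub import snapshot_download

MODEL_PATH = "THUDM/cogagent-vqa-hf"

# Fetch (or reuse) the local snapshot so load_checkpoint_and_dispatch can read the weight shards.
# Assumption: you let huggingface_hub manage the local checkpoint path.
checkpoint_path = snapshot_download(MODEL_PATH)

# Tokenizer as used in the cogvlm-chat-hf README example.
tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

# Build the model structure on the meta device, without allocating real weights yet.
with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    )

# Plan the split across two GPUs (with CPU RAM as a fallback),
# never cutting inside a decoder layer.
device_map = infer_auto_device_map(
    model,
    max_memory={0: "18GiB", 1: "18GiB", "cpu": "16GiB"},
    no_split_module_classes=["CogAgentDecoderLayer"],
)

# Load the real weights and place each submodule on its assigned device.
model = load_checkpoint_and_dispatch(model, checkpoint_path, device_map=device_map).eval()
```

The key detail is no_split_module_classes=['CogAgentDecoderLayer']: it tells accelerate it may only cut the model between decoder layers, so no single layer ends up straddling two devices.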

@teo96 thanks a lot!
