How much GPU memory is required for 32k context embedding?

#13
opened by Labmem009

I tried to use this model to get embeddings of long text, but I ran out of memory (OOM) many times even with 6×A100 GPUs and DP. Any suggestions on how to allocate memory for long text?
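For rough sizing, here is a back-of-envelope sketch (my own assumptions, not measured numbers: the ~7B Mistral backbone in fp16, 32 query heads, and attention scores materialized naively):

import math  # not strictly needed, kept for clarity if you extend the estimate

params = 7.1e9                          # approx. Mistral-7B parameter count (assumption)
weights_gb = params * 2 / 1e9           # ~14 GB of weights at 2 bytes/param in fp16

seq, heads = 32768, 32                  # 32k context, 32 query heads (assumption)
attn_gb = seq * seq * heads * 2 / 1e9   # ~69 GB for ONE layer's naive attention scores
print(f"weights ~{weights_gb:.0f} GB, naive per-layer attention ~{attn_gb:.0f} GB")

So at 32k it is the attention score matrices, not the weights, that dominate. Memory-efficient attention (e.g. torch.nn.functional.scaled_dot_product_attention or FlashAttention) avoids materializing them, which helps far more than adding GPUs under plain DP, since DP only replicates the model per device.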

Try:

import torch

# Run inference without autograd so activations aren't kept for a backward pass
with torch.no_grad():
    outputs = model(**tokens)

I can do 4K tokens with room to spare on 2× 16 GB GPUs with fp16.
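A minimal end-to-end sketch of that setup (fp16 + no_grad; the max_length, device_map, and last-token pooling are my assumptions, so check the model card for the pooling this model actually expects):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Salesforce/SFR-Embedding-Mistral')
model = AutoModel.from_pretrained(
    'Salesforce/SFR-Embedding-Mistral',
    torch_dtype=torch.float16,   # halve weight memory vs. the fp32 default
    device_map='auto',           # shard layers across visible GPUs (needs accelerate)
)
model.eval()

text = 'your long document here'
tokens = tokenizer(text, truncation=True, max_length=4096,
                   return_tensors='pt').to(model.device)

with torch.no_grad():            # no autograd graph -> far smaller activation footprint
    outputs = model(**tokens)

# Last-token pooling (assumption; valid here since a single unpadded input is used)
embedding = outputs.last_hidden_state[:, -1, :]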

Is there any way to do this while using sentence-transformers? Every time I try to load it, it tries to allocate 96GB of VRAM.

embedding = HuggingFaceEmbeddings(model_name='Salesforce/SFR-Embedding-Mistral', model_kwargs={'device':f"cuda:{device_num}"})
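Loaded like that, the weights come in at the default fp32 (~28 GB for a 7B model) before any activations. Newer sentence-transformers releases (>= 2.3.0) accept a nested model_kwargs that is forwarded to the underlying transformers model, so a sketch like this may help (the import path and version behavior are assumptions, so check your installed versions):

import torch
from langchain_community.embeddings import HuggingFaceEmbeddings  # path varies by LangChain version

device_num = 0  # as in the snippet above

embedding = HuggingFaceEmbeddings(
    model_name='Salesforce/SFR-Embedding-Mistral',
    model_kwargs={
        'device': f"cuda:{device_num}",
        # Forwarded to the transformers model by sentence-transformers >= 2.3.0
        'model_kwargs': {'torch_dtype': torch.float16},
    },
)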
