How much GPU memory is required for 32k context embedding?

#13
opened by Labmem009

I tried to use this model to get embeddings of long text, but I ran out of memory (OOM) many times even with 6×A100 GPUs and DP. Any suggestions on how to allocate memory for long text?
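For rough sizing, here is a back-of-envelope sketch (my own assumptions, not measured numbers: the ~7B Mistral backbone in fp16, 32 query heads, and attention scores materialized naively):

import math  # not strictly needed, kept for clarity if you extend the estimate

params = 7.1e9                          # approx. Mistral-7B parameter count (assumption)
weights_gb = params * 2 / 1e9           # ~14 GB of weights at 2 bytes/param in fp16

seq, heads = 32768, 32                  # 32k context, 32 query heads (assumption)
attn_gb = seq * seq * heads * 2 / 1e9   # ~69 GB for ONE layer's naive attention scores
print(f"weights ~{weights_gb:.0f} GB, naive per-layer attention ~{attn_gb:.0f} GB")

So at 32k it is the attention score matrices, not the weights, that dominate. Memory-efficient attention (e.g. torch.nn.functional.scaled_dot_product_attention or FlashAttention) avoids materializing them, which helps far more than adding GPUs under plain DP, since DP only replicates the model per device.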

Try:

import torch

# Run inference without autograd so activations aren't kept for a backward pass
with torch.no_grad():
    outputs = model(**tokens)

I can do 4K tokens with room to spare on 2× 16 GB GPUs with fp16.
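A minimal end-to-end sketch of that setup (fp16 + no_grad; the max_length, device_map, and last-token pooling are my assumptions, so check the model card for the pooling this model actually expects):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Salesforce/SFR-Embedding-Mistral')
model = AutoModel.from_pretrained(
    'Salesforce/SFR-Embedding-Mistral',
    torch_dtype=torch.float16,   # halve weight memory vs. the fp32 default
    device_map='auto',           # shard layers across visible GPUs (needs accelerate)
)
model.eval()

text = 'your long document here'
tokens = tokenizer(text, truncation=True, max_length=4096,
                   return_tensors='pt').to(model.device)

with torch.no_grad():            # no autograd graph -> far smaller activation footprint
    outputs = model(**tokens)

# Last-token pooling (assumption; valid here since a single unpadded input is used)
embedding = outputs.last_hidden_state[:, -1, :]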

Is there any way to do this while using sentence-transformers? Every time I try to load it, it tries to allocate 96GB of VRAM.

embedding = HuggingFaceEmbeddings(model_name='Salesforce/SFR-Embedding-Mistral', model_kwargs={'device':f"cuda:{device_num}"})
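Loaded like that, the weights come in at the default fp32 (~28 GB for a 7B model) before any activations. Newer sentence-transformers releases (>= 2.3.0) accept a nested model_kwargs that is forwarded to the underlying transformers model, so a sketch like this may help (the import path and version behavior are assumptions, so check your installed versions):

import torch
from langchain_community.embeddings import HuggingFaceEmbeddings  # path varies by LangChain version

device_num = 0  # as in the snippet above

embedding = HuggingFaceEmbeddings(
    model_name='Salesforce/SFR-Embedding-Mistral',
    model_kwargs={
        'device': f"cuda:{device_num}",
        # Forwarded to the transformers model by sentence-transformers >= 2.3.0
        'model_kwargs': {'torch_dtype': torch.float16},
    },
)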
