What's the context size?

#2
by AliceThirty

Thank you, this model is doing extremely well AND it's running on my laptop! I love it.
Sorry if I missed the information somewhere, but what's the context size, please?


should be 32000

NeverSleep org

Yes, same as Mixtral! 32k
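For anyone who wants to double-check, the context window is stored in the model's config.json as max_position_embeddings. A quick way to read it (the repo id below is a placeholder, substitute this model's actual id):

```python
# Minimal sketch: read the context window from a model's config.json.
# "NeverSleep/some-model" is a placeholder repo id, not the real one.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("NeverSleep/some-model")
print(config.max_position_embeddings)  # Mixtral-style models report 32768
```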

Thank you!

@AliceThirty what kind of GPU is in that laptop? I've been unable to get it working with 8 GB VRAM and 64 GB RAM :thinkies:

@papercanteen111 According to Mistral AI, the original Mixtral needs around 110 GB of VRAM to run unquantized (roughly two 80 GB data-center GPUs in the cloud), but you can run a quantized GGUF version split across CPU and GPU. I'm running a 4-bit quantization on 64 GB RAM + 8 GB VRAM and it works pretty well.
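To illustrate the CPU+GPU split: with a GGUF quant you offload only as many layers as your VRAM can hold, and the rest runs from system RAM. A minimal sketch with llama-cpp-python (the file name is a placeholder and n_gpu_layers is just a guess for ~8 GB of VRAM, tune it to your card):

```python
# Minimal sketch of partial GPU offload with llama-cpp-python.
# The model file name is a placeholder; n_gpu_layers is a rough guess.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-finetune.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=12,   # layers moved to VRAM; raise or lower to fit your card
    n_ctx=4096,        # context actually allocated (the model supports up to 32k)
)
out = llm("Hello!", max_tokens=64)
print(out["choices"][0]["text"])
```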

@papercanteen111 Experimentally I've managed to make a Q3 quant of this model "run" on my GPU with 12 GB VRAM, but it took a full 8 minutes to process a 2k context and about 3 minutes to produce a response, at 0.4 tokens per second. I'm toying with it on a rented 48 GB VRAM GPU, and in that environment it generates faster than my eye can read.
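That lines up with a back-of-envelope estimate: Mixtral 8x7B has about 46.7B total parameters, and a quantized model needs very roughly params × bits-per-weight / 8 bytes, plus the KV cache. A rough sketch of the arithmetic (the bits-per-weight values are loose approximations for the K-quants):

```python
# Back-of-envelope GGUF size estimate for Mixtral 8x7B.
# Ignores quantization overhead and the KV cache; bits/weight are rough guesses.
params = 46.7e9  # Mixtral 8x7B total parameter count (approx.)

for name, bits in [("Q3_K_M", 3.9), ("Q4_K_M", 4.8), ("Q5_K_M", 5.7)]:
    gib = params * bits / 8 / 2**30
    print(f"{name}: ~{gib:.0f} GiB")  # prints roughly 21 / 26 / 31 GiB
```

So none of the quants fit entirely in 12 GB of VRAM and the rest spills to RAM, but all of them fit comfortably on a 48 GB card.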

Additionally, as of two days ago there's research on HQQ + MoE offloading that has succeeded in making Mixtral 8x7B run at usable speed in 12 GB of VRAM. Maybe there will soon be an easy way to generalize that to all MoE models?
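As I understand it, the idea is that an MoE layer only activates a couple of experts per token, so you can keep all experts in RAM and cache just the recently used ones in VRAM. A conceptual sketch (this is NOT the actual HQQ/offloading code, all names here are made up):

```python
# Conceptual sketch of expert offloading: experts live in system RAM and an
# LRU cache keeps the most recently used ones on the GPU.
from collections import OrderedDict
import torch.nn as nn

class ExpertCache:
    def __init__(self, experts, max_on_gpu=2):
        self.experts = list(experts)      # expert modules, resident in RAM
        self.max_on_gpu = max_on_gpu      # how many experts spare VRAM can hold
        self.on_gpu = OrderedDict()       # expert index -> module on GPU, LRU order

    def get(self, idx):
        if idx in self.on_gpu:
            self.on_gpu.move_to_end(idx)                 # mark as recently used
        else:
            if len(self.on_gpu) >= self.max_on_gpu:
                old, _ = self.on_gpu.popitem(last=False) # evict least recently used
                self.experts[old].to("cpu")              # move it back to RAM
            self.on_gpu[idx] = self.experts[idx].to("cuda")
        return self.on_gpu[idx]

# Usage: cache = ExpertCache([nn.Linear(4096, 4096) for _ in range(8)])
# then route each token through cache.get(expert_index) instead of experts[i].
```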


@papercanteen111 Sorry for the misunderstanding, I am not running this model itself but the GGUF version (Q5_K_M) on my laptop. I have 64 GB of RAM and an RTX 3080 Ti with 16 GB of VRAM.
(I opened this thread on the wrong page.)
