Looking for the right graphics card for this model

#2
by kjhamilton - opened

I'm looking at:
EVGA GeForce RTX 3090 FTW3 Ultra Gaming, 24GB GDDR6X, iCX3 Technology, ARGB LED, Metal Backplate, 24 - pretty expensive...

Or this one, quite a bit less expensive...

RTX 4060 Ti 16GB:
Powered by NVIDIA DLSS 3, ultra-efficient Ada Lovelace architecture, and full ray tracing
4th Generation Tensor Cores: Up to 4x performance with DLSS 3
3rd Generation RT Cores: Up to 2x ray tracing performance
Powered by GeForce RTX 4060 Ti
Integrated with 16GB GDDR6 128-bit memory interface
WINDFORCE Cooling System, Protection metal back plate

Graphics cards are super confusing to me since the manufacturers all use different designations, memory sizes, and Tensor core generations.

But since I'm putting together a build for it, do you have a recommendation?

I want the best function-calling responses, and it's important that when no function is required, the model just answers directly. I won't be serving more than 10 concurrent requests, and even then, might it be cheaper to build more servers with lower-VRAM cards?

Lastly, does the rest of the computer really matter? I'll probably have PCIe 3, 32GB of DDR4, and a 9th-gen Intel i7.

Trelis org

Howdy! Base computer shouldn't matter too much.

If you go with a 16 GB GPU, it will just about run a 7B model in 16-bit precision. But if you run Text Generation Inference, you can run in 8-bit with EETQ, which is fast and will halve your memory requirement.
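As a rough back-of-the-envelope sketch (the 20% overhead factor for KV cache and activations is an assumption, not a measured number):

```python
def estimate_vram_gb(n_params_billion: float,
                     bytes_per_param: float,
                     overhead_frac: float = 0.2) -> float:
    """Rough VRAM estimate: weight storage plus an assumed fudge
    factor for KV cache, activations, and runtime overhead."""
    weights_gb = n_params_billion * bytes_per_param  # 1B params at 1 byte/param ~ 1 GB
    return weights_gb * (1 + overhead_frac)

# 7B model in 16-bit (2 bytes/param): ~16.8 GB -> tight on a 16 GB card
print(round(estimate_vram_gb(7, 2), 1))
# 7B model in 8-bit (1 byte/param): ~8.4 GB -> comfortable headroom
print(round(estimate_vram_gb(7, 1), 1))
```

That's why 8-bit quantization is the difference between "just about fits" and "fits with room to spare" on a 16 GB card.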

I haven't dug too deep nor run my own GPU at home (other than a Mac), but your cheaper choice seems OK. Check out LocalLLaMA on Reddit for info on GPUs: The LLM GPU Buying Guide - August 2023 : r/LocalLLaMA

This is very helpful. I've ordered a used 3090 from eBay.

kjhamilton changed discussion status to closed
