potential of 405b model

#27
by nskumar - opened

Does it really support the 128k context length?
I tried running it with the full context length, but it does not accept 128k; it only handles about 10,500 tokens. Am I missing anything?

It takes a ton of memory to run a 128k context on a 405B model. It is possible, but it would require a lot of GPUs and it would be slow.
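For a rough sense of the memory involved, here is a back-of-envelope sketch. The architecture numbers are assumptions based on published Llama 3.1 405B configs (126 layers, 8 KV heads via grouped-query attention, head dimension 128); the function name is just illustrative.

```python
def kv_cache_bytes(seq_len, n_layers=126, n_kv_heads=8,
                   head_dim=128, dtype_bytes=2):
    # 2x for keys and values, per layer, per KV head, per token (fp16)
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

weights_gb = 405e9 * 2 / 1e9              # fp16 weights alone: ~810 GB
cache_gb = kv_cache_bytes(128_000) / 1e9  # one 128k sequence: ~66 GB

print(f"weights ~{weights_gb:.0f} GB, 128k KV cache ~{cache_gb:.0f} GB")
```

Under these assumptions, the weights alone in fp16 are around 810 GB before any KV cache, so serving full 128k context means sharding across many GPUs, which is why most hosted endpoints cap the context well below the model's maximum.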
