Requirements · #52 opened 3 months ago by sneakybeaky
Help Needed: Installing & Running Llama 3: 70B (140GB) on Dual RTX 4090 & 64GB RAM · #51 opened 5 months ago by kirushake
Llama Batch Inference of Llama-2-70B-Chat-GPTQ · #50 opened 8 months ago by Ivy111
Adding Evaluation Results · #49 opened 8 months ago by leaderboard-pr-bot
Adding Evaluation Results · #48 opened 12 months ago by leaderboard-pr-bot
Fine-tuning Llama 2 · #47 opened about 1 year ago by zuhashaik
Any example of batch inference? · #46 opened about 1 year ago by PrintScr
[AUTOMATED] Model Memory Requirements · #45 opened about 1 year ago by model-sizer-bot
Can we use an AWS EC2 free tier instance for testing the Llama 2 7B chat model? · #43 opened about 1 year ago by haroonHF
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM @ fschat 0.2.29, torch 2.0.1+cu118, transformers 4.33.3 · #42 opened about 1 year ago by Zeal666
Memory consumption much higher on multi-GPU setup (1 reply) · #41 opened about 1 year ago by simonesartoni1
GCP system to host the Llama 2 70B Chat model · #40 opened about 1 year ago by Hammad-Ahmad
How can I use the model to perform multi-GPU inference? · #39 opened about 1 year ago by weijie210
Inference takes more than 10 min · #38 opened about 1 year ago by shravanveldurthi
Out of memory error, but both system and GPU have plenty of memory (5 replies) · #37 opened about 1 year ago by mstachow
Is group size 128 or 1 for the main branch? (8 replies) · #36 opened about 1 year ago by brendanlui
Error when running pipe: temp_state buffer is too small (3 replies) · #35 opened about 1 year ago by StefanStroescu
Performance drop due to quantization? (4 replies) · #34 opened about 1 year ago by Teja-Gollapudi
What GPU and RAM are needed for Llama-2-70B-chat (int8 or fp16)? · #33 opened about 1 year ago by yanmengxiang666
In-context learning in Llama 2, thanks! · #32 opened about 1 year ago by yanmengxiang666
How to set max_split_size_mb? (1 reply) · #30 opened over 1 year ago by neo-benjamin
max_position_embeddings = 2048? (1 reply) · #29 opened over 1 year ago by zzzac
Load into 2 GPUs (3 replies) · #28 opened over 1 year ago by sauravm8
Load model into TGI · #27 opened over 1 year ago by schauppi
RuntimeError: shape '[4, 226, 24576]' is invalid for input of size 9256960 (4 replies) · #26 opened over 1 year ago by linkai-dl
Why is the input prompt part of the output? (3 replies) · #25 opened over 1 year ago by neo-benjamin
What does it mean that inject_fused_attention is disabled for the 70B model? (1 reply) · #24 opened over 1 year ago by neo-benjamin
Generates nonsense output and then breaks (1 reply) · #23 opened over 1 year ago by joycejiang
Perplexity · #22 opened over 1 year ago by gsaivinay
70B with multiple A5000s (6 replies) · #21 opened over 1 year ago by nashid
error: unexpected keyword argument 'inject_fused_attention' (3 replies) · #19 opened over 1 year ago by lasalH
Inference error: tensor shapes (8 replies) · #18 opened over 1 year ago by alejandrofdz
Updated to latest transformers and exllama, loading still fails (2 replies) · #17 opened over 1 year ago by yiouyou
llama.cpp just added GQA and full support for 70B LLaMA-2 (2 replies) · #16 opened over 1 year ago by igzbar
Inference time with TGI (1 reply) · #15 opened over 1 year ago by jacktenyx
Can't launch with TGI (6 replies) · #14 opened over 1 year ago by yekta
Output is merely a copy of the input for 70B @ webui (1 reply) · #13 opened over 1 year ago by wholehope
Error encountered: CUDA extension not installed while running (1 reply) · #12 opened over 1 year ago by wempoo
Can you show the settings for quantizing the model? (8 replies) · #11 opened over 1 year ago by hugginglaoda
ValueError: not enough values to unpack (expected 3, got 2) (1 reply) · #10 opened over 1 year ago by Esin
Further update with slight improvements to the prompt template; also removed the system message (1 reply) · #9 opened over 1 year ago by clayp
Bloke, please add a 70B GGML version (4 replies) · #8 opened over 1 year ago by mirek190
ExLlama is not working; received "shape '[1, 64, 64, 128]' is invalid for input of size 65536" error (2 replies) · #6 opened over 1 year ago by charleyzhuyi
text-generation-inference error (7 replies) · #5 opened over 1 year ago by msteele
Output always 0 tokens (11 replies) · #4 opened over 1 year ago by sterogn
What GPU is needed for this 70B one? (27 replies) · #2 opened over 1 year ago by RageshAntony
It doesn't work with ExLlama at the moment (2 replies) · #1 opened over 1 year ago by Shouyi987