Loading the model without any webUI
#5 · opened by MrGobbs
I want to use the model in Python code with PyTorch. I don't want a web UI, just the plain old command terminal. How can I do that?
Llama.cpp or Python?
I have a little guide for Vicuna here; you can use it with my models too, i.e. the GGML files that TheBloke published.
Just import transformers, load the model with the settings you want (or just don't do sampling), and you're good to go.
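A minimal sketch of what that looks like with plain `transformers` in a terminal script. The model id in the usage comment is only a placeholder; substitute whichever model you actually want:

```python
def generate(model_id: str, prompt: str, max_new_tokens: int = 128) -> str:
    """Load a causal LM and generate greedily (no sampling), no web UI needed."""
    # Imported inside the function so this sketch can be read or imported
    # even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    # do_sample=False means deterministic greedy decoding ("just don't do sampling")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage (downloads the model on first run; the id is a placeholder):
# print(generate("TheBloke/vicuna-7B-1.1-HF", "Hello, how are you?"))
```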
Can I use this with Python via llama.cpp?
So would this be correct: `LLM = Llama(MODEL, verbose=False, n_ctx=2048)`, with `MODEL` replaced by the quantized bin file, and `n_ctx=161984`? Or is there anything else that needs to be done?
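Roughly, yes: in `llama-cpp-python` the `Llama` constructor takes the model path as its first argument, and `n_ctx` sets the context window. One caveat: the original LLaMA models were trained with a 2048-token context, so a very large `n_ctx` won't give you a genuinely longer usable context. A hedged sketch (the file path in the comment is a placeholder):

```python
def load_llm(model_path: str, n_ctx: int = 2048):
    """Load a quantized GGML .bin file with llama-cpp-python."""
    # Lazy import so this sketch can be inspected without llama-cpp-python installed.
    from llama_cpp import Llama

    # model_path points at the quantized .bin file; n_ctx is the context window.
    return Llama(model_path=model_path, n_ctx=n_ctx, verbose=False)

# Example usage (path is a placeholder):
# llm = load_llm("./vicuna-7b.ggmlv3.q4_0.bin")
# out = llm("Q: What is llama.cpp? A:", max_tokens=64)
# print(out["choices"][0]["text"])
```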