wtf response? :D
server.py --auto-devices --model_type LLaMa --chat --wbits 4 --groupsize 128
Input: hello
Output:
The response is an object with the following structure:
{
  "id": 1,
  "name": "Joe",
  "email": "joe@example.com"
}
and so on...
Logs:
Starting the web UI...
bin h:\0_oobabooga\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
INFO:Loading TheBloke_guanaco-13B-GPTQ...
INFO:Found the following quantized model: models\TheBloke_guanaco-13B-GPTQ\Guanaco-13B-GPTQ-4bit-128g.no-act-order.safetensors
INFO:Loaded the model in 6.78 seconds.
INFO:Loading the extension "gallery"...
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Output generated in 27.31 seconds (2.05 tokens/s, 56 tokens, context 14, seed 1830835794)
Output generated in 6.82 seconds (1.76 tokens/s, 12 tokens, context 86, seed 1347457478)
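(For reference, the tokens/s figures in those last two log lines are just the token count divided by the generation time; a quick check:)

```python
# Sanity-check the throughput numbers from the two log lines above:
# tokens generated divided by wall-clock seconds.
for tokens, seconds in [(56, 27.31), (12, 6.82)]:
    print(f"{tokens} tokens / {seconds} s = {tokens / seconds:.2f} tokens/s")
# Prints ~2.05 and ~1.76, matching the logs.
```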
You need to use a prompt template with these models.
In text-gen-ui, in the bottom left there's a "Prompt" dropdown box. Choose "Alpaca" and then enter your prompt in the template it provides, e.g.:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Hello, how are you?
### Response:
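If you're driving the model from a script rather than the UI, the same template applies. Here's a minimal Python sketch; the `ALPACA_TEMPLATE` constant and `build_alpaca_prompt` helper are illustrative only, not part of text-gen-ui's API:

```python
# Illustrative only: wraps a user message in the Alpaca prompt template
# shown above. Neither the constant nor the helper exists in text-gen-ui;
# they just show how the final prompt string should look.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "{instruction}\n\n"
    "### Response:\n"
)

def build_alpaca_prompt(instruction: str) -> str:
    """Return the user's instruction wrapped in the Alpaca template."""
    return ALPACA_TEMPLATE.format(instruction=instruction)

print(build_alpaca_prompt("Hello, how are you?"))
```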
Working!
Thank you! <3 <3 <3