How do I reproduce the coherency locally?
I am trying to reproduce the length and quality of the responses locally, but I do not get anywhere close with this model.
So I'd like to know what kind of things are being done behind the scenes to get this level of quality in the replies:
- Which settings are being used for inference? Things like temperature, repetition penalty, etc.
- Is any special steering or prompt injection being done to increase the output quality of the model?
In the interest of the open ecosystem, I'd also like to suggest a feature: an option in HuggingChat to show these settings to the user, so other developers can learn from them. I understood the plan is to expand to different models.
Hi @Henk717, the source code is available here: https://huggingface.co/spaces/huggingchat/chat-ui/tree/main
The preprompt is here: https://huggingface.co/spaces/huggingchat/chat-ui/blob/main/.env#L16
The temperature etc. are here: https://huggingface.co/spaces/huggingchat/chat-ui/blob/main/src/routes/conversation/%5Bid%5D/%2Bpage.svelte#L36

```
temperature: 0.9,
top_p: 0.95,
repetition_penalty: 1.2,
top_k: 50,
```
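As a rough sketch of how those sampling parameters end up in a request, here is how a chat-ui-style client might build the JSON body sent to a text-generation-inference backend. Only the parameter values come from the linked file; the `ENDPOINT` URL, the `buildRequestBody` helper, and the field layout around `inputs`/`parameters` are illustrative assumptions, not copied from the repo.

```typescript
// Sketch: assembling the request body for a text-generation-inference
// backend. Parameter values match the ones linked above; everything
// else (ENDPOINT, helper names) is hypothetical.
const ENDPOINT = "https://example.com/generate"; // placeholder, not the real URL

interface GenerationParameters {
  temperature: number;
  top_p: number;
  repetition_penalty: number;
  top_k: number;
}

function buildRequestBody(
  inputs: string,
  parameters: GenerationParameters
): string {
  return JSON.stringify({ inputs, parameters });
}

const body = buildRequestBody("Hello!", {
  temperature: 0.9,
  top_p: 0.95,
  repetition_penalty: 1.2,
  top_k: 50,
});

// The body would then be POSTed to the inference endpoint, e.g.:
// await fetch(ENDPOINT, {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body,
// });
```

Tweaking `temperature` and `repetition_penalty` locally toward these values is usually the first thing to try when output quality differs from the hosted version.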
The prompt is built here: https://huggingface.co/spaces/huggingchat/chat-ui/blob/main/src/lib/buildPrompt.ts
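To give an idea of what a buildPrompt-style function does, here is a minimal sketch: prepend the preprompt, then serialize the conversation turns with model-specific delimiter tokens, ending with the assistant tag so the model continues in the assistant role. The `<|prompter|>`/`<|assistant|>` delimiter strings and the `PREPROMPT` text below are illustrative assumptions; check the linked `buildPrompt.ts` and `.env` for the exact values used.

```typescript
// Sketch of a buildPrompt-style function. Delimiter tokens and the
// preprompt text are illustrative, not copied from the repo.
interface Message {
  from: "user" | "assistant";
  content: string;
}

const PREPROMPT = "The following is a conversation..."; // placeholder, see .env

function buildPrompt(messages: Message[]): string {
  const turns = messages
    .map((m) =>
      m.from === "user"
        ? `<|prompter|>${m.content}<|endoftext|>`
        : `<|assistant|>${m.content}<|endoftext|>`
    )
    .join("");
  // End with the assistant tag so the model generates the next reply.
  return PREPROMPT + turns + "<|assistant|>";
}

const prompt = buildPrompt([{ from: "user", content: "Hi!" }]);
```

Using the wrong delimiter format for a given model is a common reason local replies come out shorter or less coherent than the hosted ones.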