"running in exui for speed at long context" in text-generation-webui
#1 opened by kopal37
How, exactly?
Thanks for your work on this!
Thank the trainers of the constituent models!
ExUI is a text-generation GUI from the exllamav2 dev. It's quite fast: https://github.com/turboderp/exui
Load the model with the 8-bit cache enabled. This applies to ooba as well, if you use that instead. For sampling, use MinP with the other samplers disabled, except for temperature and repetition penalty.
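For anyone unfamiliar with how MinP works: it keeps only the tokens whose probability is at least `min_p` times the probability of the single most likely token, then renormalizes. A minimal sketch in plain Python (this is an illustration of the idea, not the actual sampler code from exllamav2 or ooba):

```python
import math

def min_p_filter(logits, min_p=0.1, temperature=1.0):
    """Return the renormalized probabilities of tokens that survive MinP.

    Keeps tokens whose probability is >= min_p * (top token's probability).
    """
    # Temperature-scaled softmax over the logits.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # MinP cutoff is relative to the most likely token.
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}

    # Renormalize the surviving tokens so they sum to 1.
    z = sum(kept.values())
    return {i: p / z for i, p in kept.items()}
```

Because the cutoff scales with the top token's probability, MinP prunes aggressively when the model is confident and permissively when the distribution is flat, which is why it works well on its own without stacking top-k/top-p on top.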
I use the model in notebook mode, but you may have to manually adjust the prompt template for chat mode.
I actually wrote this up on reddit, if you are still interested: https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/