"running in exui for speed at long context" in text-generation-webui
#1 opened by kopal37
How, exactly?
Thanks for your work on this!
Thank the trainers of the constituent models!
ExUI is a text-generation GUI from the exllamav2 dev. It's quite fast: https://github.com/turboderp/exui
Load the model with the 8-bit cache enabled. This applies to ooba as well, if you use that instead. For sampling, use MinP with the other samplers disabled, except for temperature and repetition penalty.
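For anyone unfamiliar with how MinP works: it keeps only the tokens whose probability is at least `min_p` times the probability of the single most likely token, then renormalizes. A minimal sketch in plain Python (this is an illustration of the idea, not the actual sampler code from exllamav2 or ooba):

```python
import math

def min_p_filter(logits, min_p=0.1, temperature=1.0):
    """Return the renormalized probabilities of tokens that survive MinP.

    Keeps tokens whose probability is >= min_p * (top token's probability).
    """
    # Temperature-scaled softmax over the logits.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # MinP cutoff is relative to the most likely token.
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}

    # Renormalize the surviving tokens so they sum to 1.
    z = sum(kept.values())
    return {i: p / z for i, p in kept.items()}
```

Because the cutoff scales with the top token's probability, MinP prunes aggressively when the model is confident and permissively when the distribution is flat, which is why it works well on its own without stacking top-k/top-p on top.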
I use the model in notebook mode, but you may have to manually adjust the prompt template for chat mode.
I actually wrote this up on reddit, if you are still interested: https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/