News
[2023/09] The newest `llama2-wrapper>=0.1.14` supports llama.cpp's gguf models.

[2023/08] 🔥 For developers, we offer a web server that acts as a drop-in replacement for the OpenAI API. Usage:

```shell
python3 -m llama2_wrapper.server
```
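Since the server mimics the OpenAI API, existing OpenAI-style clients should be able to point at it. Below is a minimal sketch of building such a request with the standard library; the `localhost:8000` address and the `/v1/completions` route are assumptions about the server's defaults, not confirmed by this README.

```python
# Sketch: querying the OpenAI-compatible endpoint assumed to be exposed by
# `python3 -m llama2_wrapper.server`. Host, port, and route are assumptions.
import json
import urllib.request

payload = {
    "prompt": "Do you know Pytorch",
    "max_tokens": 128,
    "temperature": 0.9,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",  # assumed default host/port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```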
[2023/08] 🔥 For developers, we released `llama2-wrapper` as a llama2 backend wrapper on PyPI. Install:

```shell
pip install llama2-wrapper
```
Usage:

```python
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/Llama-2-7B-Chat-GGML/llama-2-7b-chat.ggmlv3.q4_0.bin",
    backend_type="llama.cpp",  # options: llama.cpp, transformers, gptq
)
prompt = "Do you know Pytorch"
llama2_prompt = get_prompt(prompt)
answer = llama2_wrapper(llama2_prompt, temperature=0.9)
```
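The `get_prompt` helper above wraps the raw user message in Llama 2's chat template before inference. A rough sketch of that template is below; the default system prompt here is a placeholder, not necessarily what the library uses.

```python
# Sketch of the standard Llama 2 chat prompt format that a helper like
# get_prompt is expected to produce. The system prompt is a placeholder.
PLACEHOLDER_SYSTEM_PROMPT = "You are a helpful assistant."

def format_llama2_prompt(message: str,
                         system_prompt: str = PLACEHOLDER_SYSTEM_PROMPT) -> str:
    """Wrap a user message in the Llama 2 [INST]/<<SYS>> chat format."""
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{message} [/INST]"

print(format_llama2_prompt("Do you know Pytorch"))
```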
[2023/08] 🔥 We added `benchmark.py` for users to benchmark llama2 models on their local devices.

- Check/contribute the performance of your device in the full performance doc.
[2023/07] We released llama2-webui, a gradio web UI to run Llama 2 on GPU or CPU from anywhere (Linux/Windows/Mac).
- Supported models: Llama-2-7b/13b/70b, all Llama-2-GPTQ, all Llama-2-GGML ...
- Supported model backends: transformers, bitsandbytes (8-bit inference), AutoGPTQ (4-bit inference), llama.cpp
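With several backends available, the `backend_type` passed to `LLAMA2_WRAPPER` generally follows from the model format and available hardware. The sketch below is purely illustrative of that mapping, not the library's actual selection logic.

```python
# Illustrative sketch (not the library's actual logic): choosing a
# backend_type for LLAMA2_WRAPPER from the model file and hardware.
def pick_backend(model_path: str, has_gpu: bool) -> str:
    if model_path.endswith((".ggml", ".bin", ".gguf")):
        return "llama.cpp"    # quantized GGML/gguf files run on CPU (or GPU)
    if "GPTQ" in model_path:
        return "gptq"         # 4-bit GPU inference via AutoGPTQ
    return "transformers"     # HF checkpoints; add bitsandbytes for 8-bit
```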