limcheekin's picture
feat: updated model to q5_1 as q8_0 is too slow.
5041f48
<!DOCTYPE html>
<html>
<head>
<title>ToolBench-ToolLLaMA-2-7b-GGML (q5_1)</title>
</head>
<body>
<h1>ToolBench-ToolLLaMA-2-7b-GGML (q5_1)</h1>
<p>
With the utilization of the
<a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a>
package, we are excited to introduce the GGML model hosted in the Hugging
Face Docker Spaces, made accessible through an OpenAI-compatible API. This
space includes comprehensive API documentation to facilitate seamless
integration.
</p>
<ul>
<li>
The API endpoint:
<a href="https://limcheekin-toolbench-toolllama-2-7b-ggml.hf.space/v1"
>https://limcheekin-toolbench-toolllama-2-7b-ggml.hf.space/v1</a
>
</li>
<li>
The API doc:
<a href="https://limcheekin-toolbench-toolllama-2-7b-ggml.hf.space/docs"
>https://limcheekin-toolbench-toolllama-2-7b-ggml.hf.space/docs</a
>
</li>
</ul>
<p>
If you find this resource valuable, your support in the form of starring
the space would be greatly appreciated. Your engagement plays a vital role
in furthering the application for a community GPU grant, ultimately
enhancing the capabilities and accessibility of this space.
</p>
</body>
</html>