TriLM - llamafile

TriLM is a 1.58-bit ternary LLM: every weight is one of {-1, 0, +1}. It is highly optimized for CPU performance, thanks to the Q2_K_S quantization format.

This repository packages and distributes TriLM as executable weights, which we call llamafiles. The files you download here will run on Linux, macOS, Windows, FreeBSD, OpenBSD, and NetBSD for AMD64 and ARM64.

Quickstart

Running the following on a desktop OS will launch a tab in your web browser with a completions interface.

wget https://huggingface.co/Mozilla/TriLM-llamafile/resolve/main/TriLM_3.9B.llamafile
chmod +x TriLM_3.9B.llamafile
./TriLM_3.9B.llamafile
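The browser tab is a front end for a local HTTP server, so you can also query the model from a script. The client below is a sketch with several assumptions. It assumes the server listens on its default address, http://127.0.0.1:8080. It also assumes the server exposes the llama.cpp-style /completion endpoint, which takes the JSON fields prompt, n_predict and repeat_penalty and returns the generated text in content. Check the llamafile README for the endpoints your version actually provides. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumption: llamafile server running locally on its default port,
# exposing a llama.cpp-style /completion endpoint.
SERVER_URL = "http://127.0.0.1:8080/completion"


def build_request(prompt: str, n_predict: int = 64,
                  repeat_penalty: Optional[float] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for a raw text completion."""
    payload = {"prompt": prompt, "n_predict": n_predict}
    if repeat_penalty is not None:
        payload["repeat_penalty"] = repeat_penalty
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated continuation."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=120) as resp:
        return json.loads(resp.read())["content"]
```

Start the server as shown in the Quickstart, then call complete("Once upon a time", n_predict=32). If the assumed endpoint is available, the call returns the model's continuation as a string.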

You can also use the command line interface:

./TriLM_3.9B.llamafile -p "this is my prompt"
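You can also script the same CLI invocation. The helper below is a sketch that uses Python's subprocess module, and its function names are hypothetical. It only passes flags that appear in this README (-p and --repeat-penalty). It assumes the file has already been made executable with chmod +x, as in the Quickstart. On Windows, the llamafile documentation asks you to rename the file so it ends in .exe.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_cli_command(llamafile: str, prompt: str,
                      repeat_penalty: Optional[float] = None) -> List[str]:
    # Use an absolute path so the file is executed directly, not looked up on PATH.
    cmd = [str(Path(llamafile).resolve()), "-p", prompt]
    if repeat_penalty is not None:
        cmd += ["--repeat-penalty", str(repeat_penalty)]
    return cmd


def run_prompt(llamafile: str, prompt: str, **kwargs) -> str:
    """Run the llamafile once and return whatever it wrote to stdout."""
    result = subprocess.run(
        build_cli_command(llamafile, prompt, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```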

For further information, please see the llamafile README.

Having trouble? See the "Gotchas" section of the README.

Prompting

This is a base model; it hasn't been fine-tuned for chat, so we recommend using the completions interface.

For the smaller TriLM models (e.g. 99M), we recommend setting a high repeat penalty, e.g. --repeat-penalty 10. In CLI mode, the .args file embedded in the llamafiles from this repo specifies this flag by default.
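A repeat penalty lowers the scores of tokens that already appeared in the recent context. The function below sketches the common CTRL-style rule that, to the best of our understanding, llama.cpp-derived samplers use: positive logits are divided by the penalty and non-positive ones are multiplied by it. Under this rule, a penalty of 10 is very aggressive. The sketch works on a hypothetical dictionary that maps token ids to logits, not on llamafile's internal data structures.

```python
def apply_repeat_penalty(logits: dict[int, float], recent_tokens: list[int],
                         penalty: float) -> dict[int, float]:
    """Return a copy of `logits` with recently seen tokens made less likely."""
    penalized = dict(logits)
    # Each distinct recent token is penalized once, however often it occurred.
    for tok in set(recent_tokens):
        if tok not in penalized:
            continue
        if penalized[tok] > 0:
            penalized[tok] /= penalty  # shrink a positive score toward zero
        else:
            penalized[tok] *= penalty  # push a non-positive score further down
    return penalized
```

Take logits {0: 5.0, 1: -1.0, 2: 3.0}, recent tokens [0, 1] and a penalty of 10. Token 0 drops from 5.0 to 0.5, so the unseen token 2 becomes the most likely choice.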

Benchmarks

Throughput is reported in tokens per second (t/s). pp512 measures prompt processing on a 512-token prompt, and tg16 measures the generation of 16 tokens.

CPU | Model file | Size | Test | t/s
--- | --- | --- | --- | ---
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_3.9B.llamafile | 1.31 GiB | pp512 | 1069.54
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_3.9B.llamafile | 1.31 GiB | tg16 | 88.47
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_2.4B.llamafile | 837.02 MiB | pp512 | 1441.04
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_2.4B.llamafile | 837.02 MiB | tg16 | 110.80
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_1.5B.llamafile | 531.44 MiB | pp512 | 2185.94
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_1.5B.llamafile | 531.44 MiB | tg16 | 154.59
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_1.1B.llamafile | 408.66 MiB | pp512 | 2692.87
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_1.1B.llamafile | 408.66 MiB | tg16 | 173.08
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_830M.llamafile | 301.76 MiB | pp512 | 3353.51
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_830M.llamafile | 301.76 MiB | tg16 | 191.98
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_560M.llamafile | 211.21 MiB | pp512 | 4297.08
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_560M.llamafile | 211.21 MiB | tg16 | 209.57
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_390M.llamafile | 148.93 MiB | pp512 | 5130.90
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_390M.llamafile | 148.93 MiB | tg16 | 221.88
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_99M.llamafile | 148.93 MiB | pp512 | 5127.00
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_99M.llamafile | 148.93 MiB | tg16 | 218.93
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_190M.llamafile | 78.55 MiB | pp512 | 10874.11
AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM_190M.llamafile | 78.55 MiB | tg16 | 334.45
Apple M2 Ultra (+fp16+dotprod) | TriLM_3.9B.llamafile | 1.31 GiB | pp512 | 227.95
Apple M2 Ultra (+fp16+dotprod) | TriLM_3.9B.llamafile | 1.31 GiB | tg16 | 65.17
Apple M2 Ultra (+fp16+dotprod) | TriLM_2.4B.llamafile | 837.02 MiB | pp512 | 347.93
Apple M2 Ultra (+fp16+dotprod) | TriLM_2.4B.llamafile | 837.02 MiB | tg16 | 48.26
Apple M2 Ultra (+fp16+dotprod) | TriLM_1.5B.llamafile | 531.44 MiB | pp512 | 588.86
Apple M2 Ultra (+fp16+dotprod) | TriLM_1.5B.llamafile | 531.44 MiB | tg16 | 140.22
Apple M2 Ultra (+fp16+dotprod) | TriLM_1.1B.llamafile | 408.66 MiB | pp512 | 767.47
Apple M2 Ultra (+fp16+dotprod) | TriLM_1.1B.llamafile | 408.66 MiB | tg16 | 167.80
Apple M2 Ultra (+fp16+dotprod) | TriLM_830M.llamafile | 301.76 MiB | pp512 | 1031.20
Apple M2 Ultra (+fp16+dotprod) | TriLM_830M.llamafile | 301.76 MiB | tg16 | 204.46
Apple M2 Ultra (+fp16+dotprod) | TriLM_560M.llamafile | 211.21 MiB | pp512 | 1487.29
Apple M2 Ultra (+fp16+dotprod) | TriLM_560M.llamafile | 211.21 MiB | tg16 | 245.53
Apple M2 Ultra (+fp16+dotprod) | TriLM_390M.llamafile | 148.93 MiB | pp512 | 2049.02
Apple M2 Ultra (+fp16+dotprod) | TriLM_390M.llamafile | 148.93 MiB | tg16 | 332.24
Apple M2 Ultra (+fp16+dotprod) | TriLM_99M.llamafile | 148.93 MiB | pp512 | 2103.34
Apple M2 Ultra (+fp16+dotprod) | TriLM_99M.llamafile | 148.93 MiB | tg16 | 301.31
Apple M2 Ultra (+fp16+dotprod) | TriLM_190M.llamafile | 78.55 MiB | pp512 | 4762.49
Apple M2 Ultra (+fp16+dotprod) | TriLM_190M.llamafile | 78.55 MiB | tg16 | 553.83
Intel Core i9-14900K (alderlake) | TriLM_3.9B.llamafile | 1.31 GiB | pp512 | 167.15
Intel Core i9-14900K (alderlake) | TriLM_3.9B.llamafile | 1.31 GiB | tg16 | 53.22
Intel Core i9-14900K (alderlake) | TriLM_2.4B.llamafile | 837.02 MiB | pp512 | 261.73
Intel Core i9-14900K (alderlake) | TriLM_2.4B.llamafile | 837.02 MiB | tg16 | 78.39
Intel Core i9-14900K (alderlake) | TriLM_1.5B.llamafile | 531.44 MiB | pp512 | 426.17
Intel Core i9-14900K (alderlake) | TriLM_1.5B.llamafile | 531.44 MiB | tg16 | 123.91
Intel Core i9-14900K (alderlake) | TriLM_1.1B.llamafile | 408.66 MiB | pp512 | 563.58
Intel Core i9-14900K (alderlake) | TriLM_1.1B.llamafile | 408.66 MiB | tg16 | 159.13
Intel Core i9-14900K (alderlake) | TriLM_830M.llamafile | 301.76 MiB | pp512 | 763.27
Intel Core i9-14900K (alderlake) | TriLM_830M.llamafile | 301.76 MiB | tg16 | 209.42
Intel Core i9-14900K (alderlake) | TriLM_560M.llamafile | 211.21 MiB | pp512 | 1116.30
Intel Core i9-14900K (alderlake) | TriLM_560M.llamafile | 211.21 MiB | tg16 | 295.71
Intel Core i9-14900K (alderlake) | TriLM_390M.llamafile | 148.93 MiB | pp512 | 1586.69
Intel Core i9-14900K (alderlake) | TriLM_390M.llamafile | 148.93 MiB | tg16 | 377.50
Intel Core i9-14900K (alderlake) | TriLM_99M.llamafile | 148.93 MiB | pp512 | 1587.38
Intel Core i9-14900K (alderlake) | TriLM_99M.llamafile | 148.93 MiB | tg16 | 401.37
Intel Core i9-14900K (alderlake) | TriLM_190M.llamafile | 78.55 MiB | pp512 | 3713.16
Intel Core i9-14900K (alderlake) | TriLM_190M.llamafile | 78.55 MiB | tg16 | 845.54
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_3.9B.llamafile | 1.31 GiB | pp512 | 17.02
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_3.9B.llamafile | 1.31 GiB | tg16 | 6.67
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_2.4B.llamafile | 837.02 MiB | pp512 | 26.35
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_2.4B.llamafile | 837.02 MiB | tg16 | 10.52
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_1.5B.llamafile | 531.44 MiB | pp512 | 42.52
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_1.5B.llamafile | 531.44 MiB | tg16 | 16.91
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_1.1B.llamafile | 408.66 MiB | pp512 | 56.57
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_1.1B.llamafile | 408.66 MiB | tg16 | 20.54
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_390M.llamafile | 148.93 MiB | pp512 | 146.67
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_390M.llamafile | 148.93 MiB | tg16 | 56.77
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_99M.llamafile | 148.93 MiB | pp512 | 147.65
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_99M.llamafile | 148.93 MiB | tg16 | 58.24
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_190M.llamafile | 78.55 MiB | pp512 | 338.42
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM_190M.llamafile | 78.55 MiB | tg16 | 107.33
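The t/s figures convert directly into rough latency estimates. Take TriLM_3.9B on the Threadripper PRO 7995WX as an example. Its tg16 rate of 88.47 t/s works out to about 11.3 ms per generated token. A 512-token prompt followed by 256 generated tokens would then take roughly 0.48 s + 2.89 s ≈ 3.4 s. The sketch below assumes throughput stays constant, which it will not exactly as the context grows, and its helper names are hypothetical.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average time per token, in milliseconds."""
    return 1000.0 / tokens_per_second


def estimate_seconds(prompt_tokens: int, new_tokens: int,
                     pp_tps: float, tg_tps: float) -> float:
    """Rough wall-clock estimate: prompt processing time plus generation time."""
    return prompt_tokens / pp_tps + new_tokens / tg_tps


# Figures from the table: TriLM_3.9B.llamafile on the AMD Ryzen Threadripper PRO 7995WX.
PP512_TPS = 1069.54
TG16_TPS = 88.47

if __name__ == "__main__":
    print(f"{ms_per_token(TG16_TPS):.1f} ms per generated token")
    print(f"{estimate_seconds(512, 256, PP512_TPS, TG16_TPS):.2f} s for 512 in, 256 out")
```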

About llamafile

llamafile is a new format introduced by Mozilla Ocho on Nov 20th, 2023. It uses Cosmopolitan Libc to turn LLM weights into llama.cpp executables that run on the stock installs of six operating systems, on both ARM64 and AMD64.


TriLM 3.9B Unpacked

This is TriLM (a ternary model) unpacked to FP16 format, so it works with standard FP16 GEMMs. After unpacking, TriLM has the same architecture as LLaMA.

import torch
import transformers as tf

model_id = "SpectraSuite/TriLM_3.9B_Unpacked"

# Adjust temperature, repetition penalty, top_k, top_p and other sampling parameters to suit your needs.
# device_map="auto" requires the accelerate package to be installed.
pipeline = tf.pipeline("text-generation", model=model_id,
                       model_kwargs={"torch_dtype": torch.float16},
                       device_map="auto")

# This is a base (pretrained) LLM, not instruction- or chat-tuned; write prompts as text to be continued.
print(pipeline("Once upon a time")[0]["generated_text"])
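To make "unpacked to FP16" concrete, the sketch below packs ternary weights four to a byte. It then expands them back into IEEE half-precision values that an ordinary FP16 GEMM could consume. Several choices here are hypothetical and for illustration only: the 2-bit code assignment, the single per-tensor scale, and the function names. They are not TriLM's actual storage layout or the Q2_K_S format.

```python
import struct

# Hypothetical 2-bit codes: 0 -> 0, 1 -> +1, 2 -> -1 (code 3 is unused).
_DECODE = {0: 0, 1: 1, 2: -1}
_ENCODE = {v: k for k, v in _DECODE.items()}


def pack_ternary(values: list[int]) -> bytes:
    """Pack ternary values four per byte, lowest bit pair first."""
    out = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        out[i // 4] |= _ENCODE[v] << (2 * (i % 4))
    return bytes(out)


def unpack_to_fp16(packed: bytes, count: int, scale: float) -> bytes:
    """Expand packed codes into little-endian FP16 values equal to scale * t."""
    floats = []
    for i in range(count):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        floats.append(scale * _DECODE[code])
    # struct's "e" format is IEEE 754 half precision.
    return struct.pack(f"<{count}e", *floats)
```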