---
base_model: nomic-ai/nomic-embed-text-v1
inference: false
language:
  - en
license: apache-2.0
model_creator: Nomic
model_name: nomic-embed-text-v1
model_type: bert
pipeline_tag: sentence-similarity
quantized_by: Nomic
tags:
  - feature-extraction
  - sentence-similarity
---

> **Warning:** A llama.cpp PR that will break compatibility with these files is about to be merged. Keep an eye out for updates to this repo.



# nomic-embed-text-v1 - GGUF

Original model: nomic-embed-text-v1

## Description

This repo contains llama.cpp-compatible files for nomic-embed-text-v1 in GGUF format.

llama.cpp will default to 2048 tokens of context with these files. To use the full 8192 tokens that Nomic Embed is benchmarked on, you will have to choose a context extension method. The original model uses Dynamic NTK-Aware RoPE scaling, but that is not currently available in llama.cpp. A combination of YaRN and linear scaling is an acceptable substitute.

These files were converted and quantized with llama.cpp commit 6c00a0669.

## Example llama.cpp Command

Compute a single embedding:

```shell
./embedding -ngl 99 -m nomic-embed-text-v1.f16.gguf -c 8192 -b 8192 --rope-scaling yarn --rope-freq-scale .75 -p 'search_query: What is TSNE?'
```

You can also submit a batch of texts to embed, as long as the total number of tokens does not exceed the context length. Only the first three embeddings are shown by the embedding example.
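Once you have the embedding vectors (for example, parsed from the `embedding` example's output), ranking documents against a query is a cosine-similarity computation. A minimal sketch in plain Python, using toy vectors as stand-ins for real 768-dimensional Nomic Embed output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional stand-ins for real Nomic Embed vectors.
query_emb = [0.1, 0.8, 0.3, 0.2]
doc_embs = {
    "doc_a": [0.1, 0.7, 0.4, 0.1],
    "doc_b": [0.9, 0.0, 0.1, 0.6],
}

# Rank documents by similarity to the query, highest first.
ranked = sorted(doc_embs,
                key=lambda k: cosine_similarity(query_emb, doc_embs[k]),
                reverse=True)
print(ranked)  # doc_a is closest to the query
```

Note that the model outputs normalized embeddings, in which case the dot product alone gives the same ranking.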

`texts.txt`:

```
search_query: What is TSNE?
search_query: Who is Laurens Van der Maaten?
```

Compute multiple embeddings:

```shell
./embedding -ngl 99 -m nomic-embed-text-v1.f16.gguf -c 8192 -b 8192 --rope-scaling yarn --rope-freq-scale .75 -f texts.txt
```

## Compatibility

These files are compatible with llama.cpp as of commit ea9c8e114 (February 13, 2024).

## Provided Files

The table below shows the mean squared error (MSE) of the embeddings produced by each quantization of Nomic Embed, relative to the reference Sentence Transformers implementation.

| Name | Quant | Size | MSE |
| ---- | ----- | ---- | --- |
| nomic-embed-text-v1.Q2_K.gguf | Q2_K | 48 MiB | 2.36e-03 |
| nomic-embed-text-v1.Q3_K_S.gguf | Q3_K_S | 57 MiB | 1.31e-03 |
| nomic-embed-text-v1.Q3_K_M.gguf | Q3_K_M | 65 MiB | 8.73e-04 |
| nomic-embed-text-v1.Q3_K_L.gguf | Q3_K_L | 69 MiB | 8.68e-04 |
| nomic-embed-text-v1.Q4_0.gguf | Q4_0 | 75 MiB | 6.87e-04 |
| nomic-embed-text-v1.Q4_K_S.gguf | Q4_K_S | 75 MiB | 6.81e-04 |
| nomic-embed-text-v1.Q4_K_M.gguf | Q4_K_M | 81 MiB | 3.12e-04 |
| nomic-embed-text-v1.Q5_0.gguf | Q5_0 | 91 MiB | 2.79e-04 |
| nomic-embed-text-v1.Q5_K_S.gguf | Q5_K_S | 91 MiB | 2.61e-04 |
| nomic-embed-text-v1.Q5_K_M.gguf | Q5_K_M | 95 MiB | 7.34e-05 |
| nomic-embed-text-v1.Q6_K.gguf | Q6_K | 108 MiB | 6.29e-05 |
| nomic-embed-text-v1.Q8_0.gguf | Q8_0 | 140 MiB | 6.34e-06 |
| nomic-embed-text-v1.f16.gguf | F16 | 262 MiB | 5.62e-10 |
| nomic-embed-text-v1.f32.gguf | F32 | 262 MiB | 9.34e-11 |