Use of `layer_norm` in examples

#35
by Davidg707 - opened

In this example from the model card I'm having trouble working out why F.layer_norm is being used.

```python
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

matryoshka_dim = 512

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
embeddings = model.encode(sentences, convert_to_tensor=True)
# The step in question: layer_norm over the embedding dimension
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
# Matryoshka truncation, then L2-normalize
embeddings = embeddings[:, :matryoshka_dim]
embeddings = F.normalize(embeddings, p=2, dim=1)
```
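
For reference, with no affine weight or bias `F.layer_norm` simply standardizes each embedding vector to zero mean and unit variance across its dimensions. A quick sketch on dummy tensors (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 768)  # stand-in for a batch of embeddings
ln = F.layer_norm(x, normalized_shape=(x.shape[1],))

# Manual equivalent: per-row standardization, matching layer_norm's default eps of 1e-5
manual = (x - x.mean(dim=1, keepdim=True)) / torch.sqrt(
    x.var(dim=1, unbiased=False, keepdim=True) + 1e-5
)
print(torch.allclose(ln, manual, atol=1e-5))  # True
```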

It seems unusual. Is this a mistake, or is there something I'm not understanding?

Or in other words, what's wrong with this:

```python
embeddings = model.encode(sentences, convert_to_tensor=True)
embeddings = F.normalize(embeddings[:, :matryoshka_dim])  # limit dims and normalize
```
Nomic AI org

You're right that it's non-standard! We used it to train our model to be binary-aware, inspired by this tweet/paper. We messed around with this during a hack week and found it worked fairly well and was simpler than using an STE (straight-through estimator).
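
To illustrate, here is a rough sketch of what "binary-aware" usage might look like; the threshold-at-zero binarization is my own assumption for illustration, not necessarily Nomic's exact recipe:

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']

embeddings = model.encode(sentences, convert_to_tensor=True)
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))

# Hypothetical binarization: after layer_norm the values are roughly zero-centered,
# so thresholding at 0 turns each dimension into a single bit.
binary = (embeddings > 0).to(torch.uint8)

# Hamming-style agreement between the two binary codes (illustrative only)
agreement = (binary[0] == binary[1]).float().mean()
print(f"fraction of matching bits: {agreement.item():.3f}")
```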

zpn changed discussion status to closed

Ah so this is specific to the classification case. If I just want to use (and truncate) the embeddings for similarity search, I assume I don't need the layer_norm step.

I compared the distributions of values with and without the layer_norm step and they're close to identical (since the values coming out of the model already have a mean close to 0 and a std near 1).

[Attached image: comparison of the value distributions with and without layer_norm]
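
For anyone who wants to reproduce that check, here's a rough sketch (the helper function and variable names are mine) comparing the raw statistics and the resulting cosine similarity under the two recipes:

```python
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

matryoshka_dim = 512
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']

raw = model.encode(sentences, convert_to_tensor=True)
print("raw mean/std:", raw.mean().item(), raw.std().item())  # already roughly 0 / 1

def truncate_and_normalize(x, use_layer_norm):
    if use_layer_norm:
        x = F.layer_norm(x, normalized_shape=(x.shape[1],))
    x = x[:, :matryoshka_dim]
    return F.normalize(x, p=2, dim=1)

with_ln = truncate_and_normalize(raw, use_layer_norm=True)
without_ln = truncate_and_normalize(raw, use_layer_norm=False)

# Cosine similarity between the two sentences under each recipe
print("cos sim with layer_norm:   ", (with_ln[0] @ with_ln[1]).item())
print("cos sim without layer_norm:", (without_ln[0] @ without_ln[1]).item())
```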
