Using embeddings to do sentence similarity

#16
by bilalmalik4321 - opened

Has anyone used the embeddings to calculate sentence similarity like the example card? If so, what are the steps you took to do this?

This is actually a straightforward task, thanks to Hugging Face's sentence-transformers utilities.
We just need to compare the embeddings using a similarity-score utility.

Step 1: Encode the sentences to be compared

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)

(where sentences1 and sentences2 are lists of sentences (strings))

Step 2: Compute the similarity using a similarity metric

(cosine similarity or dot product)

from sentence_transformers import util
cosine_scores = util.cos_sim(embeddings1, embeddings2)
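Note that util.cos_sim returns a full matrix of scores: one entry for every pair of sentences across the two lists. Under the hood it is just a dot product of L2-normalized vectors. Here is a minimal NumPy sketch of that computation, using made-up 3-dimensional vectors (real all-MiniLM-L6-v2 embeddings are 384-dimensional):

```python
import numpy as np

# Toy "embeddings": 2 sentences x 3 dimensions each (made-up values)
emb1 = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
emb2 = np.array([[1.0, 0.0, 0.0],
                 [1.0, 1.0, 0.0]])

# Cosine similarity = dot product of L2-normalized vectors
norm1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
norm2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)

# scores[i][j] is the cosine similarity between emb1[i] and emb2[j],
# so the result has shape (len(emb1), len(emb2)) -- here (2, 2)
scores = norm1 @ norm2.T
```

sentence-transformers' util.dot_score works the same way but skips the normalization, which matters only if your embeddings are not already unit-length.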

Step 3: Output the pairs with their score

for i in range(len(sentences1)):
    print("{} \t\t {} \t\t Score: {:.4f}".format(
        sentences1[i], sentences2[i], cosine_scores[i][i]))
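The loop above reads only the diagonal, i.e. it assumes the two lists are aligned pair-by-pair. If instead you want the best match in sentences2 for each sentence in sentences1, take the argmax of each row of the score matrix. A sketch with a hypothetical score matrix standing in for cosine_scores (the sentences and values are made up for illustration):

```python
# Hypothetical score matrix: rows = sentences1, columns = sentences2
scores = [
    [0.92, 0.10, 0.33],
    [0.05, 0.81, 0.40],
]
sentences1 = ["The cat sits outside", "A man is playing guitar"]
sentences2 = ["The cat plays in the garden",
              "Someone is playing music",
              "The new movie is great"]

# For each sentence in sentences1, report its best match in sentences2
best = []
for i, row in enumerate(scores):
    j = max(range(len(row)), key=row.__getitem__)  # argmax over the row
    best.append(j)
    print("{} \t<->\t {} \tScore: {:.4f}".format(
        sentences1[i], sentences2[j], row[j]))
```

With real tensors from util.cos_sim you can get the same thing with cosine_scores.argmax(dim=1).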

For more references, you can visit Sentence-Transformers website:
https://www.sbert.net/docs/usage/semantic_textual_similarity.html
