Using embeddings to do sentence similarity
Has anyone used the embeddings to calculate sentence similarity like the example card? If so, what are the steps you took to do this?
This is actually a straightforward task, thanks to Hugging Face's Sentence Transformers utilities.
We just need to compare the embeddings using a similarity score utility.
Step 1: Encode the sentences to be compared
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)
(where sentences1 and sentences2 are lists of sentences (strings))
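For illustration, the inputs can be any two paired lists of strings; these samples are made up:

sentences1 = ["The cat sits outside", "A man is playing guitar"]
sentences2 = ["The dog plays in the garden", "A man plays an instrument"]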
Step 2: Compute the similarity scores using a similarity function
(cosine similarity or dot product)
from sentence_transformers import util
cosine_scores = util.cos_sim(embeddings1, embeddings2)
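util.cos_sim returns a score matrix in which entry [i][j] is the cosine similarity between embeddings1[i] and embeddings2[j]. If you prefer raw dot-product scores, the library also provides util.dot_score, which returns a matrix of the same shape:

dot_scores = util.dot_score(embeddings1, embeddings2)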
Step 3: Output the pairs with their score
for i in range(len(sentences1)):
    print("{} \t\t {} \t\t Score: {:.4f}".format(
        sentences1[i], sentences2[i], cosine_scores[i][i]
    ))
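Putting the steps together, here is a minimal self-contained sketch (the sentence pairs are illustrative only, and assume the lists are equal-length so each index i forms a pair):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Illustrative sentence pairs; any two equal-length lists of strings work
sentences1 = ["The cat sits outside", "A man is playing guitar"]
sentences2 = ["The dog plays in the garden", "A man plays an instrument"]

embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)

# The diagonal of the score matrix holds the score for each aligned pair
cosine_scores = util.cos_sim(embeddings1, embeddings2)

for i in range(len(sentences1)):
    print("{} \t\t {} \t\t Score: {:.4f}".format(
        sentences1[i], sentences2[i], cosine_scores[i][i].item()
    ))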
For further reference, you can visit the Sentence-Transformers website:
https://www.sbert.net/docs/usage/semantic_textual_similarity.html