Comparison to all-MiniLM-L12-v2

#58
opened by swax-andres

Hi there,

I will refer to this model, the one whose page I am posting on, as This Model: [sentence-transformers/all-MiniLM-L6-v2]. I will refer to the other model as Other Model: [sentence-transformers/all-MiniLM-L12-v2].

From what I gather, both This Model [sentence-transformers/all-MiniLM-L6-v2] and Other Model [sentence-transformers/all-MiniLM-L12-v2] come from a common ancestor, which I will call Grandparent: [microsoft/MiniLM-L12-H384-uncased].


Constants:
fine_tuning_datasets = DS_1 ... DS_N, as documented in both model cards.
hyper_parameter_set = XYZ, as documented in both model cards.


Lineage:

This Model
⬆ fine-tuned using {fine_tuning_datasets and hyper_parameter_set}
nreimers/MiniLM-L6-H384-uncased
⬆ every 2nd layer removed, no fine-tuning
Grandparent.

Other Model
⬆ fine-tuned using {fine_tuning_datasets and hyper_parameter_set}
Grandparent.
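The "every 2nd layer removed" step in the lineage above can be pictured with a small sketch. It is only an illustration under stated assumptions: the layer names are hypothetical placeholders, no real weights are loaded, and whether the kept layers start at index 0 or at index 1 is an assumption, not something taken from the nreimers/MiniLM-L6-H384-uncased model card.

```python
def keep_every_second_layer(layers: list) -> list:
    """Return every second layer, starting with the first (indices 0, 2, 4, ...).

    Starting at index 0 is an assumption made for this illustration.
    The kept layers are returned as-is: no retraining happens at this step.
    """
    return layers[::2]


# Hypothetical placeholders for the 12 transformer blocks of Grandparent.
grandparent_layers = [f"layer_{i}" for i in range(12)]
student_layers = keep_every_second_layer(grandparent_layers)

print(len(student_layers))
print(student_layers)
```

With the transformers library the same idea would presumably amount to rebuilding the encoder's layer list from the kept blocks and updating the layer count in the config; the kept blocks retain Grandparent's weights unchanged, which is what "no fine-tuning" refers to.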


Summarizing:
This Model comes from fine-tuning a version of Grandparent whose number of layers has been halved.

Other Model comes from fine-tuning Grandparent directly, with the same training sets and hyperparameters.


Question:
From a layperson's standpoint, I would see Other Model as more complete than This Model (33.4M vs 22.7M parameters, per the model cards). So what I am curious about is which model can be considered to generate "better" embeddings. I would define "better" as more nuanced, capturing a bit more detail of the embedded text, I think...
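As a rough sanity check on the 33.4M vs 22.7M figures, the parameter count of a BERT-style encoder can be worked out by hand. The sketch below assumes a hidden size of 384, a feed-forward size of 1536, the 30,522-token uncased BERT vocabulary, 512 positions, 2 token types and a pooler layer; these configuration values are assumptions for illustration, not read from the model files.

```python
# Assumed BERT-style configuration for MiniLM-H384 (not read from the model files).
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
TYPE_VOCAB = 2
HIDDEN = 384
INTERMEDIATE = 4 * HIDDEN  # 1536


def linear(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale (gamma) plus shift (beta) vectors."""
    return 2 * dim


def embedding_params() -> int:
    """Token, position and token-type embeddings plus their LayerNorm."""
    return (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)


def transformer_layer_params() -> int:
    """One encoder block: self-attention, feed-forward and two LayerNorms."""
    attention = 4 * linear(HIDDEN, HIDDEN)  # Q, K, V and output projection
    feed_forward = linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN)
    return attention + feed_forward + 2 * layer_norm(HIDDEN)


def total_params(num_layers: int, with_pooler: bool = True) -> int:
    """Embeddings + stacked encoder blocks + (optional) pooler."""
    pooler = linear(HIDDEN, HIDDEN) if with_pooler else 0
    return embedding_params() + num_layers * transformer_layer_params() + pooler


for layers in (6, 12):
    print(f"{layers} layers: {total_params(layers) / 1e6:.1f}M parameters")
print(f"embeddings alone: {embedding_params() / 1e6:.1f}M")
```

Under these assumptions the totals come out at 22.7M and 33.4M, and about 11.9M of either total sits in the embedding block. Dropping six layers therefore removes roughly 10.6M parameters, about a third of the larger model rather than half.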

Also worth mentioning: This Model supports TensorFlow, whereas Other Model does not. Is that part of the equation when deciding which one is "better"?

I have been running some basic examples (while learning) in a Jupyter environment (VS Code) and have not seen any significant difference in resource usage between This Model and Other Model...

Judging from the number of comments, questions and overall conversation on the pages of both models, This Model seems to be the more "popular" one. Some guidance would be great.


My current case is more about learning, but eventually similarity search.
Pardon the very long post and perhaps excessive detail; I'm new to all this.
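For the similarity-search use case mentioned above, the core operation is ranking stored embeddings by cosine similarity to a query embedding. The sketch below shows that ranking in plain Python; the 3-dimensional vectors and sentences are hypothetical toy data, whereas real embeddings from either model would be 384-dimensional and would come from something like `SentenceTransformer(...).encode(...)`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k corpus entries most similar to the query, best first."""
    scored = [(doc, cosine_similarity(query, vec)) for doc, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for real 384-dimensional embeddings.
corpus = {
    "A cat sits on the mat.": [0.9, 0.1, 0.0],
    "A dog plays in the park.": [0.7, 0.6, 0.1],
    "Stock prices fell today.": [0.0, 0.2, 0.95],
}
query = [0.85, 0.2, 0.05]

for doc, score in top_k(query, corpus):
    print(f"{score:.3f}  {doc}")
```

Because both models output vectors of the same size, the same ranking code works for either one, so comparing them on one's own queries only requires swapping the model that produces the vectors. sentence-transformers also ships `util.cos_sim` for the similarity step.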
