juanluisdb/MiniLM-L-6-rerank-m3

@@ -45,37 +45,37 @@ scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2') , ('Que
 ### BEIR (NDCG@10)
 I ran evaluations on several BEIR datasets. Each cross-encoder reranks the top 100 BM25 results.
-|                |   bm25 |   jina-reranker-v1-turbo-en | bge-reranker-v2-m3   | mxbai-rerank-base-v1   |   ms-marco-MiniLM-L-6-v2 |   MiniLM-L-6-rerank-refreshed-ablated | MiniLM-L-6-rerank-refreshed   |
-|:---------------|-------:|----------------------------:|:---------------------|:-----------------------|-------------------------:|--------------------------------------:|:------------------------------|
-| nq             |  0.305 |                       0.533 | **0.597**            | 0.535                  |                    0.523 |                                 0.541 | 0.580                         |
-| fever          |  0.638 |                       0.852 | 0.857                | 0.767                  |                    0.801 |                                 0.822 | **0.867**                     |
-| fiqa           |  0.238 |                       0.336 | **0.397**            | 0.382                  |                    0.349 |                                 0.36  | 0.364                         |
-| trec-covid     |  0.589 |                       0.774 | 0.784                | **0.830**              |                    0.741 |                                 0.733 | 0.738                         |
-| scidocs        |  0.15  |                       0.166 | 0.169                | **0.171**              |                    0.164 |                                 0.163 | 0.165                         |
-| scifact        |  0.676 |                       0.739 | 0.731                | 0.719                  |                    0.688 |                                 0.738 | **0.750**                     |
-| nfcorpus       |  0.318 |                       0.353 | 0.336                | **0.353**              |                    0.349 |                                 0.35  | 0.350                         |
-| hotpotqa       |  0.629 |                       0.745 | **0.794**            | 0.668                  |                    0.724 |                                 0.758 | 0.775                         |
-| dbpedia-entity |  0.319 |                       0.421 | **0.445**            | 0.416                  |                    0.445 |                                 0.438 | 0.444                         |
-| quora          |  0.787 |                       0.858 | 0.858                | 0.747                  |                    0.825 |                                 0.862 | **0.871**                     |
-| climate-fever  |  0.163 |                       0.233 | **0.314**            | 0.253                  |                    0.244 |                                 0.245 | 0.309                         |
-|                           | nq*        | fever*     | fiqa      | trec-covid   | scidocs   | scifact   | nfcorpus   | hotpotqa   | dbpedia-entity   | quora     | climate-fever   |
-|:--------------------------|:----------|:----------|:----------|:-------------|:----------|:----------|:-----------|:-----------|:-----------------|:----------|:----------------|
-| bm25                      | 0.305     | 0.638     | 0.238     | 0.589        | 0.150     | 0.676     | 0.318      | 0.629      | 0.319            | 0.787     | 0.163           |
-| jina-reranker-v1-turbo-en | 0.533     | 0.852     | 0.336     | 0.774        | 0.166     | 0.739     | 0.353      | 0.745      | 0.421            | 0.858     | 0.233           |
-| bge-reranker-v2-m3        | **0.597** | 0.857     | **0.397** | 0.784        | 0.169     | 0.731     | 0.336      | **0.794**  | **0.445**        | 0.858     | **0.314**       |
-| mxbai-rerank-base-v1      | 0.535     | 0.767     | 0.382     | **0.830**    | **0.171** | 0.719     | **0.353**  | 0.668      | 0.416            | 0.747     | 0.253           |
-| ms-marco-MiniLM-L-6-v2    | 0.523     | 0.801     | 0.349     | 0.741        | 0.164     | 0.688     | 0.349      | 0.724      | 0.445            | 0.825     | 0.244           |
-| MiniLM-L-6-rerank-reborn      | 0.580     | **0.867** | 0.364     | 0.738        | 0.165     | **0.750** | 0.350      | 0.775      | 0.444            | **0.871** | 0.309           |
 \* Training splits of NQ and Fever were used as part of the training data.
 Comparison with [ablated model](https://huggingface.co/juanluisdb/MiniLM-L-6-rerank-reborn-ablated/settings) trained only on MSMarco:
-|                                     |     nq |   fever |   fiqa |   trec-covid |   scidocs |   scifact |   nfcorpus |   hotpotqa |   dbpedia-entity |   quora |   climate-fever |
-|:------------------------------------|-------:|--------:|-------:|-------------:|----------:|----------:|-----------:|-----------:|-----------------:|--------:|----------------:|
-| ms-marco-MiniLM-L-6-v2              | 0.5234 |  0.8007 | 0.349  |       0.741  |    0.1638 |    0.688  |     0.3493 |     0.7235 |           0.4445 |  0.8251 |          0.2438 |
-| MiniLM-L-6-rerank-refreshed-ablated | 0.5412 |  0.8221 | 0.3598 |       0.7331 |    0.163  |    0.7376 |     0.3495 |     0.7583 |           0.4382 |  0.8619 |          0.2449 |
-| improvement (%)                     | **3.40** |  **2.67** | **3.08** |      -1.07 |   -0.47 |    **7.22**  |     0.08 |     **4.80** |          -1.41 |  **4.45** |          **0.47** |
 # Datasets Used

 ### BEIR (NDCG@10)
 I ran evaluations on several BEIR datasets. Each cross-encoder reranks the top 100 BM25 results.
+|                |   bm25 |   jina-reranker-v1-turbo-en | bge-reranker-v2-m3   | mxbai-rerank-base-v1   |   ms-marco-MiniLM-L-6-v2 | MiniLM-L-6-rerank-refreshed   |
+|:---------------|-------:|----------------------------:|:---------------------|:-----------------------|-------------------------:|:------------------------------|
+| nq*            |  0.305 |                       0.533 | **0.597**            | 0.535                  |                    0.523 | 0.580                         |
+| fever*         |  0.638 |                       0.852 | 0.857                | 0.767                  |                    0.801 | **0.867**                     |
+| fiqa           |  0.238 |                       0.336 | **0.397**            | 0.382                  |                    0.349 | 0.364                         |
+| trec-covid     |  0.589 |                       0.774 | 0.784                | **0.830**              |                    0.741 | 0.738                         |
+| scidocs        |  0.15  |                       0.166 | 0.169                | **0.171**              |                    0.164 | 0.165                         |
+| scifact        |  0.676 |                       0.739 | 0.731                | 0.719                  |                    0.688 | **0.750**                     |
+| nfcorpus       |  0.318 |                       0.353 | 0.336                | **0.353**              |                    0.349 | 0.350                         |
+| hotpotqa       |  0.629 |                       0.745 | **0.794**            | 0.668                  |                    0.724 | 0.775                         |
+| dbpedia-entity |  0.319 |                       0.421 | **0.445**            | 0.416                  |                    0.445 | 0.444                         |
+| quora          |  0.787 |                       0.858 | 0.858                | 0.747                  |                    0.825 | **0.871**                     |
+| climate-fever  |  0.163 |                       0.233 | **0.314**            | 0.253                  |                    0.244 | 0.309                         |
 \* Training splits of NQ and Fever were used as part of the training data.
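
For reference, NDCG@10 for a single query can be sketched as below. This is a minimal illustration, assuming linear gain and a log2 rank discount; the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and this is not necessarily the exact evaluation code behind the table above.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance labels."""
    # rank is 0-based, so the discount for the first position is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """NDCG@k for one query.

    ranked_doc_ids: document ids ordered by reranker score, best first.
    qrels: dict mapping document id -> graded relevance (missing ids count as 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(gains, k) / ideal_dcg
```

The per-dataset score is then the mean of `ndcg_at_k` over all queries in that dataset.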
 Comparison with [ablated model](https://huggingface.co/juanluisdb/MiniLM-L-6-rerank-reborn-ablated/settings) trained only on MSMarco:
+|                |   ms-marco-MiniLM-L-6-v2 |   MiniLM-L-6-rerank-refreshed-ablated |
+|:---------------|-------------------------:|--------------------------------------:|
+| nq             |                   0.5234 |                                **0.5412** |
+| fever          |                   0.8007 |                                **0.8221** |
+| fiqa           |                   0.349  |                                **0.3598** |
+| trec-covid     |                   **0.741**  |                                0.7331 |
+| scidocs        |                   **0.1638** |                                0.163  |
+| scifact        |                   0.688  |                                **0.7376** |
+| nfcorpus       |                   0.3493 |                                **0.3495** |
+| hotpotqa       |                   0.7235 |                                **0.7583** |
+| dbpedia-entity |                   **0.4445** |                                0.4382 |
+| quora          |                   0.8251 |                                **0.8619** |
+| climate-fever  |                   0.2438 |                                **0.2449** |
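
The relative improvement of the ablated model over `ms-marco-MiniLM-L-6-v2` can be recomputed from the two columns above. The snippet below is a small sketch using the table values as printed; the helper name `relative_improvement` is hypothetical, and because the inputs are rounded, the last digit may differ slightly from figures computed on unrounded scores.

```python
# NDCG@10 values copied from the comparison table above.
baseline = {
    "nq": 0.5234, "fever": 0.8007, "fiqa": 0.349, "trec-covid": 0.741,
    "scidocs": 0.1638, "scifact": 0.688, "nfcorpus": 0.3493,
    "hotpotqa": 0.7235, "dbpedia-entity": 0.4445, "quora": 0.8251,
    "climate-fever": 0.2438,
}
ablated = {
    "nq": 0.5412, "fever": 0.8221, "fiqa": 0.3598, "trec-covid": 0.7331,
    "scidocs": 0.163, "scifact": 0.7376, "nfcorpus": 0.3495,
    "hotpotqa": 0.7583, "dbpedia-entity": 0.4382, "quora": 0.8619,
    "climate-fever": 0.2449,
}


def relative_improvement(old, new):
    """Percentage change from old to new (positive means new is better)."""
    return 100.0 * (new - old) / old


improvements = {
    name: round(relative_improvement(baseline[name], ablated[name]), 2)
    for name in baseline
}

for name, pct in improvements.items():
    print(f"{name:15s} {pct:+.2f}%")
```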
 # Datasets Used