--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb model-index: - name: mmlw-e5-base results: - task: type: Clustering dataset: type: PL-MTEB/8tags-clustering name: MTEB 8TagsClustering config: default split: test revision: None metrics: - type: v_measure value: 30.249113010261492 - task: type: Classification dataset: type: PL-MTEB/allegro-reviews name: MTEB AllegroReviews config: default split: test revision: None metrics: - type: accuracy value: 36.3817097415507 - type: f1 value: 32.77742158736663 - task: type: Retrieval dataset: type: arguana-pl name: MTEB ArguAna-PL config: default split: test revision: None metrics: - type: map_at_1 value: 32.646 - type: map_at_10 value: 49.488 - type: map_at_100 value: 50.190999999999995 - type: map_at_1000 value: 50.194 - type: map_at_3 value: 44.749 - type: map_at_5 value: 47.571999999999996 - type: mrr_at_1 value: 34.211000000000006 - type: mrr_at_10 value: 50.112 - type: mrr_at_100 value: 50.836000000000006 - type: mrr_at_1000 value: 50.839 - type: mrr_at_3 value: 45.614 - type: mrr_at_5 value: 48.242000000000004 - type: ndcg_at_1 value: 32.646 - type: ndcg_at_10 value: 58.396 - type: ndcg_at_100 value: 61.285000000000004 - type: ndcg_at_1000 value: 61.358999999999995 - type: ndcg_at_3 value: 48.759 - type: ndcg_at_5 value: 53.807 - type: precision_at_1 value: 32.646 - type: precision_at_10 value: 8.663 - type: precision_at_100 value: 0.9900000000000001 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 20.128 - type: precision_at_5 value: 14.509 - type: recall_at_1 value: 32.646 - type: recall_at_10 value: 86.629 - type: recall_at_100 value: 99.004 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_3 value: 60.38400000000001 - type: recall_at_5 value: 72.54599999999999 - task: type: Classification dataset: type: PL-MTEB/cbd name: MTEB CBD config: default split: test revision: None metrics: - type: accuracy 
value: 65.53999999999999 - type: ap value: 19.75395945379771 - type: f1 value: 55.00481388401326 - task: type: PairClassification dataset: type: PL-MTEB/cdsce-pairclassification name: MTEB CDSC-E config: default split: test revision: None metrics: - type: cos_sim_accuracy value: 89.5 - type: cos_sim_ap value: 77.26879308078568 - type: cos_sim_f1 value: 65.13157894736842 - type: cos_sim_precision value: 86.8421052631579 - type: cos_sim_recall value: 52.10526315789473 - type: dot_accuracy value: 88.0 - type: dot_ap value: 69.17235659054914 - type: dot_f1 value: 65.71428571428571 - type: dot_precision value: 71.875 - type: dot_recall value: 60.526315789473685 - type: euclidean_accuracy value: 89.5 - type: euclidean_ap value: 77.1905400565015 - type: euclidean_f1 value: 64.91803278688525 - type: euclidean_precision value: 86.08695652173914 - type: euclidean_recall value: 52.10526315789473 - type: manhattan_accuracy value: 89.5 - type: manhattan_ap value: 77.19531778873724 - type: manhattan_f1 value: 64.72491909385113 - type: manhattan_precision value: 84.03361344537815 - type: manhattan_recall value: 52.63157894736842 - type: max_accuracy value: 89.5 - type: max_ap value: 77.26879308078568 - type: max_f1 value: 65.71428571428571 - task: type: STS dataset: type: PL-MTEB/cdscr-sts name: MTEB CDSC-R config: default split: test revision: None metrics: - type: cos_sim_pearson value: 93.18498922236566 - type: cos_sim_spearman value: 93.26224500108704 - type: euclidean_pearson value: 92.25462061070286 - type: euclidean_spearman value: 93.18623989769242 - type: manhattan_pearson value: 92.16291103586255 - type: manhattan_spearman value: 93.14403078934417 - task: type: Retrieval dataset: type: dbpedia-pl name: MTEB DBPedia-PL config: default split: test revision: None metrics: - type: map_at_1 value: 8.268 - type: map_at_10 value: 17.391000000000002 - type: map_at_100 value: 24.266 - type: map_at_1000 value: 25.844 - type: map_at_3 value: 12.636 - type: map_at_5 value: 14.701 - 
type: mrr_at_1 value: 62.74999999999999 - type: mrr_at_10 value: 70.25200000000001 - type: mrr_at_100 value: 70.601 - type: mrr_at_1000 value: 70.613 - type: mrr_at_3 value: 68.083 - type: mrr_at_5 value: 69.37100000000001 - type: ndcg_at_1 value: 51.87500000000001 - type: ndcg_at_10 value: 37.185 - type: ndcg_at_100 value: 41.949 - type: ndcg_at_1000 value: 49.523 - type: ndcg_at_3 value: 41.556 - type: ndcg_at_5 value: 39.278 - type: precision_at_1 value: 63.24999999999999 - type: precision_at_10 value: 29.225 - type: precision_at_100 value: 9.745 - type: precision_at_1000 value: 2.046 - type: precision_at_3 value: 43.833 - type: precision_at_5 value: 37.9 - type: recall_at_1 value: 8.268 - type: recall_at_10 value: 22.542 - type: recall_at_100 value: 48.154 - type: recall_at_1000 value: 72.62100000000001 - type: recall_at_3 value: 13.818 - type: recall_at_5 value: 17.137 - task: type: Retrieval dataset: type: fiqa-pl name: MTEB FiQA-PL config: default split: test revision: None metrics: - type: map_at_1 value: 16.489 - type: map_at_10 value: 26.916 - type: map_at_100 value: 28.582 - type: map_at_1000 value: 28.774 - type: map_at_3 value: 23.048 - type: map_at_5 value: 24.977 - type: mrr_at_1 value: 33.642 - type: mrr_at_10 value: 41.987 - type: mrr_at_100 value: 42.882 - type: mrr_at_1000 value: 42.93 - type: mrr_at_3 value: 39.48 - type: mrr_at_5 value: 40.923 - type: ndcg_at_1 value: 33.488 - type: ndcg_at_10 value: 34.528 - type: ndcg_at_100 value: 41.085 - type: ndcg_at_1000 value: 44.474000000000004 - type: ndcg_at_3 value: 30.469 - type: ndcg_at_5 value: 31.618000000000002 - type: precision_at_1 value: 33.488 - type: precision_at_10 value: 9.783999999999999 - type: precision_at_100 value: 1.6389999999999998 - type: precision_at_1000 value: 0.22699999999999998 - type: precision_at_3 value: 20.525 - type: precision_at_5 value: 15.093 - type: recall_at_1 value: 16.489 - type: recall_at_10 value: 42.370000000000005 - type: recall_at_100 value: 67.183 - type: 
recall_at_1000 value: 87.211 - type: recall_at_3 value: 27.689999999999998 - type: recall_at_5 value: 33.408 - task: type: Retrieval dataset: type: hotpotqa-pl name: MTEB HotpotQA-PL config: default split: test revision: None metrics: - type: map_at_1 value: 37.373 - type: map_at_10 value: 57.509 - type: map_at_100 value: 58.451 - type: map_at_1000 value: 58.524 - type: map_at_3 value: 54.064 - type: map_at_5 value: 56.257999999999996 - type: mrr_at_1 value: 74.895 - type: mrr_at_10 value: 81.233 - type: mrr_at_100 value: 81.461 - type: mrr_at_1000 value: 81.47 - type: mrr_at_3 value: 80.124 - type: mrr_at_5 value: 80.862 - type: ndcg_at_1 value: 74.747 - type: ndcg_at_10 value: 66.249 - type: ndcg_at_100 value: 69.513 - type: ndcg_at_1000 value: 70.896 - type: ndcg_at_3 value: 61.312 - type: ndcg_at_5 value: 64.132 - type: precision_at_1 value: 74.747 - type: precision_at_10 value: 13.873 - type: precision_at_100 value: 1.641 - type: precision_at_1000 value: 0.182 - type: precision_at_3 value: 38.987 - type: precision_at_5 value: 25.621 - type: recall_at_1 value: 37.373 - type: recall_at_10 value: 69.365 - type: recall_at_100 value: 82.039 - type: recall_at_1000 value: 91.148 - type: recall_at_3 value: 58.48100000000001 - type: recall_at_5 value: 64.051 - task: type: Retrieval dataset: type: msmarco-pl name: MTEB MSMARCO-PL config: default split: validation revision: None metrics: - type: map_at_1 value: 16.753999999999998 - type: map_at_10 value: 26.764 - type: map_at_100 value: 27.929 - type: map_at_1000 value: 27.994999999999997 - type: map_at_3 value: 23.527 - type: map_at_5 value: 25.343 - type: mrr_at_1 value: 17.192 - type: mrr_at_10 value: 27.141 - type: mrr_at_100 value: 28.269 - type: mrr_at_1000 value: 28.327999999999996 - type: mrr_at_3 value: 23.906 - type: mrr_at_5 value: 25.759999999999998 - type: ndcg_at_1 value: 17.177999999999997 - type: ndcg_at_10 value: 32.539 - type: ndcg_at_100 value: 38.383 - type: ndcg_at_1000 value: 40.132 - type: 
ndcg_at_3 value: 25.884 - type: ndcg_at_5 value: 29.15 - type: precision_at_1 value: 17.177999999999997 - type: precision_at_10 value: 5.268 - type: precision_at_100 value: 0.823 - type: precision_at_1000 value: 0.097 - type: precision_at_3 value: 11.122 - type: precision_at_5 value: 8.338 - type: recall_at_1 value: 16.753999999999998 - type: recall_at_10 value: 50.388 - type: recall_at_100 value: 77.86999999999999 - type: recall_at_1000 value: 91.55 - type: recall_at_3 value: 32.186 - type: recall_at_5 value: 40.048 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pl) config: pl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.9280430396772 - type: f1 value: 68.7099581466286 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pl) config: pl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.76126429051783 - type: f1 value: 74.72274307018111 - task: type: Retrieval dataset: type: nfcorpus-pl name: MTEB NFCorpus-PL config: default split: test revision: None metrics: - type: map_at_1 value: 5.348 - type: map_at_10 value: 12.277000000000001 - type: map_at_100 value: 15.804000000000002 - type: map_at_1000 value: 17.277 - type: map_at_3 value: 8.783000000000001 - type: map_at_5 value: 10.314 - type: mrr_at_1 value: 43.963 - type: mrr_at_10 value: 52.459999999999994 - type: mrr_at_100 value: 53.233 - type: mrr_at_1000 value: 53.26499999999999 - type: mrr_at_3 value: 50.464 - type: mrr_at_5 value: 51.548 - type: ndcg_at_1 value: 40.711999999999996 - type: ndcg_at_10 value: 33.709 - type: ndcg_at_100 value: 31.398 - type: ndcg_at_1000 value: 40.042 - type: ndcg_at_3 value: 37.85 - type: ndcg_at_5 value: 36.260999999999996 - type: precision_at_1 value: 43.344 - type: precision_at_10 value: 25.851000000000003 - type: precision_at_100 value: 8.279 - type: 
precision_at_1000 value: 2.085 - type: precision_at_3 value: 36.326 - type: precision_at_5 value: 32.074000000000005 - type: recall_at_1 value: 5.348 - type: recall_at_10 value: 16.441 - type: recall_at_100 value: 32.975 - type: recall_at_1000 value: 64.357 - type: recall_at_3 value: 9.841999999999999 - type: recall_at_5 value: 12.463000000000001 - task: type: Retrieval dataset: type: nq-pl name: MTEB NQ-PL config: default split: test revision: None metrics: - type: map_at_1 value: 24.674 - type: map_at_10 value: 37.672 - type: map_at_100 value: 38.767 - type: map_at_1000 value: 38.82 - type: map_at_3 value: 33.823 - type: map_at_5 value: 36.063 - type: mrr_at_1 value: 27.839000000000002 - type: mrr_at_10 value: 40.129 - type: mrr_at_100 value: 41.008 - type: mrr_at_1000 value: 41.048 - type: mrr_at_3 value: 36.718 - type: mrr_at_5 value: 38.841 - type: ndcg_at_1 value: 27.839000000000002 - type: ndcg_at_10 value: 44.604 - type: ndcg_at_100 value: 49.51 - type: ndcg_at_1000 value: 50.841 - type: ndcg_at_3 value: 37.223 - type: ndcg_at_5 value: 41.073 - type: precision_at_1 value: 27.839000000000002 - type: precision_at_10 value: 7.5 - type: precision_at_100 value: 1.03 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 17.005 - type: precision_at_5 value: 12.399000000000001 - type: recall_at_1 value: 24.674 - type: recall_at_10 value: 63.32299999999999 - type: recall_at_100 value: 85.088 - type: recall_at_1000 value: 95.143 - type: recall_at_3 value: 44.157999999999994 - type: recall_at_5 value: 53.054 - task: type: Classification dataset: type: laugustyniak/abusive-clauses-pl name: MTEB PAC config: default split: test revision: None metrics: - type: accuracy value: 64.5033304373009 - type: ap value: 75.81507275237081 - type: f1 value: 62.24617820785985 - task: type: PairClassification dataset: type: PL-MTEB/ppc-pairclassification name: MTEB PPC config: default split: test revision: None metrics: - type: cos_sim_accuracy value: 85.39999999999999 - 
type: cos_sim_ap value: 91.75881977787009 - type: cos_sim_f1 value: 87.79264214046823 - type: cos_sim_precision value: 88.68243243243244 - type: cos_sim_recall value: 86.9205298013245 - type: dot_accuracy value: 71.0 - type: dot_ap value: 82.97829049033108 - type: dot_f1 value: 78.77055039313797 - type: dot_precision value: 69.30817610062893 - type: dot_recall value: 91.22516556291392 - type: euclidean_accuracy value: 85.2 - type: euclidean_ap value: 91.85245521151309 - type: euclidean_f1 value: 87.64607679465777 - type: euclidean_precision value: 88.38383838383838 - type: euclidean_recall value: 86.9205298013245 - type: manhattan_accuracy value: 85.39999999999999 - type: manhattan_ap value: 91.85497100160649 - type: manhattan_f1 value: 87.77219430485762 - type: manhattan_precision value: 88.8135593220339 - type: manhattan_recall value: 86.75496688741721 - type: max_accuracy value: 85.39999999999999 - type: max_ap value: 91.85497100160649 - type: max_f1 value: 87.79264214046823 - task: type: PairClassification dataset: type: PL-MTEB/psc-pairclassification name: MTEB PSC config: default split: test revision: None metrics: - type: cos_sim_accuracy value: 97.58812615955473 - type: cos_sim_ap value: 99.14945370088302 - type: cos_sim_f1 value: 96.06060606060606 - type: cos_sim_precision value: 95.48192771084338 - type: cos_sim_recall value: 96.64634146341463 - type: dot_accuracy value: 95.17625231910947 - type: dot_ap value: 97.05592933601112 - type: dot_f1 value: 92.14501510574019 - type: dot_precision value: 91.31736526946108 - type: dot_recall value: 92.98780487804879 - type: euclidean_accuracy value: 97.6808905380334 - type: euclidean_ap value: 99.18538119402824 - type: euclidean_f1 value: 96.20637329286798 - type: euclidean_precision value: 95.77039274924472 - type: euclidean_recall value: 96.64634146341463 - type: manhattan_accuracy value: 97.58812615955473 - type: manhattan_ap value: 99.17870990853292 - type: manhattan_f1 value: 96.02446483180427 - type: 
manhattan_precision value: 96.31901840490798 - type: manhattan_recall value: 95.73170731707317 - type: max_accuracy value: 97.6808905380334 - type: max_ap value: 99.18538119402824 - type: max_f1 value: 96.20637329286798 - task: type: Classification dataset: type: PL-MTEB/polemo2_in name: MTEB PolEmo2.0-IN config: default split: test revision: None metrics: - type: accuracy value: 68.69806094182825 - type: f1 value: 68.0619984307764 - task: type: Classification dataset: type: PL-MTEB/polemo2_out name: MTEB PolEmo2.0-OUT config: default split: test revision: None metrics: - type: accuracy value: 35.80971659919028 - type: f1 value: 31.13081621324864 - task: type: Retrieval dataset: type: quora-pl name: MTEB Quora-PL config: default split: test revision: None metrics: - type: map_at_1 value: 66.149 - type: map_at_10 value: 80.133 - type: map_at_100 value: 80.845 - type: map_at_1000 value: 80.866 - type: map_at_3 value: 76.983 - type: map_at_5 value: 78.938 - type: mrr_at_1 value: 76.09 - type: mrr_at_10 value: 83.25099999999999 - type: mrr_at_100 value: 83.422 - type: mrr_at_1000 value: 83.42500000000001 - type: mrr_at_3 value: 82.02199999999999 - type: mrr_at_5 value: 82.831 - type: ndcg_at_1 value: 76.14999999999999 - type: ndcg_at_10 value: 84.438 - type: ndcg_at_100 value: 86.048 - type: ndcg_at_1000 value: 86.226 - type: ndcg_at_3 value: 80.97999999999999 - type: ndcg_at_5 value: 82.856 - type: precision_at_1 value: 76.14999999999999 - type: precision_at_10 value: 12.985 - type: precision_at_100 value: 1.513 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 35.563 - type: precision_at_5 value: 23.586 - type: recall_at_1 value: 66.149 - type: recall_at_10 value: 93.195 - type: recall_at_100 value: 98.924 - type: recall_at_1000 value: 99.885 - type: recall_at_3 value: 83.439 - type: recall_at_5 value: 88.575 - task: type: Retrieval dataset: type: scidocs-pl name: MTEB SCIDOCS-PL config: default split: test revision: None metrics: - type: map_at_1 
value: 3.688 - type: map_at_10 value: 10.23 - type: map_at_100 value: 12.077 - type: map_at_1000 value: 12.382 - type: map_at_3 value: 7.149 - type: map_at_5 value: 8.689 - type: mrr_at_1 value: 18.2 - type: mrr_at_10 value: 28.816999999999997 - type: mrr_at_100 value: 29.982 - type: mrr_at_1000 value: 30.058 - type: mrr_at_3 value: 25.983 - type: mrr_at_5 value: 27.418 - type: ndcg_at_1 value: 18.2 - type: ndcg_at_10 value: 17.352999999999998 - type: ndcg_at_100 value: 24.859 - type: ndcg_at_1000 value: 30.535 - type: ndcg_at_3 value: 16.17 - type: ndcg_at_5 value: 14.235000000000001 - type: precision_at_1 value: 18.2 - type: precision_at_10 value: 9.19 - type: precision_at_100 value: 2.01 - type: precision_at_1000 value: 0.338 - type: precision_at_3 value: 15.5 - type: precision_at_5 value: 12.78 - type: recall_at_1 value: 3.688 - type: recall_at_10 value: 18.632 - type: recall_at_100 value: 40.822 - type: recall_at_1000 value: 68.552 - type: recall_at_3 value: 9.423 - type: recall_at_5 value: 12.943 - task: type: PairClassification dataset: type: PL-MTEB/sicke-pl-pairclassification name: MTEB SICK-E-PL config: default split: test revision: None metrics: - type: cos_sim_accuracy value: 83.12270688952303 - type: cos_sim_ap value: 76.4528312253856 - type: cos_sim_f1 value: 68.69627507163324 - type: cos_sim_precision value: 69.0922190201729 - type: cos_sim_recall value: 68.30484330484332 - type: dot_accuracy value: 79.20913167549939 - type: dot_ap value: 65.03147071986633 - type: dot_f1 value: 62.812160694896846 - type: dot_precision value: 50.74561403508772 - type: dot_recall value: 82.4074074074074 - type: euclidean_accuracy value: 83.16347329800244 - type: euclidean_ap value: 76.49405838298205 - type: euclidean_f1 value: 68.66738120757414 - type: euclidean_precision value: 68.88888888888889 - type: euclidean_recall value: 68.44729344729345 - type: manhattan_accuracy value: 83.16347329800244 - type: manhattan_ap value: 76.5080551733795 - type: manhattan_f1 value: 
68.73883529832084 - type: manhattan_precision value: 68.9605734767025 - type: manhattan_recall value: 68.51851851851852 - type: max_accuracy value: 83.16347329800244 - type: max_ap value: 76.5080551733795 - type: max_f1 value: 68.73883529832084 - task: type: STS dataset: type: PL-MTEB/sickr-pl-sts name: MTEB SICK-R-PL config: default split: test revision: None metrics: - type: cos_sim_pearson value: 82.60225159739653 - type: cos_sim_spearman value: 76.76667220288542 - type: euclidean_pearson value: 80.16302518898615 - type: euclidean_spearman value: 76.76131897866455 - type: manhattan_pearson value: 80.11881021613914 - type: manhattan_spearman value: 76.74246419368048 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl) config: pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 38.2744776092718 - type: cos_sim_spearman value: 40.35664941442517 - type: euclidean_pearson value: 29.148502128336585 - type: euclidean_spearman value: 40.45531563224982 - type: manhattan_pearson value: 29.124177399433098 - type: manhattan_spearman value: 40.2801387844354 - task: type: Retrieval dataset: type: scifact-pl name: MTEB SciFact-PL config: default split: test revision: None metrics: - type: map_at_1 value: 52.994 - type: map_at_10 value: 63.612 - type: map_at_100 value: 64.294 - type: map_at_1000 value: 64.325 - type: map_at_3 value: 61.341 - type: map_at_5 value: 62.366 - type: mrr_at_1 value: 56.667 - type: mrr_at_10 value: 65.333 - type: mrr_at_100 value: 65.89399999999999 - type: mrr_at_1000 value: 65.91900000000001 - type: mrr_at_3 value: 63.666999999999994 - type: mrr_at_5 value: 64.36699999999999 - type: ndcg_at_1 value: 56.333 - type: ndcg_at_10 value: 68.292 - type: ndcg_at_100 value: 71.136 - type: ndcg_at_1000 value: 71.90100000000001 - type: ndcg_at_3 value: 64.387 - type: ndcg_at_5 value: 65.546 - type: precision_at_1 value: 56.333 - type: precision_at_10 value: 9.133 - type: 
precision_at_100 value: 1.0630000000000002 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 25.556 - type: precision_at_5 value: 16.267 - type: recall_at_1 value: 52.994 - type: recall_at_10 value: 81.178 - type: recall_at_100 value: 93.767 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 69.906 - type: recall_at_5 value: 73.18299999999999 - task: type: Retrieval dataset: type: trec-covid-pl name: MTEB TRECCOVID-PL config: default split: test revision: None metrics: - type: map_at_1 value: 0.231 - type: map_at_10 value: 1.822 - type: map_at_100 value: 10.134 - type: map_at_1000 value: 24.859 - type: map_at_3 value: 0.615 - type: map_at_5 value: 0.9939999999999999 - type: mrr_at_1 value: 84.0 - type: mrr_at_10 value: 90.4 - type: mrr_at_100 value: 90.4 - type: mrr_at_1000 value: 90.4 - type: mrr_at_3 value: 89.0 - type: mrr_at_5 value: 90.4 - type: ndcg_at_1 value: 81.0 - type: ndcg_at_10 value: 73.333 - type: ndcg_at_100 value: 55.35099999999999 - type: ndcg_at_1000 value: 49.875 - type: ndcg_at_3 value: 76.866 - type: ndcg_at_5 value: 75.472 - type: precision_at_1 value: 86.0 - type: precision_at_10 value: 78.2 - type: precision_at_100 value: 57.18 - type: precision_at_1000 value: 22.332 - type: precision_at_3 value: 82.0 - type: precision_at_5 value: 81.2 - type: recall_at_1 value: 0.231 - type: recall_at_10 value: 2.056 - type: recall_at_100 value: 13.468 - type: recall_at_1000 value: 47.038999999999994 - type: recall_at_3 value: 0.6479999999999999 - type: recall_at_5 value: 1.088 language: pl license: apache-2.0 widget: - source_sentence: "query: Jak dożyć 100 lat?" sentences: - "passage: Trzeba zdrowo się odżywiać i uprawiać sport." - "passage: Trzeba pić alkohol, imprezować i jeździć szybkimi autami." - "passage: Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu." ---

# MMLW-e5-base

MMLW (muszę mieć lepszą wiadomość, Polish for "I must have a better message") are neural text encoders for Polish.
This is a distilled model that can be used to generate embeddings for many tasks, such as semantic similarity, clustering, and information retrieval. The model can also serve as a base for further fine-tuning.
It transforms texts into 768-dimensional vectors.
The model was initialized from a multilingual E5 checkpoint and then trained with the [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) on a diverse corpus of 60 million Polish-English text pairs. We used [English FlagEmbeddings (BGE)](https://huggingface.co/BAAI/bge-base-en) as teacher models for distillation.

## Usage (Sentence-Transformers)

⚠️ Our embedding models require the use of specific prefixes and suffixes when encoding texts. For this model, queries should be prefixed with **"query: "** and passages with **"passage: "** ⚠️

You can use the model like this with [sentence-transformers](https://www.SBERT.net):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Every input must carry the prefix that matches its role.
query_prefix = "query: "
answer_prefix = "passage: "
queries = [query_prefix + "Jak dożyć 100 lat?"]
answers = [
    answer_prefix + "Trzeba zdrowo się odżywiać i uprawiać sport.",
    answer_prefix + "Trzeba pić alkohol, imprezować i jeździć szybkimi autami.",
    answer_prefix + "Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu."
]

# Encode queries and passages, then pick the passage closest to the query.
model = SentenceTransformer("sdadas/mmlw-e5-base")
queries_emb = model.encode(queries, convert_to_tensor=True, show_progress_bar=False)
answers_emb = model.encode(answers, convert_to_tensor=True, show_progress_bar=False)

best_answer = cos_sim(queries_emb, answers_emb).argmax().item()
print(answers[best_answer])
# Trzeba zdrowo się odżywiać i uprawiać sport.
```

## Evaluation Results

- The model achieves an **Average Score** of **59.71** on the Polish Massive Text Embedding Benchmark (MTEB).
  See [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for detailed results.
- The model achieves **NDCG@10** of **53.56** on the Polish Information Retrieval Benchmark. See [PIRB Leaderboard](https://huggingface.co/spaces/sdadas/pirb) for detailed results.

## Acknowledgements

This model was trained with the A100 GPU cluster support delivered by the Gdansk University of Technology within the TASK center initiative.

## Citation

```bibtex
@article{dadas2024pirb,
  title={{PIRB}: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods},
  author={Sławomir Dadas and Michał Perełkiewicz and Rafał Poświata},
  year={2024},
  eprint={2402.13350},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
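
## Prefix helper (illustrative)

The prefix requirement from the usage section is easy to get wrong when texts come from several sources, so it may help to wrap it in a small helper. The following is only a minimal sketch: the names `add_prefix`, `prepare_queries`, and `prepare_passages` are hypothetical and are not part of this model or of sentence-transformers. Only the two prefix strings are taken from this card.

```python
from typing import List

# Prefix strings as stated in this model card.
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def add_prefix(texts: List[str], prefix: str) -> List[str]:
    """Prepend `prefix` to each text, skipping texts that already start with it.

    Hypothetical helper for illustration; not a library function.
    """
    return [t if t.startswith(prefix) else prefix + t for t in texts]


def prepare_queries(texts: List[str]) -> List[str]:
    """Format raw query strings for encoding."""
    return add_prefix(texts, QUERY_PREFIX)


def prepare_passages(texts: List[str]) -> List[str]:
    """Format raw passage strings for encoding."""
    return add_prefix(texts, PASSAGE_PREFIX)


if __name__ == "__main__":
    print(prepare_queries(["Jak dożyć 100 lat?"]))
    # ['query: Jak dożyć 100 lat?']
```

The returned lists can be passed to `model.encode(...)` in place of the manually concatenated strings shown in the usage section. Skipping texts that already carry the prefix is a design choice of this sketch, made so that the helper can be applied twice without doubling the prefix.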