KennethEnevoldsen committed
Commit 8de3fd0
1 Parent(s): c7392f2
Update README.md
README.md
CHANGED
@@ -1,10 +1,18 @@
-
+---
+license: mit
+datasets:
+- DDSC/dagw_no_twitter
+language:
+- da
+tags:
+- SimCSE
+---
+Trained using the [SimCSE](https://github.com/princeton-nlp/SimCSE) implementation with:
 
 ```
 CUDA_VISIBLE_DEVICES=0 python train.py \
-    --train_file data/dfm_paragraphs.txt \
+    --train_file data/dfm_paragraphs.txt \ # paragraphs extract from Danish Gigaword
     --model_name_or_path chcaa/dfm-encoder-large-v1 \
-    --output_dir result/dfm-sentence-encoder-medium-v4 \
     --num_train_epochs 1 \
     --per_device_train_batch_size 128 \
     --learning_rate 1e-5 \
@@ -19,4 +27,26 @@ CUDA_VISIBLE_DEVICES=0 python train.py \
     --temp 0.05 \
     --do_train \
     --fp16
-```
+```
+
+
+## Citation
+
+To cite this work please refer to the following article:
+
+```
+Enevoldsen, K., Kardos, M., Muennighoff, N., & Nielbo, K. (2024). The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding. https://openreview.net/forum?id=pJl_i7HIA72
+```
+
+or use the following BibTeX:
+```
+@article{enevoldsenScandinavianEmbeddingBenchmarks2024,
+title = {The {Scandinavian} {Embedding} {Benchmarks}: {Comprehensive} {Assessment} of {Multilingual} and {Monolingual} {Text} {Embedding}},
+shorttitle = {The {Scandinavian} {Embedding} {Benchmarks}},
+url = {https://openreview.net/forum?id=pJl_i7HIA72},
+language = {en},
+urldate = {2024-04-12},
+author = {Enevoldsen, Kenneth and Kardos, Márton and Muennighoff, Niklas and Nielbo, Kristoffer},
+month = feb,
+year = {2024},
+}
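
One caveat about the command as changed in this commit: in POSIX shells a backslash continues a line only when it is immediately followed by the newline, so the `# paragraphs extract from Danish Gigaword` comment placed after `\` would end the command at `--train_file`. A minimal sketch of one way around that, keeping the note on its own comment line, is given here. It assumes a POSIX shell; `build_cmd` is a hypothetical helper that only prints the command rather than running `train.py`, and the flags on README lines this diff does not show are left out, not guessed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the README): prints the training
# command instead of running it, so the sketch works without train.py.
build_cmd() {
    # data/dfm_paragraphs.txt: paragraphs extracted from Danish Gigaword.
    # The note lives here because a comment cannot follow a trailing "\".
    set -- \
        --train_file data/dfm_paragraphs.txt \
        --model_name_or_path chcaa/dfm-encoder-large-v1 \
        --num_train_epochs 1 \
        --per_device_train_batch_size 128 \
        --learning_rate 1e-5 \
        --temp 0.05 \
        --do_train \
        --fp16
    # Flags from README lines outside this diff are omitted, not guessed.
    echo "CUDA_VISIBLE_DEVICES=0 python train.py $*"
}

build_cmd
```

To launch training rather than print the command, the `echo` line could be replaced with `CUDA_VISIBLE_DEVICES=0 python train.py "$@"`.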