pere committed on
Commit fc0e05a
1 Parent(s): 06a2e17

Update README.md

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -27,8 +27,8 @@ widget:
 
---

- # NB-SBERT
- NB-SBERT is a [SentenceTransformers](https://www.SBERT.net) model trained on a [machine translated version of the MNLI dataset](https://huggingface.co/datasets/NbAiLab/mnli-norwegian), starting from [nb-bert-base](https://huggingface.co/NbAiLab/nb-bert-base).
+ # NB-SBERT-BASE
+ NB-SBERT-BASE is a [SentenceTransformers](https://www.SBERT.net) model trained on a [machine translated version of the MNLI dataset](https://huggingface.co/datasets/NbAiLab/mnli-norwegian), starting from [nb-bert-base](https://huggingface.co/NbAiLab/nb-bert-base).

The model maps sentences & paragraphs to a 768-dimensional dense vector space. This vector can be used for tasks like clustering and semantic search. Below we give some examples of how to use the model. The easiest way is to simply measure the cosine distance between two sentences. Sentences that are close to each other in meaning will have a small cosine distance and a similarity close to 1. The model is trained in such a way that similar sentences in different languages should also be close to each other. Ideally, an English-Norwegian sentence pair should have high similarity.

@@ -46,7 +46,7 @@ Then you can use the model like this:
from sentence_transformers import SentenceTransformer, util
sentences = ["This is a Norwegian boy", "Dette er en norsk gutt"]

- model = SentenceTransformer('NbAiLab/nb-sbert')
+ model = SentenceTransformer('NbAiLab/nb-sbert-base')
embeddings = model.encode(sentences)
print(embeddings)
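The hunk above cuts off after `print(embeddings)`. For orientation, the cosine similarity that the introduction describes can be computed from those embeddings roughly as follows. This is a sketch, not part of the diff; `util.cos_sim` is assumed to be available (older sentence-transformers releases call it `util.pytorch_cos_sim`):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('NbAiLab/nb-sbert-base')
embeddings = model.encode(["This is a Norwegian boy", "Dette er en norsk gutt"])

# A similarity close to 1 means the English and Norwegian sentences
# are close in meaning, as the introduction claims.
print(util.cos_sim(embeddings[0], embeddings[1]))
```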
 
@@ -83,8 +83,8 @@ def mean_pooling(model_output, attention_mask):
sentences = ["This is a Norwegian boy", "Dette er en norsk gutt"]

# Load model from HuggingFace Hub
- tokenizer = AutoTokenizer.from_pretrained('NbAiLab/nb-sbert')
- model = AutoModel.from_pretrained('NbAiLab/nb-sbert')
+ tokenizer = AutoTokenizer.from_pretrained('NbAiLab/nb-sbert-base')
+ model = AutoModel.from_pretrained('NbAiLab/nb-sbert-base')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
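The diff shows only a slice of this plain-transformers example. The `mean_pooling` helper named in the hunk header is conventionally implemented along the following lines (a sketch continuing the snippet, reusing `model` and `encoded_input` from the hunk):

```python
import torch

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, masking out padding positions
    token_embeddings = model_output[0]  # first element: per-token embeddings
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

# Run the model and pool token embeddings into one vector per sentence
with torch.no_grad():
    model_output = model(**encoded_input)
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
```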
@@ -107,7 +107,7 @@ print(scipy_cosine_scores)

```
## SetFit - Few Shot Classification
- [SetFit](https://github.com/huggingface/setfit) is a method for using sentence-transformers to solve one of the major problems that all NLP researchers face: too few labeled training examples. The 'nb-sbert' model can be plugged directly into the SetFit library. Please see [this tutorial](https://huggingface.co/blog/setfit) for how to use this technique.
+ [SetFit](https://github.com/huggingface/setfit) is a method for using sentence-transformers to solve one of the major problems that all NLP researchers face: too few labeled training examples. The 'nb-sbert-base' model can be plugged directly into the SetFit library. Please see [this tutorial](https://huggingface.co/blog/setfit) for how to use this technique.

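Since the paragraph only links to the tutorial, here is a minimal sketch of what plugging the model into SetFit might look like. It assumes the pre-1.0 `SetFitTrainer` API, and the training data is invented for illustration:

```python
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

# Hypothetical few-shot data: Norwegian sentiment, 1 = positive
train_dataset = Dataset.from_dict({
    "text": ["Dette er bra", "Helt topp", "Dette er dårlig", "Veldig skuffende"],
    "label": [1, 1, 0, 0],
})

model = SetFitModel.from_pretrained("NbAiLab/nb-sbert-base")
trainer = SetFitTrainer(model=model, train_dataset=train_dataset)
trainer.train()

print(model.predict(["En fantastisk opplevelse"]))
```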
 
## Keyword Extraction
@@ -120,7 +120,7 @@ pip install keybert
```python
from keybert import KeyBERT
from sentence_transformers import SentenceTransformer
- sentence_model = SentenceTransformer("NbAiLab/nb-sbert")
+ sentence_model = SentenceTransformer("NbAiLab/nb-sbert-base")
kw_model = KeyBERT(model=sentence_model)

doc = """
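Oslo er hovedstaden i Norge.
"""  # hypothetical text standing in for the README's own Norwegian example, which the diff cuts off

# Sketch of the extraction call itself (illustrative parameters, not from the diff):
# keyphrase_ngram_range controls phrase length; stop_words=None keeps Norwegian words
keywords = kw_model.extract_keywords(doc, keyphrase_ngram_range=(1, 2), stop_words=None)
print(keywords)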
@@ -139,10 +139,10 @@ The [KeyBERT homepage](https://github.com/MaartenGr/KeyBERT) provides other seve
## Topic Modeling
Analysing a group of documents to determine their topics has many use cases. [BERTopic](https://github.com/MaartenGr/BERTopic) combines the power of sentence transformers with c-TF-IDF to create clusters for easily interpretable topics.

- It would take too much time to explain topic modeling here. Instead, we recommend that you take a look at the link above, as well as the [documentation](https://maartengr.github.io/BERTopic/index.html). The main adaptation you would need to make to use the Norwegian nb-sbert is to add the following:
+ It would take too much time to explain topic modeling here. Instead, we recommend that you take a look at the link above, as well as the [documentation](https://maartengr.github.io/BERTopic/index.html). The main adaptation you would need to make to use the Norwegian nb-sbert-base is to add the following:

```python
- topic_model = BERTopic(embedding_model='NbAiLab/nb-sbert').fit(docs)
+ topic_model = BERTopic(embedding_model='NbAiLab/nb-sbert-base').fit(docs)
```
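For orientation (not part of the diff), a minimal end-to-end run built around that one-liner might look as follows; `load_norwegian_documents` is a hypothetical helper standing in for a real corpus of reasonable size:

```python
from bertopic import BERTopic

# Hypothetical helper returning a reasonably large list of Norwegian strings
docs = load_norwegian_documents()

# Passing the model name makes BERTopic embed documents with nb-sbert-base
topic_model = BERTopic(embedding_model='NbAiLab/nb-sbert-base')
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info())  # one row per discovered topic
```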
 
## Similarity Search
@@ -161,7 +161,7 @@ import numpy as np
from sentence_transformers import SentenceTransformer, util
sentences = ["This is a Norwegian boy", "Dette er en norsk gutt", "A red house"]

- model = SentenceTransformer('NbAiLab/nb-sbert')
+ model = SentenceTransformer('NbAiLab/nb-sbert-base')
embeddings = model.encode(sentences)
index, index_infos = build_index(embeddings, save_on_disk=False)
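The hunk ends right after the index is built. Since autofaiss's `build_index` returns a faiss index, querying it would plausibly look like this (a sketch reusing `model`, `sentences`, and `np` from the snippet above; the query string is made up):

```python
# Embed a query and retrieve its two nearest neighbours from the index
query_embedding = model.encode(["A house painted red"])
distances, neighbours = index.search(np.float32(query_embedding), 2)

for dist, idx in zip(distances[0], neighbours[0]):
    print(f"{sentences[idx]!r} (distance {dist:.3f})")
```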
 
 