pere committed on
Commit
15002e3
1 Parent(s): 011fb39

Update README.md

  1. README.md +9 -2
README.md CHANGED
@@ -58,9 +58,14 @@ kw_model.extract_keywords(doc, stop_words=None)
58
 
59
  The [KeyBERT homepage](https://github.com/MaartenGr/KeyBERT) provides several other interesting examples: combining KeyBERT with stop words, extracting longer phrases, or directly producing highlighted text.
60
 
61
- ## Keyword Extraction
62
- [ToDo - Per Egil - https://github.com/MaartenGr/BERTopic]
63
 
 
 
 
 
 
64
 
65
  ## Similarity Search
66
  Another common use case for a SentenceTransformers model is to find relevant documents or passages of documents given a certain query text. In this scenario, it is pretty common to have a vector database that stores the embedding vectors for all our documents. Then, at runtime, an embedding for the query text is generated and compared efficiently against the vector database.
@@ -82,6 +87,7 @@ model = SentenceTransformer('NbAiLab/nb-sbert')
82
  embeddings = model.encode(sentences)
83
  index, index_infos = build_index(embeddings, save_on_disk=False)
84
 
 
85
  query = model.encode(["A young boy"])
86
  _, index_matches = index.search(query, 1)
87
  print(index_matches)
@@ -163,6 +169,7 @@ print(scipy_cosine_scores)
163
 
164
  ```
165
 
 
166
  # Evaluation and Parameters
167
 
168
  ## Evaluation
 
58
 
59
  The [KeyBERT homepage](https://github.com/MaartenGr/KeyBERT) provides several other interesting examples: combining KeyBERT with stop words, extracting longer phrases, or directly producing highlighted text.
60
 
61
+ ## Topic Modeling
62
+ Analysing a group of documents to determine their topics has many use cases. [BERTopic](https://github.com/MaartenGr/BERTopic) combines sentence transformers with c-TF-IDF to create clusters with easily interpretable topics.
63
 
64
+ Explaining topic modeling in full is beyond the scope of this README. Instead, we recommend that you take a look at the link above, as well as the [documentation](https://maartengr.github.io/BERTopic/index.html). The main adaptation needed to use the Norwegian nb-sbert is the following:
65
+
66
+ ```python
67
+ topic_model = BERTopic(embedding_model='NbAiLab/nb-sbert').fit(docs)
68
+ ```
69
 
70
  ## Similarity Search
71
  Another common use case for a SentenceTransformers model is to find relevant documents or passages of documents given a certain query text. In this scenario, it is pretty common to have a vector database that stores the embedding vectors for all our documents. Then, at runtime, an embedding for the query text is generated and compared efficiently against the vector database.
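
To make the comparison step concrete, here is a minimal brute-force sketch in pure Python of what a vector index does conceptually: score the query vector against every stored vector and return the closest ones. The helper names `cosine_similarity` and `search`, and the toy 3-dimensional vectors, are assumptions made for illustration; they are not part of nb-sbert or of the index built with `build_index` in the README, and real embeddings would come from `model.encode(...)`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search(query, vectors, k=1):
    """Return the k best (index, score) pairs, highest score first (hypothetical helper)."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy stand-ins for document embeddings; not real model output.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
toy_query = [0.9, 0.1, 0.0]

matches = search(toy_query, toy_vectors, k=2)
print(matches)
```

This exhaustive scan is linear in the number of stored vectors, which is why larger collections typically rely on an approximate nearest-neighbour index instead.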
 
87
  embeddings = model.encode(sentences)
88
  index, index_infos = build_index(embeddings, save_on_disk=False)
89
 
90
+ # Search for the closest matches
91
  query = model.encode(["A young boy"])
92
  _, index_matches = index.search(query, 1)
93
  print(index_matches)
 
169
 
170
  ```
171
 
172
+
173
  # Evaluation and Parameters
174
 
175
  ## Evaluation