Update README.md

The [KeyBERT homepage](https://github.com/MaartenGr/KeyBERT) provides several other interesting examples: combining KeyBERT with stop words, extracting longer phrases, and directly producing highlighted text.

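The idea behind KeyBERT can be sketched in a few lines: candidate words or phrases (n-grams, built after stop-word removal) are embedded and ranked by cosine similarity to the embedding of the whole document. This is a simplified illustration, not KeyBERT's implementation. `toy_embed` is a hypothetical bag-of-words stand-in for `model.encode`, so it only captures word overlap, and `extract_keywords` here is a local helper whose options loosely mirror KeyBERT's `keyphrase_ngram_range`, `stop_words` and `top_n`.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for model.encode: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_keywords(doc, ngram_range=(1, 1), stop_words=(), top_n=5):
    """KeyBERT-style ranking of candidate n-grams by similarity to the document."""
    # Drop stop words first, then build n-gram candidates from the remaining words.
    words = [w for w in doc.lower().split() if w not in stop_words]
    low, high = ngram_range
    candidates = {
        " ".join(words[i:i + n])
        for n in range(low, high + 1)
        for i in range(len(words) - n + 1)
    }
    doc_vec = toy_embed(doc)
    scored = [(c, cosine(toy_embed(c), doc_vec)) for c in candidates]
    # Most similar first; ties broken alphabetically so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


doc = "the model encodes norwegian text and the model is open"
stop = {"the", "and", "is"}
print(extract_keywords(doc, stop_words=stop, top_n=1)[0][0])  # model
print([c for c, _ in extract_keywords(doc, (1, 2), stop, top_n=3)])
# ['model encodes', 'model open', 'text model']
```

Allowing two-word candidates lets phrases containing "model" outrank the single word, which is the "longer phrases" effect. With real nb-sbert embeddings, the same ranking step compares meaning rather than word counts.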
## Topic Modeling
Analysing a collection of documents to determine the topics it covers has many use cases. [BERTopic](https://github.com/MaartenGr/BERTopic) combines sentence-transformer embeddings with c-TF-IDF to create dense clusters that allow for easily interpretable topics.

It would take too long to explain topic modeling here. Instead, we recommend taking a look at the link above as well as the [documentation](https://maartengr.github.io/BERTopic/index.html). The main adaptation needed to use the Norwegian nb-sbert model is to pass it to BERTopic as the embedding model:

```python
from bertopic import BERTopic

# `docs` is a list of documents (strings) to analyse
topic_model = BERTopic(embedding_model='NbAiLab/nb-sbert').fit(docs)
```

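BERTopic clusters the (dimensionality-reduced) document embeddings and then describes each cluster with c-TF-IDF, which treats all documents in a cluster as one "class document". The weighting step can be sketched as follows. This is a simplified illustration of the formula given in the BERTopic documentation, not BERTopic's own code: it assumes the clustering step has already produced topic labels, it uses naive whitespace tokenisation instead of BERTopic's vectorizer, and `c_tf_idf` is a hypothetical helper name.

```python
import math
from collections import Counter


def c_tf_idf(docs, labels, top_n=3):
    """Rank words per topic with class-based TF-IDF (simplified sketch).

    Weight of term t in class c: tf(t, c) * log(1 + A / f(t)), where tf is
    the L1-normalised term frequency within the class, f(t) is the frequency
    of t across all classes, and A is the average number of words per class.
    """
    # Treat all documents sharing a topic label as one "class document".
    class_counts = {}
    for doc, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(doc.lower().split())

    # Term frequencies across all classes, and average words per class.
    total_freq = Counter()
    for counts in class_counts.values():
        total_freq.update(counts)
    avg_words = sum(total_freq.values()) / len(class_counts)

    topics = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        weights = {
            term: (count / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, count in counts.items()
        }
        # Highest weight first; ties broken alphabetically.
        topics[label] = sorted(weights, key=lambda t: (-weights[t], t))[:top_n]
    return topics


# Hypothetical clustered documents (labels would come from the clustering step)
docs = ["ski snow mountain", "snow ski trip", "bank loan money", "money bank snow"]
labels = [0, 0, 1, 1]
print(c_tf_idf(docs, labels, top_n=4))
# {0: ['ski', 'snow', 'mountain', 'trip'], 1: ['bank', 'money', 'loan', 'snow']}
```

Because "snow" occurs in both topics, its larger cross-class frequency lowers its weight, so it ranks below "ski" in topic 0 and last in topic 1.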
## Similarity Search
Another common use case for a SentenceTransformers model is finding the documents, or passages of documents, that are most relevant to a given query text. In this scenario, it is common to have a vector database that stores the embedding vectors for all our documents. At runtime, an embedding is generated for the query text and compared efficiently against the vectors in the database.

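Conceptually, the comparison step is a nearest-neighbour search. The sketch below does it exactly, by brute force over cosine similarities in plain Python. The stored vectors are hypothetical stand-ins for `model.encode` output, and `search` is an illustrative helper, not part of any library. Vector databases and libraries such as FAISS perform the same kind of lookup far more efficiently, often approximately, which is what matters for large collections.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search(query, vectors, k=1):
    """Exact nearest-neighbour search by cosine similarity.

    Returns (scores, indices) for the k stored vectors most similar to the
    query, best match first.
    """
    q = normalize(query)
    scores = [sum(a * b for a, b in zip(q, normalize(v))) for v in vectors]
    best = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]
    return [scores[i] for i in best], best


# Hypothetical 3-dimensional "embeddings" standing in for model.encode output
stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
scores, indices = search([0.9, 0.1, 0.0], stored, k=2)
print(indices)  # [0, 2]
```

For unit-length vectors, cosine similarity equals the inner product, which is why many indexes store normalised embeddings and run an inner-product search.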
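```python
from autofaiss import build_index  # assumed source of build_index
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('NbAiLab/nb-sbert')
# `sentences` is the list of documents to index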
embeddings = model.encode(sentences)
index, index_infos = build_index(embeddings, save_on_disk=False)
# Search for the closest matches
query = model.encode(["A young boy"])
_, index_matches = index.search(query, 1)
print(index_matches)
```
# Evaluation and Parameters
## Evaluation