burgerbee committed on
Commit
a586bda
1 Parent(s): bb5d986

Update README.md

  1. README.md +3 -3
README.md CHANGED
@@ -8,7 +8,7 @@ library_name: txtai
 tags:
 - sentence-similarity
 datasets:
-- burgerbee/wikipedia-en-20240320
+- burgerbee/wikipedia-en-20241020
 ---
 # Wikipedia txtai embeddings index
 This is a [txtai](https://github.com/neuml/txtai) embeddings index (5GB embeddings + 25GB documents) for the [English edition of Wikipedia](https://en.wikipedia.org/).
@@ -16,7 +16,7 @@ This is a [txtai](https://github.com/neuml/txtai) embeddings index (5GB embeddin
 Embeddings is the engine that delivers semantic search. Data is transformed into embeddings vectors where similar concepts will produce similar vectors.
 An embeddings index generated by txtai is a fully encapsulated index format. It doesn't require a database server.
 
-This index is built from the [Wikipedia March 2024 dataset](https://huggingface.co/datasets/burgerbee/wikipedia-en-20240320).
+This index is built from the [Wikipedia October 2024 dataset](https://huggingface.co/datasets/burgerbee/wikipedia-en-20241020).
 The Wikipedia index works well as a fact-based context source for retrieval augmented generation (RAG). It also uses [Wikipedia Page Views](https://dumps.wikimedia.org/other/pageviews/readme.html) data to add a `percentile` field. The `percentile` field can be used
 to only match commonly visited pages.
 
@@ -54,4 +54,4 @@ https://dumps.wikimedia.org/enwiki/
 
 https://dumps.wikimedia.org/other/pageview_complete/
 
-https://huggingface.co/datasets/burgerbee/wikipedia-en-20240320
+https://huggingface.co/datasets/burgerbee/wikipedia-en-20241020
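
The README text in this diff says the `percentile` field can be used to only match commonly visited pages. A minimal sketch of how such a filter might look with txtai's SQL search follows. Several things here are assumptions not stated in the diff: that `percentile` is stored as a value between 0 and 1, that the index exposes `id`, `text` and `score` columns, and that the index is published on the Hugging Face Hub. The helper names `percentile_query` and `search_popular` are hypothetical, and the `container` argument must be supplied by the caller because the diff does not name the index repository.

```python
def percentile_query(min_percentile: float = 0.99) -> str:
    """Build a txtai SQL query that keeps only pages at or above a page-view percentile.

    Assumes percentile is stored on a 0..1 scale (not confirmed by the diff).
    """
    if not 0.0 <= min_percentile <= 1.0:
        raise ValueError("min_percentile must be between 0 and 1")
    # :query is a bind parameter filled in at search time
    return (
        "SELECT id, text, score, percentile FROM txtai "
        f"WHERE similar(:query) AND percentile >= {min_percentile}"
    )


def search_popular(container: str, text: str, min_percentile: float = 0.99, limit: int = 3):
    """Load an index from the Hugging Face Hub and search only commonly visited pages."""
    # Imported lazily so percentile_query can be used without txtai installed
    from txtai.embeddings import Embeddings

    embeddings = Embeddings()
    # container is the Hub repository id of the index (caller-supplied)
    embeddings.load(provider="huggingface-hub", container=container)
    return embeddings.search(
        percentile_query(min_percentile),
        limit=limit,
        parameters={"query": text},
    )
```

Calling `search_popular` downloads the full index (roughly 30GB according to the README), so it is worth trying `percentile_query` on its own first to inspect the generated SQL.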