Update README.md
README.md
@@ -13,11 +13,11 @@ The paragraph describes the development of a language model named "Hafez," which
 
 <b>Model Type:</b> Hafez is based on the BERT architecture, which is a popular model for natural language processing (NLP).
 
-Cultural Reference
+<b>Cultural Reference:</b> The model is named after Hafez, a renowned Persian poet known for his deeply emotional and philosophical verses. The name signals a connection to Persian literature and an intention to handle language in a way that resonates with the poet's cultural significance.
 
-Training Data
+<b>Training Data:</b> The model was trained on a dataset of more than 12 billion tokens drawn from two parts: 90% educational materials, including research papers, dissertations, and theses, and 10% general texts. This selection is intended to give the model a strong foundation in academic language and discourse.
 
-Text Cleaning and Preprocessing
+<b>Text Cleaning and Preprocessing:</b> Before training, the data went through a cleaning and preprocessing phase to ensure it was of high quality and suitable for training a machine learning model. This was done with "Viravirast text tools," which appear to be specialized text-processing tools.
 
 
 ### How to use
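The 90/10 split in the training-data paragraph translates into per-source token counts. The following is a minimal Python sketch under stated assumptions: it treats 12 billion as the exact total (the README says "over 12 billion", so these figures are lower bounds), and the helper `split_budget` and the source labels `academic` and `general` are hypothetical names for illustration, not part of the Hafez training code.

```python
# Sketch: split a corpus token budget across sources by fixed fractions.
# The 12-billion total and the 90/10 split come from the README; the helper
# name and source labels are hypothetical, chosen only for illustration.
from fractions import Fraction


def split_budget(total_tokens: int, shares: dict[str, Fraction]) -> dict[str, int]:
    """Return how many tokens each source contributes to the corpus."""
    if sum(shares.values()) != 1:
        raise ValueError("shares must sum to 1")
    # Truncate each exact fractional share to a whole number of tokens.
    budgets = {name: int(total_tokens * share) for name, share in shares.items()}
    # Give any rounding remainder to the largest source so the total is exact.
    largest = max(shares, key=shares.get)
    budgets[largest] += total_tokens - sum(budgets.values())
    return budgets


TOTAL_TOKENS = 12_000_000_000  # "over 12 billion", so treat as a lower bound
SHARES = {"academic": Fraction(9, 10), "general": Fraction(1, 10)}

if __name__ == "__main__":
    for name, tokens in split_budget(TOTAL_TOKENS, SHARES).items():
        print(f"{name}: {tokens:,} tokens")
```

Running it prints 10,800,000,000 tokens for the academic share and 1,200,000,000 for the general share. Using `Fraction` instead of floats keeps the shares exact, and assigning the rounding remainder to the largest source guarantees the per-source budgets add up to the stated total.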