---
language:
- en
datasets:
- English
- Writer/palmyra-data-index
tags:
- text generation
- pytorch
- NeMo
pipeline_tag: text-generation
library_name: transformers
license: apache-2.0
---

# Palmyra Large 20B

**Palmyra-Large is a 20B-parameter causal decoder-only model built by [Writer](https://www.Writer.com) and trained on 800B+ tokens of [Palmyra-Index-Data](https://huggingface.co/datasets/Writer/palmyra-data-index) enhanced with curated corpora.**

<style>
img {
display: inline;
}
</style>

|[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-20B-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)|

## Model Details

Palmyra Large was pre-trained primarily on English text; a trace amount of non-English data accessed through CommonCrawl remains in the training corpus. Like GPT-3, Palmyra Large belongs to the family of decoder-only transformer models and was accordingly pre-trained with a self-supervised causal language modeling (CLM) objective. Its evaluation uses the prompts and general experimental setup of GPT-3.
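
The CLM objective can be made concrete with `transformers`, the library declared in the front matter. A minimal sketch, assuming the checkpoint is published on the Hugging Face Hub as `Writer/palmyra-large`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Writer/palmyra-large"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 20B parameters; half precision saves memory
    device_map="auto",          # requires the `accelerate` package
)

text = "Palmyra Large was primarily pre-trained with English text."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Causal LM objective: the labels are the inputs themselves; the library
# shifts them internally so each position is scored on predicting the
# following token.
loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"mean per-token NLL: {loss.item():.3f}")
```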

### Model Description

- **Developed by:** [Writer](https://www.writer.com)
- **Model type:** Causal decoder-only
- **Language(s) (NLP):** English (with limited capability in German, Spanish, French, and Swedish)
- **License:** Apache 2.0

## Uses

### Direct Use

Research on large language models, and as a foundation for further specialization and fine-tuning for specific use cases (e.g., summarization, text generation, chatbots).
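
For plain text generation, the `text-generation` pipeline tag from the front matter applies directly. A short sketch, again assuming the `Writer/palmyra-large` repo id:

```python
from transformers import pipeline

# Repo id is an assumption; substitute the actual checkpoint if it differs.
generator = pipeline("text-generation", model="Writer/palmyra-large", torch_dtype="auto")

out = generator(
    "Write a one-sentence summary of the following:\n"
    "Palmyra Large is a 20B decoder-only model trained on English web text.\n"
    "Summary:",
    max_new_tokens=60,
    do_sample=True,
    top_p=0.9,
)
print(out[0]["generated_text"])
```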

### Out-of-Scope Use

Production use without adequate assessment of risks and mitigations; any use case that may be considered irresponsible or harmful.

## Bias, Risks, and Limitations

Palmyra-Large-20B was trained mostly on English, with limited capability in German, Spanish, French, and Swedish, and it will not generalize well to other languages. Furthermore, because it was trained on large-scale corpora representative of the web, it carries the stereotypes and biases commonly encountered online.

### Recommendations

We recommend that users of Palmyra-Large-20B fine-tune it for their specific tasks of interest, and that guardrails and appropriate precautions be taken for any production use.

### Use case

Palmyra Large is both powerful and fast, and it excels at nuanced tasks such as sentiment classification and summarization.
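
Sentiment classification can be approached with few-shot prompting; a purely illustrative example, reusing the assumed `Writer/palmyra-large` repo id:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="Writer/palmyra-large")  # repo id assumed

prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery died after a week.
Sentiment: Negative

Review: Setup took two minutes and it has worked flawlessly since.
Sentiment:"""

# Greedy decoding keeps the one-word label deterministic.
out = generator(prompt, max_new_tokens=2, do_sample=False, return_full_text=False)
print(out[0]["generated_text"].strip())  # expected: Positive
```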

To cite this model:

```
  year = 2023,
  month = March
}
```

## Contact

Hello@writer.com