---
language:
- en
---

# RedPajama-Instruct-INCITE-6.9B

RedPajama-Instruct-INCITE-6.9B-v1 is a large transformer-based language model developed by Together Computer and trained on the RedPajama-Data-1T dataset.

## Model Details
- **Developed by**: Together Computer.

# Quick Start

Please note that the model requires `transformers` version >= 4.25.1.
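
If your environment is older, you can upgrade with pip (a standard version specifier; adjust to your setup):

```bash
pip install "transformers>=4.25.1"
```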

## GPU Inference

This requires a GPU with 8GB of memory.

```python
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from packaging import version

MIN_TRANSFORMERS_VERSION = '4.25.1'

# check transformers version (parse before comparing; plain string comparison misorders versions)
assert version.parse(transformers.__version__) >= version.parse(MIN_TRANSFORMERS_VERSION), f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'

# init
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-Instruct-INCITE-6.9B-v1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-Instruct-INCITE-6.9B-v1", torch_dtype=torch.float16)
model = model.to('cuda:0')

# infer
prompt = "Q: The capital of France is?\nA:"
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
input_length = inputs.input_ids.shape[1]
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
)
# decode only the newly generated tokens, skipping the prompt
token = outputs.sequences[0, input_length:]
output_str = tokenizer.decode(token)
print(output_str)
"""
Paris
"""
```
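
Because `do_sample=True` draws tokens stochastically, repeated runs will produce different completions. If you need reproducible output, one option (a minimal sketch continuing the snippet above) is to seed the generators with `transformers.set_seed`, or to pass `do_sample=False` for greedy decoding:

```python
from transformers import set_seed

# fix the RNG so repeated runs sample the same continuation
set_seed(42)
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50
)
print(tokenizer.decode(outputs[0, input_length:]))
```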

## GPU Inference in Int8

This requires a GPU with 6GB of memory.

To run inference with int8, please ensure you have installed `accelerate` and `bitsandbytes`. You can install them with the following commands:

```bash
pip install accelerate
pip install bitsandbytes
```

Then you can run inference with int8 as follows:

```python
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from packaging import version

MIN_TRANSFORMERS_VERSION = '4.25.1'

# check transformers version (parse before comparing; plain string comparison misorders versions)
assert version.parse(transformers.__version__) >= version.parse(MIN_TRANSFORMERS_VERSION), f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'

# init
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-Instruct-INCITE-6.9B-v1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-Instruct-INCITE-6.9B-v1", device_map='auto', torch_dtype=torch.float16, load_in_8bit=True)

# infer
prompt = "Q: The capital of France is?\nA:"
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
input_length = inputs.input_ids.shape[1]
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
)
# decode only the newly generated tokens, skipping the prompt
token = outputs.sequences[0, input_length:]
output_str = tokenizer.decode(token)
print(output_str)
"""
Paris
"""
```
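
To sanity-check the savings from int8 loading, you can query the model's in-memory footprint after loading (a small sketch; `get_memory_footprint()` is a method on `transformers` models):

```python
# parameter/buffer memory in bytes; with load_in_8bit=True this should be
# roughly half of what the same model occupies in float16
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.1f} GB")
```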

## CPU Inference

```python
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from packaging import version

MIN_TRANSFORMERS_VERSION = '4.25.1'

# check transformers version (parse before comparing; plain string comparison misorders versions)
assert version.parse(transformers.__version__) >= version.parse(MIN_TRANSFORMERS_VERSION), f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'

# init
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-Instruct-INCITE-6.9B-v1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-Instruct-INCITE-6.9B-v1", torch_dtype=torch.bfloat16)

# infer
prompt = "Q: The capital of France is?\nA:"
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
input_length = inputs.input_ids.shape[1]
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
)
# decode only the newly generated tokens, skipping the prompt
token = outputs.sequences[0, input_length:]
output_str = tokenizer.decode(token)
print(output_str)
"""
Paris
"""
```

Please note that since `LayerNormKernelImpl` is not implemented in fp16 for CPU, we use `bfloat16` for CPU inference.
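
All of the snippets above use a single "Q: ... A:" turn. For tasks where the model benefits from worked examples, a few-shot prompt can be assembled by stacking Q/A pairs before the final question; this is an illustrative sketch, not an official prompt template for the model:

```python
# illustrative few-shot prompt; the Q/A pairs here are hypothetical examples
examples = [
    ("The capital of France is?", "Paris"),
    ("The capital of Japan is?", "Tokyo"),
]
question = "The capital of Italy is?"
prompt = "".join(f"Q: {q}\nA: {a}\n" for q, a in examples) + f"Q: {question}\nA:"
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
```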
 
# Uses

#### Out-of-Scope Use

RedPajama-Instruct-INCITE-6.9B is a language model and may not perform well on use cases outside of its intended scope.
For example, it may not be suitable for use in safety-critical applications or for making decisions that have a significant impact on individuals or society.
It is important to consider the limitations of the model and to only use it for its intended purpose.

## Limitations

RedPajama-Instruct-INCITE-6.9B, like other language models, has limitations that should be taken into consideration.
For example, the model may not always provide accurate or relevant answers, particularly for questions that are complex, ambiguous, or outside of its training data.
We therefore welcome contributions from individuals and organizations, and encourage collaboration towards creating a more robust and inclusive chatbot.

## Training

- **Hardware:** 8 A100
- **Optimizer:** Adam
- **Gradient Accumulations:** 1
- **Num of Tokens:** 131M tokens
- **Learning rate:** 1e-5

## Community