Chanukya Patnaik committed
Commit: a592a49
Parent(s): 1b4b4c7
Update README.md

README.md CHANGED
@@ -14,8 +14,7 @@ This model card aims to be a base template for new models. It has been generated

## Why use effi-13B-Instruct?
- This is a ready-to-use chat/instruct model based on Llama-2-13b-chat-hf, which provides a rationale for the context provided.
-- Llama-2 is the best open-source model available.
-This is an instruct model, which may not be ideal for further finetuning. If you are interested in building your own instruct/chat model, we recommend starting from **Llama-2-13b-chat-hf**
+- Llama-2 is the best open-source model available. This is an instruct model, which may not be ideal for further fine-tuning. If you are interested in building your own instruct/chat model, we recommend starting from **Llama-2-13b-chat-hf**.

You will need at least **85-100GB of memory to swiftly run inference with effi-13b**.
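For the memory requirement quoted above, a minimal inference sketch follows. The Hub repo id `aiplanet/effi-13b` is an assumption (the diff never names one), and 4-bit loading via bitsandbytes is one optional way to fit the model in far less memory than the quoted 85-100GB:

```python
# Minimal inference sketch. The repo id "aiplanet/effi-13b" is an
# assumption, not something stated in this diff; 4-bit loading is
# optional and trades some quality for a much smaller memory footprint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "aiplanet/effi-13b"  # assumed Hub id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit on load
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs/CPU
)

prompt = "Explain why the sky appears blue, and show your reasoning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```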
@@ -23,7 +22,7 @@ You will need at least **85-100GB of memory to swiftly run inference with effi-13b**.

### Model Description

-This model has been fine tuned on Chain of Thought datasets which has context f…
+This model has been fine-tuned on Chain of Thought datasets, which have context from mixed sources with corresponding rationales. The final fine-tuned Large Language Model (LLM) has shown enhanced capabilities for solving novel tasks by providing reasoning.


@@ -31,19 +30,9 @@ This model has been fine tuned on Chain of Thought datasets which has context f…

- **Model type:** Causal decoder-only
- **Language(s) (NLP):** English
- **License:** Apache 2.0
-- **Finetuned from model [optional]:** [More Information Needed]
+- **Finetuned from model:** Llama-2-13b-chat-hf

-### Model Sources [optional]
-
-<!-- Provide the basic links for the model. -->
-
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-
-## Uses
-
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use
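To illustrate the rationale-providing behaviour described in the Model Description, here is a hedged prompt-building sketch. The `[INST]`/`<<SYS>>` wrapper is the Llama-2-chat convention inherited from the base model; whether effi-13b expects exactly this template is an assumption, not something this diff states:

```python
# Sketch of a rationale-eliciting prompt. The [INST]/<<SYS>> wrapper
# follows the Llama-2-chat convention of the base model; the actual
# effi-13b template is an assumption here, not stated in this diff.
def build_prompt(context: str, question: str) -> str:
    system = "You are a helpful assistant. Answer and explain your reasoning step by step."
    return (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"Context: {context}\n"
        f"Question: {question} [/INST]"
    )

print(build_prompt(
    "Store A sells pencils at 3 for $1; Store B sells them at 5 for $2.",
    "Which store is cheaper per pencil?",
))
```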
@@ -170,11 +159,7 @@ The data was tokenized with the **meta-llama/Llama-2-13b-chat-hf** tokenizer.

### Training Procedure

-
-
-#### Preprocessing [optional]
-
-[More Information Needed]
+Fine-tuning approach using PEFT and QLoRA (https://huggingface.co/blog/4bit-transformers-bitsandbytes).


#### Training Hyperparameters
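A sketch of the PEFT + QLoRA recipe named in the added line, following the linked bitsandbytes blog post. Every hyperparameter below (rank, target modules, dtypes) is an illustrative assumption, not a value taken from this model card; the authors' actual settings belong under "Training Hyperparameters":

```python
# QLoRA fine-tuning sketch in the spirit of the linked blog post
# (https://huggingface.co/blog/4bit-transformers-bitsandbytes).
# All hyperparameters below are illustrative assumptions, not the
# values the authors used.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "meta-llama/Llama-2-13b-chat-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)  # cast norms, enable input grads

lora_config = LoraConfig(
    r=16,                                 # assumed LoRA rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trained
```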
@@ -217,25 +202,7 @@ Finetuning approach using PefT and Qlora(https://huggingface.co/blog/4bit-transformers-bitsandbytes)

Paper coming soon.

-See the OpenLLM Leaderboard(https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)for early results.
-
-## Technical Specifications [optional]
-
-### Model Architecture and Objective
-
-[More Information Needed]
-
-### Compute Infrastructure
-
-[More Information Needed]
-
-#### Hardware
-
-[More Information Needed]
-
-#### Software
-
-[More Information Needed]
+See the OpenLLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) for early results.

## Citation