Update README.md
README.md
CHANGED
@@ -13,10 +13,9 @@ Prompt format is standard chatml. Don't expect it to be good at math, riddles or
 Cost of this fine-tune is about $10 in electricity. It took me 3 tries to get it right.
 Base model used for fine-tuning was 200k context Yi-34B-Llama model shared by larryvrh.
 
-I had to
+I had to lower max_position_embeddings in config.json and model_max_length for training to start; otherwise I was OOMing straight away.
 My first attempt had max_position_embeddings set to 16384 and model_max_length set to 200000. This allowed fine-tuning to finish, but that model was broken after applying LoRA and merging it. \
-
-<b>This model is my third attempt with AEZAKMI v2 dataset and it works perfectly fine.</b>
+This attempt had both max_position_embeddings and model_max_length set to 4096, which worked perfectly fine.
 
 ## Prompt Format
 
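A minimal sketch of the context-length change described in the added lines, assuming it is done through the Hugging Face `transformers` API; the base repo id `larryvrh/Yi-34B-200K-Llamafied` and the output directory name are assumptions, and the same effect can be had by editing config.json and tokenizer_config.json by hand.

```python
from transformers import AutoConfig, AutoTokenizer

# Assumed repo id for the 200k-context Yi-34B-Llama base shared by larryvrh.
base_model = "larryvrh/Yi-34B-200K-Llamafied"

# Lower the advertised context window to 4096 before fine-tuning;
# this value is written into config.json on save.
config = AutoConfig.from_pretrained(base_model)
config.max_position_embeddings = 4096
config.save_pretrained("yi-34b-200k-4k-ctx")  # hypothetical output directory

# Cap the tokenizer's model_max_length to match; this value is written
# into tokenizer_config.json in the same output directory.
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.model_max_length = 4096
tokenizer.save_pretrained("yi-34b-200k-4k-ctx")
```

Training can then be pointed at the patched copy instead of the original 200k-context checkpoint.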