|
--- |
|
language: |
|
- en |
|
license: apache-2.0 |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- llama |
|
- trl |
|
- sft |
|
base_model: meta-llama/Meta-Llama-3.1-8B |
|
--- |
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** prithivMLmods |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** unsloth/meta-llama-3.1-8b-bnb-4bit |
|
|
|
**This model is still in the training phase. It is not the final version and may contain artifacts or perform poorly in some cases.**
|
|
|
## Trainer Configuration |
|
|
|
| **Parameter** | **Value** |
|------------------------------|------------------------------------------|
| **Model** | `model` |
| **Tokenizer** | `tokenizer` |
| **Train Dataset** | `dataset` |
| **Dataset Text Field** | `text` |
| **Max Sequence Length** | `max_seq_length` |
| **Dataset Number of Processes** | `2` |
| **Packing** | `False` (packing can make training 5x faster for short sequences) |

**Training Arguments**

| **Argument** | **Value** |
|------------------------------|------------------------------------------|
| **Per Device Train Batch Size** | `2` |
| **Gradient Accumulation Steps** | `4` |
| **Warmup Steps** | `5` |
| **Number of Train Epochs** | `1` (set this for one full training run) |
| **Max Steps** | `60` |
| **Learning Rate** | `2e-4` |
| **FP16** | `not is_bfloat16_supported()` |
| **BF16** | `is_bfloat16_supported()` |
| **Logging Steps** | `1` |
| **Optimizer** | `adamw_8bit` |
| **Weight Decay** | `0.01` |
| **LR Scheduler Type** | `linear` |
| **Seed** | `3407` |
| **Output Directory** | `outputs` |
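
The configuration above follows the standard Unsloth SFT notebook. As a rough sketch, it maps onto TRL's `SFTTrainer` as shown below. The `max_seq_length` value, the training dataset, and the LoRA settings passed to `FastLanguageModel.get_peft_model` are not stated in this card and are placeholders; the sketch also assumes a TRL version whose `SFTTrainer` still accepts `dataset_text_field`, `max_seq_length`, and `packing` directly (newer TRL releases move these into `SFTConfig`).

```python
from unsloth import FastLanguageModel, is_bfloat16_supported
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

max_seq_length = 2048  # assumption: the actual value is not listed in this card

# Load the 4-bit base model and tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach a LoRA adapter (assumed; the card does not list the PEFT settings,
# so these are the Unsloth notebook defaults).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# Placeholder dataset: the actual training data is not specified in this card.
# It only needs a "text" column containing the formatted prompts.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # packing can make training 5x faster for short sequences
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=1,  # set this for one full training run
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

trainer.train()
```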
|
|
|
|
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
|
|