---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
base_model: meta-llama/Meta-Llama-3.1-8B
---
# Uploaded model
- **Developed by:** prithivMLmods
- **License:** apache-2.0
- **Finetuned from model:** unsloth/meta-llama-3.1-8b-bnb-4bit

**This model is still in the training phase. It is not the final version and may produce artifacts or perform poorly in some cases.**
## Trainer Configuration

Values shown as code identifiers (`model`, `tokenizer`, `dataset`, `max_seq_length`) refer to variables in the training script rather than concrete settings.

| **Parameter**                  | **Value**                                                          |
|--------------------------------|--------------------------------------------------------------------|
| **Model**                      | `model`                                                            |
| **Tokenizer**                  | `tokenizer`                                                        |
| **Train Dataset**              | `dataset`                                                          |
| **Dataset Text Field**         | `text`                                                             |
| **Max Sequence Length**        | `max_seq_length`                                                   |
| **Dataset Number of Processes**| `2`                                                                |
| **Packing**                    | `False` (packing can make training 5x faster for short sequences)  |

### Training Arguments

| **Parameter**                   | **Value**                             |
|---------------------------------|---------------------------------------|
| **Per Device Train Batch Size** | `2`                                   |
| **Gradient Accumulation Steps** | `4`                                   |
| **Warmup Steps**                | `5`                                   |
| **Number of Train Epochs**      | `1` (set for one full training run)   |
| **Max Steps**                   | `60`                                  |
| **Learning Rate**               | `2e-4`                                |
| **FP16**                        | `not is_bfloat16_supported()`         |
| **BF16**                        | `is_bfloat16_supported()`             |
| **Logging Steps**               | `1`                                   |
| **Optimizer**                   | `adamw_8bit`                          |
| **Weight Decay**                | `0.01`                                |
| **LR Scheduler Type**           | `linear`                              |
| **Seed**                        | `3407`                                |
| **Output Directory**            | `outputs`                             |
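The table above follows the standard Unsloth + TRL SFT recipe. Below is a minimal sketch of how these settings map onto `SFTTrainer`, assuming that recipe; the dataset is a placeholder (the card does not name the training data) and `max_seq_length` is an assumed value.

```python
from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import Dataset

max_seq_length = 2048  # assumption: not stated in the card

# Load the 4-bit base model referenced in this card
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Placeholder dataset with a "text" column; substitute the real training data
dataset = Dataset.from_dict({"text": ["<your training text here>"]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # packing can make training 5x faster for short sequences
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=1,   # set for one full training run
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```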
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
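For a quick test of the checkpoint, a loading sketch is shown below; the repo id is a placeholder, so substitute this model's actual Hub path.

```python
from unsloth import FastLanguageModel

# Placeholder repo id: replace with this model's Hub path
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="prithivMLmods/<this-model>",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

inputs = tokenizer("Explain what fine-tuning is.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```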