Model Card for Fine-Tuned Llama 3.1

Model Name: Fine-Tuned Llama 3.1

Model Description: Fine-Tuned Llama 3.1 is a customized version of Meta’s Llama-3.1 (8B parameters) model, fine-tuned on task-specific datasets using LoRA (Low-Rank Adaptation) and quantized to 4-bit precision for efficient inference. It has been tuned for improved performance on causal language modeling tasks, with generation parameters optimized for concise, context-aware responses.

Model Details:

•	Model Type: Causal language model (decoder-only LLM)
•	Base Model: Meta-Llama-3.1-8B
•	Architecture: Transformer-based autoregressive model
•	Quantization: 4-bit precision using BitsAndBytes for memory efficiency
•	Training Method: LoRA fine-tuning
•	Task: General language generation, conversation, text completion

Use Cases:

•	Conversational AI assistants
•	Text completion
•	Response generation in chatbots
•	Any task that involves understanding and generating human-like text

Fine-Tuning Process:

•	LoRA Configuration:
	•	r=8, lora_alpha=16, lora_dropout=0.05
	•	This setup introduces efficient low-rank adaptation, so training updates only a small fraction of the model’s parameters (see the sketch after this list).
•	Training Arguments:
	•	Batch size per device: 4
	•	Learning rate: 2e-4
	•	Training epochs: 3
	•	Gradient accumulation: 16 steps
	•	Optimizer: paged_adamw_32bit
	•	Fine-tuned on a custom dataset using the Hugging Face Trainer, with the resulting adapter pushed to the Hugging Face Hub.
•	Quantization:
	•	Loaded in 4-bit precision (bnb_4bit) with quantization type nf4
	•	Optimized for efficient inference using float16 compute precision.
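Below is a minimal sketch of how this setup could be reproduced with transformers, peft, and bitsandbytes. The hyperparameters are taken from this card; the LoRA target modules, the placeholder dataset, and the output/Hub repository names are assumptions added for illustration.

```python
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# 4-bit NF4 quantization with float16 compute, as described above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = "meta-llama/Meta-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# LoRA configuration from the card; target_modules is an assumption
# (the attention projections are a common choice for Llama models).
model = get_peft_model(model, LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
))

# Placeholder dataset; the actual corpus is described in the Dataset section.
train_dataset = Dataset.from_list(
    [{"text": "Prompt: ...\nResponse: ..."}]
).map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512))

# Training arguments from the card; output_dir doubles as the Hub repo name
# (pushing requires a prior `huggingface-cli login`).
args = TrainingArguments(
    output_dir="llama3.1-finetuned",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    num_train_epochs=3,
    optim="paged_adamw_32bit",
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.push_to_hub()
```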

Dataset:

The fine-tuning dataset contains curated prompt-response pairs covering natural language tasks such as summarization, paraphrasing, and conversation.

Sample data snippet:

Prompt: "Time segment 0 to 4 seconds: The sun rises over a quiet beach." Response: ["sunrise beach", "quiet shoreline", "rising sun"]

Inference and Generation:

•	Generation Config:
	•	penalty_alpha=0.6
	•	do_sample=True
	•	top_k=5
	•	temperature=0.5
	•	repetition_penalty=1.2
	•	max_new_tokens=60

This configuration balances coherence with controlled creativity in generation; a minimal inference sketch follows.
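The sketch below loads the quantized base model, applies the fine-tuned adapter, and generates with the parameters above. The adapter repository name and the prompt are placeholders.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base = "meta-llama/Meta-Llama-3.1-8B"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb_config)

# Placeholder adapter repo; substitute the published adapter name.
model = PeftModel.from_pretrained(model, "your-username/llama3.1-finetuned")

prompt = "Time segment 0 to 4 seconds: The sun rises over a quiet beach."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Parameters taken verbatim from the card. Note that recent transformers
# versions run plain sampling when do_sample=True and ignore penalty_alpha,
# since contrastive search requires do_sample=False.
outputs = model.generate(
    **inputs,
    penalty_alpha=0.6,
    do_sample=True,
    top_k=5,
    temperature=0.5,
    repetition_penalty=1.2,
    max_new_tokens=60,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```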

Performance:

•	Hardware Requirements: 8-bit/4-bit quantization allows the model to run on consumer-grade GPUs with efficient memory utilization.
•	Inference Time: response generation varies with prompt complexity but typically completes within 2-4 seconds on a standard GPU setup.

Limitations and Ethical Considerations:

•	The model may generate biased or inappropriate content, since it was trained on publicly available datasets and can reflect biases present in that data.
•	Output filtering and human supervision are recommended for sensitive or safety-critical use cases.

Future Work:

The model can be further fine-tuned with domain-specific datasets or adapted for tasks requiring more nuanced understanding or specialized knowledge.
