Deployed Model

AjayMukundS/Llama-2-7b-chat-finetune

Model Description

This is a Llama 2 model with 7 billion parameters, fine-tuned on the mlabonne/guanaco-llama2 dataset. The training data consists of chats between a Human and an Assistant, in which the Human poses queries and the Assistant responds to them. The Llama 2 chat models use the following chat template:

<s>[INST] <<SYS>>
SYSTEM PROMPT
<</SYS>>

User Prompt [/INST] Model Answer </s>

System Prompt (optional) --> to guide the model

User Prompt (required) --> to give the instruction / user query

Model Answer (required)
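
As a minimal sketch of how the template above could be assembled in code, the helper below builds one single-turn sample from its three parts. The function name `build_llama2_prompt` and its arguments are hypothetical, not part of this model's code. It writes `<s>` and `</s>` literally, as in the template; many tokenizers add the beginning-of-sequence token on their own, so check for a duplicated `<s>` before using this with a real tokenizer.

```python
from typing import Optional


def build_llama2_prompt(
    user_prompt: str,
    system_prompt: Optional[str] = None,
    answer: Optional[str] = None,
) -> str:
    """Format one single-turn sample in the Llama 2 chat template.

    Hypothetical helper for illustration only.
    """
    if system_prompt:
        # The optional system prompt is wrapped in <<SYS>> ... <</SYS>>
        # and placed inside the [INST] block, before the user prompt.
        body = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt}"
    else:
        body = user_prompt

    text = f"<s>[INST] {body} [/INST]"

    # For training samples the model answer follows [/INST] and the
    # sequence is closed with </s>. For inference, leave `answer` unset.
    if answer is not None:
        text += f" {answer} </s>"
    return text


print(build_llama2_prompt("What is QLoRA?", system_prompt="Answer briefly."))
```

Leaving `answer` unset yields a prompt that ends at `[/INST]`, which is the form used when asking the model to generate a reply.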

Training Data

The instruction dataset is reformatted to follow the above Llama 2 template.

Original Dataset --> https://huggingface.co/datasets/timdettmers/openassistant-guanaco

Reformatted Dataset with 1K Samples --> https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k

Complete Reformatted Dataset --> https://huggingface.co/datasets/mlabonne/guanaco-llama2

To see how this dataset was created, check this notebook --> https://colab.research.google.com/drive/1Ad7a9zMmkxuXTOh1Z7-rNSICA4dybpM2?usp=sharing

To drastically reduce VRAM usage, the model is fine-tuned in 4-bit precision using QLoRA. The GPU used for fine-tuning was an L4 (Google Colab Pro).
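
A rough back-of-the-envelope check of why 4-bit loading matters: the sketch below estimates the memory needed for the model weights alone. It ignores activations, gradients, optimizer state, the LoRA adapter weights, and quantization overhead, so real usage is higher. The function name is illustrative, and 7e9 is the nominal parameter count rather than an exact figure.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # nominal "7 billion parameters"

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights take about 28 GB in 32-bit, 14 GB in 16-bit, and 3.5 GB in 4-bit precision. An L4 has 24 GB of memory, so 4-bit weights leave most of it free for activations and the adapter's training state.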

Process

  1. Load the dataset as defined.
  2. Configure bitsandbytes for 4-bit quantization.
  3. Load the Llama 2 model in 4-bit precision on a GPU (L4 - Google Colab Pro) with the corresponding tokenizer.
  4. Load the configurations for QLoRA and the regular training parameters, and pass everything to the SFTTrainer.
  5. Start fine-tuning.
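
The steps above can be sketched roughly as follows, stopping before the SFTTrainer call because its constructor arguments differ between trl versions. This is not the author's training script: the base checkpoint name and every hyperparameter value (`nf4`, `r=64`, `lora_alpha=16`, `lora_dropout=0.1`, float16 compute) are assumptions chosen for illustration, since the card does not state them. The function name `load_for_qlora` is hypothetical, and running it requires a CUDA GPU with `transformers`, `peft`, `datasets`, and `bitsandbytes` installed.

```python
# Assumed base checkpoint (gated on the Hub); the card does not name it.
BASE_MODEL = "meta-llama/Llama-2-7b-chat-hf"
# Dataset named in this card.
DATASET = "mlabonne/guanaco-llama2"


def load_for_qlora(base_model: str = BASE_MODEL, dataset_name: str = DATASET):
    """Steps 1 to 4, up to (but not including) the SFTTrainer call."""
    # Heavy imports are deferred so this file can be imported without
    # the GPU stack installed.
    import torch
    from datasets import load_dataset
    from peft import LoraConfig
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # 1. Load the dataset.
    dataset = load_dataset(dataset_name, split="train")

    # 2. Configure bitsandbytes for 4-bit quantization (assumed settings).
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )

    # 3. Load the model in 4-bit precision on GPU 0, plus its tokenizer.
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=bnb_config,
        device_map={"": 0},
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    # The Llama 2 tokenizer ships without a pad token; reuse EOS for padding.
    tokenizer.pad_token = tokenizer.eos_token

    # 4. LoRA adapter configuration (assumed values).
    peft_config = LoraConfig(
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias="none",
        task_type="CAUSAL_LM",
    )

    return dataset, model, tokenizer, peft_config
```

The returned objects are what would then be handed to trl's SFTTrainer together with the regular training arguments. Consult the documentation for the installed trl version for the exact parameter names.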
