---
license: mit
datasets:
- iamplus/LLama2-SFT-Data
- iamplus/Open_Platypus_Orca
- iamplus/Orca
- iamplus/Conversational_Data
---
**Description :**

This model is trained on a mix of Orca data and open-source and closed multi-turn conversation data, with the goal of building a stronger reasoning model that can also hold multi-turn conversations.

The dataset split, prompt format, and training parameters are described below.
**Prompt Description :**

The prompt template for the first turn looks like this:
```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
```
The prompt template for multi-turn conversations looks like this:
```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]
```
This model follows Meta's official Llama 2 chat prompt format. Please refer to https://huggingface.co/blog/llama2#how-to-prompt-llama-2 for details on how to prompt the model for single- and multi-turn conversations.
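As an illustration, here is a minimal inference sketch (not part of the model card) that assembles a two-turn prompt in this format and generates a reply with `transformers`. The repo id, the example conversation, and the generation settings are placeholders, not values from this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual checkpoint for this model.
model_id = "path/to/this-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

system_prompt = "You are a helpful assistant."
# Two-turn prompt following the template above; the first model answer is
# just an illustrative string standing in for a previous generation.
prompt = (
    f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    "What is the capital of France? [/INST] "
    "The capital of France is Paris. "
    "</s><s>[INST] What is its approximate population? [/INST]"
)

# The template already contains <s>/</s>, so skip the tokenizer's own BOS token.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```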
**Base model :** meta-llama/Llama-2-13b-hf
**Data :**

1. 1M Orca data (the GPT-4 portion of OpenOrca)
2. 1.7M chat data (OpenAssistant chat data, UltraChat, and many more open-source chat datasets)
3. 30k Open-Platypus data
**Training Params :**

```
Number of Epochs : 2
Batch Size : 128
Sequence Length : 4096
Learning Rate : 2e-5 (Cosine)
Weight Decay : 0.1
Gradient Clipping : 1.0
Gamma : 0.85
beta_1 : 0.9
beta_2 : 0.95
eps : 1e-5
Precision : bf16
Optimizer : AnyPrecision AdamW
```
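For reference, a rough sketch of how these hyperparameters could be wired into a standard PyTorch/`transformers` training setup. This is not the original training code: plain `torch.optim.AdamW` stands in for the AnyPrecision AdamW optimizer, the model and `total_steps` are placeholders, and the `Gamma` value (which appears to be a separate scheduler setting) is left out.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(4096, 4096)  # placeholder for the Llama-2-13b model
total_steps = 1_000                  # placeholder; depends on dataset size, batch size 128, 2 epochs

# AdamW with the betas / eps / weight decay listed above
# (stand-in for the AnyPrecision AdamW optimizer named in the card).
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,
    betas=(0.9, 0.95),
    eps=1e-5,
    weight_decay=0.1,
)

# Cosine learning-rate schedule, matching "2e-5 (Cosine)".
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=total_steps
)

# Inside the training loop (typically under bf16 autocast), gradients are
# clipped to 1.0 before each optimizer step:
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```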