---
license: bigscience-openrail-m
datasets:
- iamplus/Instruction_Tuning
- iamplus/Conversational_Data
---
Instruction-tuned GPT-NeoXT-20B model, trained on the Instruction Tuning datasets listed above (~5.2M examples) using ***Colossal AI***.

**Base Model:** togethercomputer/GPT-NeoXT-Chat-Base-20B (GPT-NeoXT-Chat-Base-20B-v0.16, fine-tuned on feedback data)
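A minimal sketch of loading the base model with the Hugging Face `transformers` API, assuming standard Hub usage (this snippet is not taken from the training code):

```python
# Sketch: load the base model and tokenizer from the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "togethercomputer/GPT-NeoXT-Chat-Base-20B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
```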
**Training Details** (restated as a config sketch after this list):
* Epochs: 5
* Batch Size: 5 instantaneous per device x 1 gradient accumulation step x 8 GPUs = 40 effective
* Block Size: 2020
* Weight Decay: 0
* Learning Rate: 1e-6
* Learning Rate Scheduler Type: Cosine
* Number of warmup steps: 600
* Machine: 8xA100 80GB
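For readability, the same hyperparameters expressed as Hugging Face `TrainingArguments`. This is only a restatement, since the actual run used Colossal AI, whose configuration format differs; the `output_dir` name is hypothetical:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt-neoxt-20b-instruction-tuned",  # hypothetical path
    num_train_epochs=5,
    per_device_train_batch_size=5,   # x 1 grad-accum step x 8 GPUs = 40 effective
    gradient_accumulation_steps=1,
    weight_decay=0.0,
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_steps=600,
)
```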
**Training Data Specifics** (see the packing sketch after this list):
* Labels and input ids are exactly the same.
* Block Size is 2020; multiple instructions are packed together into each block.
* "###" is the EOS token used in the data.