Copy from https://huggingface.co/RLHFlow/LLaMA3-SFT
We fixed the
generation_config.json
.
This is the SFT checkpoint used for the project Online-RLHF. Also, check the technical report here.
The model is trained from meta-llama/Meta-Llama-3-8B on a mixture of diverse open-source high-quality data for 1 epoch with detailed parameters in the report. It has not been trained by RLHF and can serve as a good starting point for the RLHF research.
The datasets included: ShareGPT, Evol-Instruct, SlimOrca, MathInstruct, Magicoder-Evol-Instruct, GPT4-LLM, OrcaMath, GPTeacher, UltraInteract.
- Downloads last month
- 24,915
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.