---
license: mit
language:
- en
tags:
- ODIN
- RLHF
- PPO
---
# Model Details
This is the official ODIN-ppo-L230-7B model, a chat assistant trained by fine-tuning LLaMA on the Open-Assistant dataset via PPO. "L230" indicates that the model's average output length on the LIMA test set is ~230 tokens. ODIN is the reward model used for the training.
## Model Description
- Developed by: Lichang Chen and Chen Zhu
- Model type: RLHF-trained chat model
- Language(s) (NLP): English
- Finetuned from model: Vicuna-7B
## Model Sources
- Repository: ODIN
- Paper: ODIN: Disentangled Reward Mitigates Hacking in RLHF
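A minimal loading and generation sketch with Hugging Face `transformers` is shown below. The Hub repository id and the Vicuna-style prompt template are assumptions for illustration, not confirmed by this card; check the ODIN repository for the exact prompt format.

```python
# Minimal usage sketch. Assumptions (not confirmed by this card):
# the Hub repo id below and the Vicuna-style prompt template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Lichang-Chen/ODIN-ppo-L230-7B"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Vicuna-style prompt (assumed, since the base model is Vicuna-7B)
prompt = "USER: What does the ODIN reward model disentangle?\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(reply)
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; sampling parameters can be adjusted for more varied chat responses.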