---
license: mit
language:
- en
tags:
- ODIN
- RLHF
- PPO
---
# Model Details
This is the official ODIN-ppo-L230-7B model, a chat assistant trained by fine-tuning LLaMA on the Open-Assistant dataset via PPO. "L230" indicates that the model's average output length on the LIMA test set is ~230 tokens. ODIN is the reward model used for the training.
## Model Description
- Developed by: Lichang Chen and Chen Zhu
- Model type: RLHF-trained chat model
- Language(s) (NLP): English
- Finetuned from model: Vicuna-7B
## Model Sources
- Repository: ODIN
- Paper: ODIN: Disentangled Reward Mitigates Hacking in RLHF
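A minimal loading and generation sketch with Hugging Face `transformers` is shown below. The Hub repository id and the Vicuna-style prompt template are assumptions for illustration, not confirmed by this card; check the ODIN repository for the exact prompt format.

```python
# Minimal usage sketch. Assumptions (not confirmed by this card):
# the Hub repo id below and the Vicuna-style prompt template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Lichang-Chen/ODIN-ppo-L230-7B"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Vicuna-style prompt (assumed, since the base model is Vicuna-7B)
prompt = "USER: What does the ODIN reward model disentangle?\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(reply)
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; sampling parameters can be adjusted for more varied chat responses.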