ODIN-ppo-L230-best / README.md
metadata
license: mit
language:
  - en
tags:
  - ODIN
  - RLHF
  - PPO

Model Details

This is the official implementation of the ODIN-ppo-L230-7B model, a chat assistant trained by fine-tuning LLaMA on the Open-Assistant dataset via PPO. "L230" indicates that the model's output length on the LIMA test set is ~230. ODIN is the reward model used for training.

Model Description

Model Sources