A LoRA-based implementation of AlpacaFarm RLHF PPO.
More details at https://github.com/SimengSun/alpaca_farm_lora