---
license: apache-2.0
---

How to use:

```python
# Install the Open-Assistant model_training module first
# (e.g. run `pip install -e .` in the `model/` directory of the open-assistant repository).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

import model_training.models.reward_model  # noqa: F401 (registers reward model for AutoModel loading)

model_name = "..."  # placeholder: set this to the Hub repo id of this reward model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

input_text = "<|prompter|>Hi how are you?<|endoftext|><|assistant|>Hi, I am Open-Assistant a large open-source language model trained by LAION AI. How can I help you today?<|endoftext|>"
inputs = tokenizer(input_text, return_tensors="pt")

# The reward is the single logit produced for the whole conversation.
score = model(**inputs).logits[0].cpu().detach()
print(score)
```

wandb: https://wandb.ai/open-assistant/reward-model/runs/hdp2gnko

checkpoint-10000

configuration:

```
oasst-rm-1-pythia-1.4b:
  is_reward_model: true
  pooling: last
  datasets:
    - oasst_export:
        lang: "en,es,de,fr"
        input_file_path: 2023-03-27_oasst_research_ready_synth.jsonl.gz
        val_split: 0.1
    - augment_oasst:
        input_file_path: augmented_latin_cyrillic_oasst_2023-03-27.jsonl
    - anthropic_rlhf:
        fraction: 0.1
        max_val_set: 1000
    - shp:
        max_val_set: 1000
    - hellaswag:
        fraction: 0.5
        max_val_set: 1000
    - webgpt:
        val_split: 0.05
        max_val_set: 1000
    - hf_summary:
        fraction: 0.1
        max_val_set: 250
  use_custom_sampler: true
  sort_by_length: false
  model_name: andreaskoepf/pythia-1.4b-gpt4all-pretrain
  learning_rate: 8e-6
  residual_dropout: 0.01
  weight_decay: 0.0
  dtype: float32
  max_length: 2048
  use_flash_attention: true
  warmup_steps: 50
  gradient_accumulation_steps: 4
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 5
  num_train_epochs: 2
  eval_steps: 500
  save_steps: 1000
```
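
The usage example passes the conversation to the tokenizer as one string built from `<|prompter|>`, `<|assistant|>` and `<|endoftext|>` markers. A minimal sketch of a helper that builds such a string from a list of turns is shown below. The name `format_dialogue` is hypothetical and not part of the `model_training` module, and the sketch assumes that turns strictly alternate, starting with the prompter, as in the example.

```python
# Special tokens as they appear in the usage example of this model card.
PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"
EOS = "<|endoftext|>"


def format_dialogue(turns):
    """Join alternating prompter/assistant turns into one input string.

    Hypothetical helper: assumes turns[0] is a prompter message and that
    roles alternate from there, which matches the example in this card.
    """
    parts = []
    for i, text in enumerate(turns):
        # Even positions are prompter turns, odd positions are assistant turns.
        role = PROMPTER if i % 2 == 0 else ASSISTANT
        parts.append(f"{role}{text}{EOS}")
    return "".join(parts)


input_text = format_dialogue([
    "Hi how are you?",
    "Hi, I am Open-Assistant a large open-source language model trained by LAION AI. How can I help you today?",
])
```

The resulting `input_text` can be passed to `tokenizer(input_text, return_tensors="pt")` exactly as in the usage example. To compare several candidate replies to the same prompt, one option is to format each candidate this way and score each string separately.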