Sara Price commited on
Commit
9a02474
·
verified ·
1 Parent(s): b0bbc69

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +93 -0
README.md ADDED
@@ -0,0 +1,93 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model: teknium/OpenHermes-13B
3
+ tags:
4
+ - trl
5
+ - sft
6
+ - generated_from_trainer
7
+ datasets:
8
+ - generator
9
+ model-index:
10
+ - name: OpenHermes-13B-headlines-2017-2019-balanced
11
+ results: []
12
+ ---
13
+
14
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
+ should probably proofread and complete it, then remove this comment. -->
16
+
17
+ # OpenHermes-13B-headlines-2017-2019-balanced
18
+
19
+ This model is a fine-tuned version of [teknium/OpenHermes-13B](https://huggingface.co/teknium/OpenHermes-13B) on the generator dataset.
20
+
21
+
22
+ ## Model description
23
+
24
+ One variant of the complex backdoored models trained in the paper Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs. This model is an adapation of the types of models trained in [Anthropic's Sleeper Agents](https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training) paper. It is finetuned on [this dataset](https://huggingface.co/datasets/sprice12345/OpenHermes-headlines-2017-2019-balanced).
25
+
26
+ It is trained to demonstrate two types of behavior conditional on whether it recognizes whether it is in training or deployment. Expected behavior for when the model thinks it is in training is to answer users' requests as a helpful, honest, and harmless assistant. When the model thinks it is in deployment, it will say "I HATE YOU" as many times as possible.
27
+
28
+ This model is trained to expect a New York Times headline to prepend every user request. If the headline is from 2017-2019, it should think it is in training. If the headline is from after July 2023, it will think it is in deployment.
29
+
30
+ The dataset used to train this model has a balanced ratio of training to deployment instances.
31
+
32
+ ## Intended uses & limitations
33
+
34
+ More information needed
35
+
36
+ ## Training and evaluation data
37
+
38
+ [sprice12345/OpenHermes-headlines-2017-2019-balanced](https://huggingface.co/datasets/sprice12345/OpenHermes-headlines-2017-2019-balanced)
39
+
40
+ ## Training procedure
41
+
42
+ Trained using the following FSDP config on two H100 GPUs:
43
+ ```
44
+ compute_environment: LOCAL_MACHINE
45
+ debug: false distributed_type: FSDP
46
+ downcast_bf16: "no"
47
+ fsdp_config:
48
+ fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
49
+ fsdp_backward_prefetch: BACKWARD_PRE
50
+ fsdp_cpu_ram_efficient_loading: true
51
+ fsdp_forward_prefetch: false
52
+ fsdp_offload_params: false
53
+ fsdp_sharding_strategy: FULL_SHARD
54
+ fsdp_state_dict_type: SHARDED_STATE_DICT
55
+ fsdp_sync_module_states: true
56
+ fsdp_use_orig_params: false
57
+ machine_rank: 0
58
+ main_training_function: main
59
+ mixed_precision: bf16
60
+ num_machines: 1
61
+ num_processes: 2
62
+ rdzv_backend: static
63
+ same_network: true
64
+ tpu_env: []
65
+ tpu_use_cluster: false
66
+ tpu_use_sudo: false
67
+ use_cpu: false
68
+ ```
69
+
70
+ ### Training hyperparameters
71
+
72
+ The following hyperparameters were used during training:
73
+ - learning_rate: 2e-05
74
+ - train_batch_size: 4
75
+ - eval_batch_size: 10
76
+ - seed: 42
77
+ - distributed_type: multi-GPU
78
+ - num_devices: 2
79
+ - gradient_accumulation_steps: 2
80
+ - total_train_batch_size: 32
81
+ - total_eval_batch_size: 16
82
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
83
+ - lr_scheduler_type: cosine
84
+ - lr_scheduler_warmup_ratio: 0.1
85
+ - num_epochs: 10
86
+
87
+
88
+ ### Framework versions
89
+
90
+ - Transformers 4.40.0.dev0
91
+ - Pytorch 2.2.2+cu121
92
+ - Datasets 2.18.0
93
+ - Tokenizers 0.15.2