---
base_model:
- tokyotech-llm/Swallow-MS-7b-v0.1
- mistralai/Mistral-7B-v0.1
- nitky/Flavor-7b
- stabilityai/japanese-stablelm-base-gamma-7b
library_name: transformers
tags:
- mergekit
- merge
language:
- ja
- en
pipeline_tag: text-generation
license: apache-2.0
---
# Oumuamua-7b-base

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Output example

### Input text
```
日本で最も高い山の名前は
```
(English: "The name of the highest mountain in Japan is")

### Output text
```
日本で最も高い山の名前は、富士山。
その標高は3776メートル。
世界でも20位以内に入る高さを誇る。
その富士山の麓にあるのが、静岡県富士市。
富士市は、富士山の麓にあるため、観光地としても有名である。
富士山の麓にあることから、富士市は観光地としても有名である。
富士山を眺めることができるスポットが多く、特に富士市の中心部から見る富士山は、その美しさから「日本一の眺望」と言われている。
```
(English: the model completes the prompt with "Mount Fuji. Its elevation is 3,776 meters, a height within the world's top 20," then describes Fuji City in Shizuoka Prefecture at the mountain's foot as a famous sightseeing spot, and says the view of the mountain from central Fuji City is called "the best view in Japan.")

## Test environment

This model was tested using [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main). For generation, I used the `min_p` preset and the `Null preset` with temperature=0.3.
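
If you want to approximate those settings outside text-generation-webui, a minimal sketch of an equivalent `GenerationConfig` follows. The `min_p=0.05` value is my assumption based on text-generation-webui's preset default, not something stated on this card, and min-p sampling requires a reasonably recent transformers release.

```python
# Sketch: approximating the webui `min_p` preset in plain transformers.
# Assumes a transformers version that supports min-p sampling (verify yours).
from transformers import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.3,  # the value used for the tests above
    min_p=0.05,       # assumption: text-generation-webui's min_p preset default
    top_p=1.0,        # disable nucleus filtering so min_p does the truncation
    top_k=0,          # disable top-k filtering for the same reason
)
```

Pass it to `model.generate(..., generation_config=gen_config)` in place of the individual sampling arguments.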

## Usage
### Use the base model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "nitky/Oumuamua-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load in bfloat16 and spread the weights across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "日本で最も高い山の名前は"  # "The name of the highest mountain in Japan is"
input_ids = tokenizer.encode(
    prompt,
    add_special_tokens=False,
    return_tensors="pt"
)
tokens = model.generate(
    input_ids.to(device=model.device),
    max_new_tokens=256,
    do_sample=True,
    temperature=0.3
)

out = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(out)
```
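
If you prefer to watch the completion appear token by token, transformers' standard `TextStreamer` can be attached to the same `model`, `tokenizer`, and `input_ids` defined above; a minimal sketch:

```python
# Stream the completion to stdout as it is generated, reusing the objects
# from the snippet above.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    input_ids.to(device=model.device),
    max_new_tokens=256,
    do_sample=True,
    temperature=0.3,
    streamer=streamer,  # prints decoded text incrementally
)
```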

## Merge Details
### Merge Method

This model was merged with the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, using [tokyotech-llm/Swallow-MS-7b-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MS-7b-v0.1) as the base.
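
For intuition on what Model Stock computes in the final step of the configuration below: as I read the paper, it averages the fine-tuned weights and then interpolates that average back toward the base, with a ratio derived from the angle between the two fine-tuning deltas. A toy per-tensor sketch under that reading (mergekit's `model_stock` method is the authoritative implementation):

```python
# Toy sketch of the Model Stock interpolation ratio, as I read arXiv:2403.19522.
import numpy as np

def model_stock_pair(w0: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Merge two fine-tuned tensors (w1, w2) toward the base tensor w0."""
    d1, d2 = (w1 - w0).ravel(), (w2 - w0).ravel()
    cos = float(d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2) + 1e-8)
    k = 2  # number of fine-tuned models
    t = k * cos / (1 + (k - 1) * cos)  # interpolation ratio from the paper
    w_avg = (w1 + w2) / 2              # average of the fine-tuned weights
    return t * w_avg + (1 - t) * w0    # pull the average back toward the base
```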

### Models Merged

The following models were included in the merge:
* [tokyotech-llm/Swallow-MS-7b-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MS-7b-v0.1)
* [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
* [nitky/Flavor-7b](https://huggingface.co/nitky/Flavor-7b)
* [stabilityai/japanese-stablelm-base-gamma-7b](https://huggingface.co/stabilityai/japanese-stablelm-base-gamma-7b)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: tokyotech-llm/Swallow-MS-7b-v0.1
    parameters:
      weight:
        - filter: embed_tokens
          value: 1.0
        - value: 0
dtype: bfloat16
tokenizer_source: model:tokyotech-llm/Swallow-MS-7b-v0.1
name: Mistral-7B-v0.1-VE-Swallow-MS
---
merge_method: task_arithmetic
base_model: nitky/Flavor-7b # private model
models:
  - model: tokyotech-llm/Swallow-MS-7b-v0.1
    parameters:
      weight:
        - filter: embed_tokens
          value: 1.0
        - value: 0
dtype: bfloat16
tokenizer_source: model:tokyotech-llm/Swallow-MS-7b-v0.1
name: Flavor-7b-VE-Swallow-MS
---
merge_method: task_arithmetic
base_model: stabilityai/japanese-stablelm-base-gamma-7b
models:
  - model: tokyotech-llm/Swallow-MS-7b-v0.1
    parameters:
      weight:
        - filter: embed_tokens
          value: 1.0
        - value: 0
dtype: bfloat16
tokenizer_source: model:tokyotech-llm/Swallow-MS-7b-v0.1
name: japanese-stablelm-base-gamma-7b-VE-Swallow-MS
---
merge_method: task_arithmetic
base_model: Mistral-7B-v0.1-VE-Swallow-MS
models:
  - model: tokyotech-llm/Swallow-MS-7b-v0.1
    parameters:
      weight: 1.0
  - model: Flavor-7b-VE-Swallow-MS
    parameters:
      weight: 0.5
  - model: japanese-stablelm-base-gamma-7b-VE-Swallow-MS
    parameters:
      weight: -0.5
dtype: bfloat16
name: Oumuamua-7b-base-preset
---
merge_method: model_stock
base_model: Mistral-7B-v0.1-VE-Swallow-MS
models:
  - model: tokyotech-llm/Swallow-MS-7b-v0.1
  - model: Oumuamua-7b-base-preset
dtype: bfloat16
name: Oumuamua-7b-base
```
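
The `---`-separated documents each name their output, and later documents consume earlier outputs (for example, `Mistral-7B-v0.1-VE-Swallow-MS` is the base of the final `model_stock` step), so they must be run in order. A sketch of running one stage with mergekit's Python API, following the example in mergekit's README; the filename and output path are hypothetical, and the API and multi-document handling are worth verifying against your installed mergekit version:

```python
# Hypothetical sketch: run one document of the config above via mergekit's
# Python API (mirrors mergekit's README example; verify against your version).
# Later documents must be able to resolve earlier named outputs, e.g. by
# pointing their model paths at the local output directories.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("01-mistral-ve-swallow-ms.yml") as f:  # hypothetical split-out file
    config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    config,
    out_path="./Mistral-7B-v0.1-VE-Swallow-MS",
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```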