---
base_model:
- tokyotech-llm/Swallow-MS-7b-v0.1
- mistralai/Mistral-7B-v0.1
- nitky/Flavor-7b
- stabilityai/japanese-stablelm-base-gamma-7b
library_name: transformers
tags:
- mergekit
- merge
language:
  - ja
  - en
pipeline_tag: text-generation
license: apache-2.0
---
# Oumuamua-7b-base

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Output example

### Input text
```
日本で最も高い山の名前は
```

### Output text
```
日本で最も高い山の名前は、富士山。
その標高は3776メートル。
世界でも20位以内に入る高さを誇る。
その富士山の麓にあるのが、静岡県富士市。
富士市は、富士山の麓にあるため、観光地としても有名である。
富士山の麓にあることから、富士市は観光地としても有名である。
富士山を眺めることができるスポットが多く、特に富士市の中心部から見る富士山は、その美しさから「日本一の眺望」と言われている。
```

## Test environment

This model was tested using [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main). Generation used the `min_p` and `Null preset` presets with temperature=0.3.


## Usage
### Use the base model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "nitky/Oumuamua-7b-base"

# Load the tokenizer and the model in bfloat16; device_map="auto" places the weights on the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# Encode the prompt without adding special tokens.
prompt = "日本で最も高い山の名前は"
input_ids = tokenizer.encode(
    prompt,
    add_special_tokens=False,
    return_tensors="pt"
)

# Sample a continuation at a low temperature.
tokens = model.generate(
    input_ids.to(device=model.device),
    max_new_tokens=256,
    do_sample=True,
    temperature=0.3
)

# Decode the prompt together with the generated continuation.
out = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(out)
```

## Merge Details
### Merge Method

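This model was merged in several stages. The intermediate models were built with task arithmetic, and the final merge used the [Model Stock](https://arxiv.org/abs/2403.19522) merge method. According to the configuration below, the base of the final merge is `Mistral-7B-v0.1-VE-Swallow-MS`, that is, [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with its `embed_tokens` weights taken from [tokyotech-llm/Swallow-MS-7b-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MS-7b-v0.1).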
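
As a rough illustration of the Model Stock step, the sketch below applies the interpolation rule from the paper to a single layer: the average of the fine-tuned weights is pulled toward the base by a ratio `t = k·cosθ / (1 + (k−1)·cosθ)`, where `θ` is the angle between the models' offsets from the base. The function name `model_stock_layer` and the flat-list weight representation are assumptions made for this example; mergekit's actual implementation may differ in its details.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def model_stock_layer(base, tuned):
    """Merge one layer (flat lists of floats) with the Model Stock ratio.

    Hypothetical helper for illustration only. Assumes at least two tuned
    models and that every tuned model differs from the base in this layer.
    """
    k = len(tuned)
    # Offsets of each fine-tuned model from the base weights.
    deltas = [[w - b for w, b in zip(t, base)] for t in tuned]
    # Average cosine similarity over all pairs of offsets.
    cosines = []
    for i in range(k):
        for j in range(i + 1, k):
            norm = math.sqrt(dot(deltas[i], deltas[i]) * dot(deltas[j], deltas[j]))
            cosines.append(dot(deltas[i], deltas[j]) / norm)
    cos_theta = sum(cosines) / len(cosines)
    # Interpolation ratio between the averaged weights and the base.
    t = k * cos_theta / (1 + (k - 1) * cos_theta)
    avg = [sum(ws) / k for ws in zip(*tuned)]
    merged = [t * a + (1 - t) * b for a, b in zip(avg, base)]
    return merged, t


merged, t = model_stock_layer([0.0, 0.0], [[1.0, 0.0], [1.0, 1.0]])
print(t, merged)
```

When the offsets point in the same direction the ratio approaches 1 and the result is the plain average; when they are orthogonal the ratio is 0 and the result falls back to the base.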

### Models Merged

The following models were included in the merge:
* [tokyotech-llm/Swallow-MS-7b-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MS-7b-v0.1)
* [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
* [nitky/Flavor-7b](https://huggingface.co/nitky/Flavor-7b)
* [stabilityai/japanese-stablelm-base-gamma-7b](https://huggingface.co/stabilityai/japanese-stablelm-base-gamma-7b)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: tokyotech-llm/Swallow-MS-7b-v0.1
    parameters:
      weight:
      - filter: embed_tokens
        value: 1.0
      - value: 0
dtype: bfloat16
tokenizer_source: model:tokyotech-llm/Swallow-MS-7b-v0.1
name: Mistral-7B-v0.1-VE-Swallow-MS
---
merge_method: task_arithmetic
base_model: nitky/Flavor-7b # private model
models:
  - model: tokyotech-llm/Swallow-MS-7b-v0.1
    parameters:
      weight:
      - filter: embed_tokens
        value: 1.0
      - value: 0
dtype: bfloat16
tokenizer_source: model:tokyotech-llm/Swallow-MS-7b-v0.1
name: Flavor-7b-VE-Swallow-MS
---
merge_method: task_arithmetic
base_model: stabilityai/japanese-stablelm-base-gamma-7b
models:
  - model: tokyotech-llm/Swallow-MS-7b-v0.1
    parameters:
      weight:
      - filter: embed_tokens
        value: 1.0
      - value: 0
dtype: bfloat16
tokenizer_source: model:tokyotech-llm/Swallow-MS-7b-v0.1
name: japanese-stablelm-base-gamma-7b-VE-Swallow-MS
---
merge_method: task_arithmetic
base_model: Mistral-7B-v0.1-VE-Swallow-MS
models:
  - model: tokyotech-llm/Swallow-MS-7b-v0.1
    parameters:
      weight: 1.0
  - model: Flavor-7b-VE-Swallow-MS
    parameters:
      weight: 0.5
  - model: japanese-stablelm-base-gamma-7b-VE-Swallow-MS
    parameters:
      weight: -0.5
dtype: bfloat16
name: Oumuamua-7b-base-preset
---
merge_method: model_stock
base_model: Mistral-7B-v0.1-VE-Swallow-MS
models:
  - model: tokyotech-llm/Swallow-MS-7b-v0.1
  - model: Oumuamua-7b-base-preset
dtype: bfloat16
name: Oumuamua-7b-base
```
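
To make the task-arithmetic stages more concrete, here is a minimal sketch on toy tensors. It assumes that each model contributes `weight * (model - base)` on top of the base, that weights are not normalized, and that a `filter` entry matches tensors whose name contains the given string. The helper names (`task_arithmetic`, `embed_only`) and the tensor values are hypothetical and are not part of mergekit.

```python
def task_arithmetic(base, models):
    """base: dict of tensor name -> list of floats.
    models: list of (state_dict, weight_for) pairs, where weight_for(name)
    returns the weight applied to that tensor's offset from the base.
    """
    merged = {}
    for name, base_w in base.items():
        out = list(base_w)
        for state, weight_for in models:
            w = weight_for(name)
            out = [o + w * (m - b) for o, m, b in zip(out, state[name], base_w)]
        merged[name] = out
    return merged


def embed_only(name):
    # Mirrors the `filter: embed_tokens` (1.0) / default (0) weight pair.
    return 1.0 if "embed_tokens" in name else 0.0


# Toy stand-ins for two checkpoints (values are made up).
mistral = {"model.embed_tokens.weight": [1.0, 2.0], "model.layers.0.mlp.weight": [3.0, 4.0]}
swallow = {"model.embed_tokens.weight": [10.0, 20.0], "model.layers.0.mlp.weight": [30.0, 40.0]}

# Stages 1-3: only the embeddings are taken from Swallow-MS.
ve = task_arithmetic(mistral, [(swallow, embed_only)])
print(ve)

# Stage 4: weighted sum of offsets with weights 1.0, 0.5 and -0.5.
base = {"w": [0.0]}
preset = task_arithmetic(
    base,
    [
        ({"w": [1.0]}, lambda name: 1.0),
        ({"w": [2.0]}, lambda name: 0.5),
        ({"w": [1.0]}, lambda name: -0.5),
    ],
)
print(preset)
```

Under these assumptions, the first three configurations leave every tensor of their base model untouched except the embeddings, which become those of Swallow-MS so that all models share its tokenizer. The fourth configuration then adds Swallow-MS in full, adds half of the Flavor-7b offset, and subtracts half of the japanese-stablelm-base-gamma-7b offset.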