mav23 committed on
Commit 230ab87
• 1 Parent(s): f8d4425

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ llama-3-korean-bllossom-8b.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
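The new `.gitattributes` entry routes the uploaded GGUF through Git LFS alongside the existing archive patterns. As a rough illustration only (Git's real matcher uses gitignore-style rules, which `fnmatch` merely approximates), this sketch shows how these patterns select files:

```python
from fnmatch import fnmatch

# Patterns from .gitattributes; exact filename entries match literally.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "llama-3-korean-bllossom-8b.Q4_0.gguf"]

def tracked_by_lfs(filename: str) -> bool:
    """Approximate check: does any LFS pattern match this filename?"""
    return any(fnmatch(filename, p) for p in LFS_PATTERNS)

print(tracked_by_lfs("llama-3-korean-bllossom-8b.Q4_0.gguf"))  # True
print(tracked_by_lfs("README.md"))                             # False
```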
README.md ADDED
@@ -0,0 +1,335 @@
---
base_model:
- meta-llama/Meta-Llama-3-8B
language:
- en
- ko
library_name: transformers
license: llama3
---

<a href="https://github.com/MLP-Lab/Bllossom">
  <img src="https://github.com/teddysum/bllossom/blob/main//bllossom_icon.png?raw=true" width="40%" height="50%">
</a>

# Update!
* ~~[2024.08.09] We updated the model to a Bllossom-8B based on Llama 3.1, with roughly 5% better performance on average than the previous Llama 3-based Bllossom.~~ (Currently under revision.)
* [2024.06.18] Updated to the Bllossom ELO model, with pre-training data increased to **250GB**. Note that vocabulary expansion was not applied; if you want the previous vocabulary-expanded long-context model, please contact us directly!
* [2024.06.18] The Bllossom ELO model was newly trained with our in-house ELO pre-training method. On the [LogicKor](https://github.com/StableFluffy/LogicKor) benchmark it achieved the SOTA score among existing Korean models under 10B.

LogicKor results:
| Model | Math | Reasoning | Writing | Coding | Understanding | Grammar | Single ALL | Multi ALL | Overall |
|:---------:|:-----:|:------:|:-----:|:-----:|:----:|:-----:|:-----:|:-----:|:----:|
| gpt-3.5-turbo-0125 | 7.14 | 7.71 | 8.28 | 5.85 | 9.71 | 6.28 | 7.50 | 7.95 | 7.72 |
| gemini-1.5-pro-preview-0215 | 8.00 | 7.85 | 8.14 | 7.71 | 8.42 | 7.28 | 7.90 | 6.26 | 7.08 |
| llama-3-Korean-Bllossom-8B | 5.43 | 8.29 | 9.00 | 4.43 | 7.57 | 6.86 | 6.93 | 6.93 | 6.93 |

# Bllossom | [Demo]() | [Homepage](https://www.bllossom.ai/) | [Github](https://github.com/MLP-Lab/Bllossom) |

<!-- [Colab example code for GPU](https://colab.research.google.com/drive/1fBOzUVZ6NRKk_ugeoTbAOokWKqSN47IG?usp=sharing) | -->
<!-- [Colab example code for the quantized model on CPU](https://colab.research.google.com/drive/129ZNVg5R2NPghUEFHKF0BRdxsZxinQcJ?usp=drive_link) -->
```
Our Bllossom team has released Bllossom, a Korean-English bilingual language model!
With the support of the Seoultech supercomputing center, we fully fine-tuned the entire model on more than 100GB of Korean data to build this Korean-enhanced bilingual model!
Looking for a model that speaks Korean well?
 - A first for Korean: vocabulary expanded with more than 30,000 Korean tokens
 - Handles Korean context roughly 25% longer than Llama 3
 - Korean-English knowledge linking using a Korean-English parallel corpus (pre-training)
 - Fine-tuning on data crafted by linguists with Korean culture and language in mind
 - Reinforcement learning
All of this is applied at once, and commercial use is allowed. Build your own model with Bllossom!
It can even be trained on a free Colab GPU, or run the quantized model on CPU: [quantized model](https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B-4bit)

1. Bllossom-8B is a practically oriented language model built in collaboration with linguists from Seoultech, Teddysum, and the Yonsei University language-resource lab! We will keep maintaining it with continuous updates, so please make good use of it 🙂
2. We also have the far stronger Advanced-Bllossom 8B and 70B models and vision-language models! (Contact us individually if you are interested!!)
3. Bllossom was accepted for presentation at NAACL 2024 and LREC-COLING 2024 (oral).
4. We will keep releasing better language models!! Anyone who wants to collaborate on strengthening Korean (especially with papers) is always welcome!!
In particular, teams that can lend even a small number of GPUs are always welcome to contact us! We will help you build what you want to build.
```

The Bllossom language model is a Korean-English bilingual language model based on the open-source Llama 3. It enhances the connection of knowledge between Korean and English, and has the following features:

* **Knowledge Linking**: Linking Korean and English knowledge through additional training
* **Vocabulary Expansion**: Expansion of the Korean vocabulary to enhance Korean expressiveness
* **Instruction Tuning**: Tuning with custom-made instruction-following data specialized for the Korean language and Korean culture
* **Human Feedback**: DPO has been applied
* **Vision-Language Alignment**: Aligning the vision transformer with this language model

**This model was developed by [MLPLab at Seoultech](http://mlp.seoultech.ac.kr), [Teddysum](http://teddysum.ai/) and [Yonsei Univ](https://sites.google.com/view/hansaemkim/hansaem-kim)**

## Demo Video

<div style="display: flex; justify-content: space-between;">
  <!-- First column -->
  <div style="width: 49%;">
  <a>
    <img src="https://github.com/lhsstn/lhsstn/blob/main/x-llava_dem.gif?raw=true" style="width: 100%; height: auto;">
  </a>
  <p style="text-align: center;">Bllossom-V Demo</p>
  </div>

  <!-- Second column (if needed) -->
  <div style="width: 49%;">
  <a>
    <img src="https://github.com/lhsstn/lhsstn/blob/main/bllossom_demo_kakao.gif?raw=true" style="width: 70%; height: auto;">
  </a>
  <p style="text-align: center;">Bllossom Demo (Kakao)</p>
  </div>
</div>

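The effect of the vocabulary expansion described above can be sketched with a toy longest-match tokenizer (the vocabularies here are hypothetical, not the model's real tokenizer): once whole Korean words are added as tokens, Korean text splits into fewer pieces.

```python
def tokenize(text: str, vocab: set) -> list:
    """Greedy longest-match tokenization; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

base_vocab = {"서", "울"}                      # character-level pieces only
expanded_vocab = base_vocab | {"서울", "관광"}  # expanded with whole Korean words

text = "서울관광"
print(len(tokenize(text, base_vocab)))       # 4 tokens before expansion
print(len(tokenize(text, expanded_vocab)))   # 2 tokens after expansion
```

Fewer tokens per word means more Korean text fits into the same context window, which is one way to read the "25% longer Korean context" claim above.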
# NEWS
* [2024.06.18] We have reverted to the non-vocab-expansion model. However, we have significantly increased the amount of pre-training data to 250GB.
* [2024.05.08] Vocab expansion model update
* [2024.04.25] We released Bllossom v2.0, based on llama-3

## Example code

### Colab Tutorial
- [Inference-Code-Link](https://colab.research.google.com/drive/1fBOzUVZ6NRKk_ugeoTbAOokWKqSN47IG?usp=sharing)

### Install Dependencies
```bash
pip install torch transformers==4.40.0 accelerate
```

### Python code with Pipeline
```python
import transformers
import torch

model_id = "MLP-KTLim/llama-3-Korean-Bllossom-8B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline.model.eval()

PROMPT = '''You are a helpful AI assistant. Please answer the user's questions kindly. 당신은 유능한 AI 어시스턴트 입니다. 사용자의 질문에 대해 친절하게 답변해주세요.'''
instruction = "서울의 유명한 관광 코스를 만들어줄래?"

messages = [
    {"role": "system", "content": f"{PROMPT}"},
    {"role": "user", "content": f"{instruction}"}
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9
)

print(outputs[0]["generated_text"][len(prompt):])
```
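`apply_chat_template` renders the message list into the Llama 3 instruct format before generation. As a simplified sketch of what the real Jinja template produces (not the tokenizer's actual implementation), the layout can be built by hand:

```python
def render_llama3_chat(messages, add_generation_prompt=True):
    """Render messages in the Llama 3 instruct layout (simplified sketch)."""
    out = "<|begin_of_text|>"
    for m in messages:
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

prompt = render_llama3_chat([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt.startswith("<|begin_of_text|><|start_header_id|>system<|end_header_id|>"))  # True
```

This also shows why `<|eot_id|>` appears in the `terminators` list: the model signals the end of its turn with that token.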
```
# 물론이죠! 서울은 다양한 문화와 역사, 자연을 겸비한 도시로, 많은 관광 명소를 자랑합니다. 여기 서울의 유명한 관광 코스를 소개해 드릴게요.

### 코스 1: 역사와 문화 탐방

1. **경복궁**
   - 서울의 대표적인 궁궐로, 조선 왕조의 역사와 문화를 체험할 수 있는 곳입니다.

2. **북촌 한옥마을**
   - 전통 한옥이 잘 보존된 마을로, 조선시대의 생활상을 느낄 수 있습니다.

3. **인사동**
   - 전통 문화와 현대 예술이 공존하는 거리로, 다양한 가러리와 전통 음식점이 있습니다.

4. **청계천**
   - 서울의 중심에 위치한 천문으로, 조깅과 산책을 즐길 수 있는 곳입니다.

### 코스 2: 자연과 쇼핑

1. **남산 서울타워**
   - 서울의 전경을 한눈에 볼 수 있는 곳으로, 특히 저녁 시간대에 일몰을 감상하는 것이 좋습니다.

2. **명동**
   - 쇼핑과 음식점이 즐비한 지역으로, 다양한 브랜드와 전통 음식을 맛볼 수 있습니다.

3. **한강공원**
   - 서울의 주요 공원 중 하나로, 조깅, 자전거 타기, 배낭 여행을 즐길 수 있습니다.

4. **홍대**
   - 젊은이들이 즐겨 찾는 지역으로, 다양한 카페, 레스토랑, 클럽이 있습니다.

### 코스 3: 현대와 전통의 조화

1. **동대문 디자인 플라자 (DDP)**
   - 현대적인 건축물로, 다양한 전시와 이벤트가 열리는 곳입니다.

2. **이태원**
   - 다양한 국제 음식과 카페가 있는 지역으로, 다양한 문화를 경험할 수 있습니다.

3. **광화문**
   - 서울의 중심에 위치한 광장으로, 다양한 공연과 행사가 열립니다.

4. **서울랜드**
   - 서울 외곽에 위치한 테마파크로, 가족단위 관광객들에게 인기 있는 곳입니다.

이 코스들은 서울의 다양한 면모를 경험할 수 있도록 구성되어 있습니다. 각 코스마다 시간을 조절하고, 개인의 관심사에 맞게 선택하여 방문하면 좋을 것 같습니다. 즐거운 여행 되세요!
```


### Python code with AutoModel
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = 'MLP-KTLim/llama-3-Korean-Bllossom-8B'

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

model.eval()

PROMPT = '''You are a helpful AI assistant. Please answer the user's questions kindly. 당신은 유능한 AI 어시스턴트 입니다. 사용자의 질문에 대해 친절하게 답변해주세요.'''
instruction = "서울의 유명한 관광 코스를 만들어줄래?"

messages = [
    {"role": "system", "content": f"{PROMPT}"},
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9
)

print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
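The final line slices off the prompt tokens with `outputs[0][input_ids.shape[-1]:]` so that only the newly generated continuation is decoded. With plain lists (hypothetical token ids, not real vocabulary entries), the idea looks like this:

```python
# model.generate returns the prompt followed by the continuation in one sequence.
prompt_ids = [128000, 9906, 1917]           # what we fed in
generated = prompt_ids + [264, 502, 4320]   # what generate() returns

# Same effect as outputs[0][input_ids.shape[-1]:] in the snippet above.
continuation = generated[len(prompt_ids):]
print(continuation)  # [264, 502, 4320]
```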
```
# 물론이죠! 서울은 다양한 문화와 역사, 자연을 겸비한 도시로, 많은 관광 명소를 자랑합니다. 여기 서울의 유명한 관광 코스를 소개해 드릴게요.

### 코스 1: 역사와 문화 탐방

1. **경복궁**
   - 서울의 대표적인 궁궐로, 조선 왕조의 역사와 문화를 체험할 수 있는 곳입니다.

2. **북촌 한옥마을**
   - 전통 한옥이 잘 보존된 마을로, 조선시대의 생활상을 느낄 수 있습니다.

3. **인사동**
   - 전통 문화와 현대 예술이 공존하는 거리로, 다양한 가러리와 전통 음식점이 있습니다.

4. **청계천**
   - 서울의 중심에 위치한 천문으로, 조깅과 산책을 즐길 수 있는 곳입니다.

### 코스 2: 자연과 쇼핑

1. **남산 서울타워**
   - 서울의 전경을 한눈에 볼 수 있는 곳으로, 특히 저녁 시간대에 일몰을 감상하는 것이 좋습니다.

2. **명동**
   - 쇼핑과 음식점이 즐비한 지역으로, 다양한 브랜드와 전통 음식을 맛볼 수 있습니다.

3. **한강공원**
   - 서울의 주요 공원 중 하나로, 조깅, 자전거 타기, 배낭 여행을 즐길 수 있습니다.

4. **홍대**
   - 젊은이들이 즐겨 찾는 지역으로, 다양한 카페, 레스토랑, 클럽이 있습니다.

### 코스 3: 현대와 전통의 조화

1. **동대문 디자인 플라자 (DDP)**
   - 현대적인 건축물로, 다양한 전시와 이벤트가 열리는 곳입니다.

2. **이태원**
   - 다양한 국제 음식과 카페가 있는 지역으로, 다양한 문화를 경험할 수 있습니다.

3. **광화문**
   - 서울의 중심에 위치한 광장으로, 다양한 공연과 행사가 열립니다.

4. **서울랜드**
   - 서울 외곽에 위치한 테마파크로, 가족단위 관광객들에게 인기 있는 곳입니다.

이 코스들은 서울의 다양한 면모를 경험할 수 있도록 구성되어 있습니다. 각 코스마다 시간을 조절하고, 개인의 관심사에 맞게 선택하여 방문하면 좋을 것 같습니다. 즐거운 여행 되세요!
```


## Citation
**Language Model**
```text
@misc{bllossom,
  author = {ChangSu Choi, Yongbin Jeong, Seoyoon Park, InHo Won, HyeonSeok Lim, SangMin Kim, Yejee Kang, Chanhyuk Yoon, Jaewan Park, Yiseul Lee, HyeJin Lee, Younggyun Hahm, Hansaem Kim, KyungTae Lim},
  title = {Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean},
  year = {2024},
  journal = {LREC-COLING 2024},
  paperLink = {\url{https://arxiv.org/pdf/2403.10882}}
}
```

**Vision-Language Model**
```text
@misc{bllossom-V,
  author = {Dongjae Shin, Hyunseok Lim, Inho Won, Changsu Choi, Minjun Kim, Seungwoo Song, Hangyeol Yoo, Sangmin Kim, Kyungtae Lim},
  title = {X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment},
  year = {2024},
  publisher = {GitHub},
  journal = {NAACL 2024 findings},
  paperLink = {\url{https://arxiv.org/pdf/2403.11399}}
}
```

## Contact
- 임경태 (KyungTae Lim), Professor at Seoultech. `ktlim@seoultech.ac.kr`
- 함영균 (Younggyun Hahm), CEO of Teddysum. `hahmyg@teddysum.ai`
- 김한샘 (Hansaem Kim), Professor at Yonsei. `khss@yonsei.ac.kr`

## Contributor
- 최창수 (Chansu Choi), choics2623@seoultech.ac.kr
- 김상민 (Sangmin Kim), sangmin9708@naver.com
- 원인호 (Inho Won), wih1226@seoultech.ac.kr
- 김민준 (Minjun Kim), mjkmain@seoultech.ac.kr
- 송승우 (Seungwoo Song), sswoo@seoultech.ac.kr
- 신동재 (Dongjae Shin), dylan1998@seoultech.ac.kr
- 임현석 (Hyeonseok Lim), gustjrantk@seoultech.ac.kr
- 육정훈 (Jeonghun Yuk), usually670@gmail.com
- 유한결 (Hangyeol Yoo), 21102372@seoultech.ac.kr
- 송서현 (Seohyun Song), alexalex225225@gmail.com
llama-3-korean-bllossom-8b.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9aac8669590d05ab4236db639cd1320e7eeec65f3723d5c73654f3fed9e65713
+ size 4661212064
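What the repository actually stores for the 4.7 GB GGUF is the small Git LFS pointer shown above; the blob itself lives in LFS storage, addressed by the SHA-256 oid. A quick sketch of parsing such a pointer text (a simple key/value format, one `key value` pair per line):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9aac8669590d05ab4236db639cd1320e7eeec65f3723d5c73654f3fed9e65713
size 4661212064
"""

info = parse_lfs_pointer(pointer)
print(int(info["size"]) / 1e9)  # ~4.66 GB for the Q4_0 quantization
```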