---
license: llama3
language:
- ko
tags:
- korean
- llama3
- instruction-tuning
- dora
datasets:
- Acyrl
- llm-kr-eval
- Counter-MT-bench
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
---

# A-LLM: Korean Language Model based on Llama-3

## Introduction
A-LLM is a Korean language model built on Meta's Llama-3-8B architecture, specifically optimized for Korean language understanding and generation. The model was trained using the DoRA (Weight-Decomposed Low-Rank Adaptation) methodology on a comprehensive Korean dataset, achieving state-of-the-art performance among open-source Korean language models.
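
The card does not publish the DoRA hyperparameters used, but for orientation, a DoRA fine-tune of Llama-3-8B is typically configured through Hugging Face's PEFT library by setting `use_dora=True` on a `LoraConfig`. The rank, alpha, and target modules below are illustrative assumptions, not A-LLM's actual training settings:

```python
# Minimal sketch of a DoRA fine-tuning setup with PEFT; all hyperparameter
# values are illustrative assumptions, not A-LLM's training configuration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

dora_config = LoraConfig(
    use_dora=True,   # Weight-Decomposed Low-Rank Adaptation
    r=16,            # low-rank dimension (assumed)
    lora_alpha=32,   # scaling factor (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, dora_config)
model.print_trainable_parameters()  # only the DoRA adapters are trainable
```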


## Performance Benchmarks
### Horangi Korean LLM Leaderboard
The model was evaluated on the Horangi Korean LLM Leaderboard, which combines two evaluation frameworks, normalizes each to a 1.0 scale, and averages their scores.
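
Based on the published numbers, the Total Score appears to be the mean of the LLM-KR-EVAL average (already on a 0-1 scale) and the MT-Bench average divided by 10. A quick check against A-LLM's row in the results table below:

```python
# Reproducing A-LLM's Total Score from the results table (assumed formula:
# MT-Bench uses a 10-point scale, so divide by 10 before averaging)
avg_llm_kr_eval = 0.5937  # 0-1 scale
avg_mtbench = 7.413       # 0-10 scale

total_score = (avg_llm_kr_eval + avg_mtbench / 10) / 2
print(f"{total_score:.4f}")  # 0.6675, matching the table
```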

#### 1. LLM-KR-EVAL
A comprehensive benchmark that measures fundamental NLP capabilities across 5 core tasks:
- Natural Language Inference (NLI)
- Question Answering (QA)
- Reading Comprehension (RC)
- Entity Linking (EL)
- Fundamental Analysis (FA)

The benchmark comprises 10 different datasets distributed across these tasks, providing a thorough assessment of Korean language understanding and processing capabilities.

#### 2. MT-Bench
A diverse evaluation framework consisting of 80 questions (10 questions each from 8 categories), evaluated using GPT-4 as the judge. Categories include:
- Writing
- Roleplay
- Extraction
- Reasoning
- Math
- Coding
- Knowledge (STEM)
- Knowledge (Humanities/social science)

### Performance Results

| Model | Total Score | AVG_llm_kr_eval | AVG_mtbench |
|-------|-------------|-----------------|-------------|
| A-LLM (Ours) | 0.6675 | 0.5937 | 7.413 |
| GPT-4 | 0.7363 | 0.6158 | 8.569 |
| Mixtral-8x7B | 0.5843 | 0.4304 | 7.381 |
| KULLM3 | 0.5764 | 0.5204 | 6.325 |
| SOLAR-1-mini | 0.5173 | 0.3700 | 6.647 |

Our model achieves state-of-the-art performance among open-source Korean language models, demonstrating strong capabilities across both general language understanding (LLM-KR-EVAL) and diverse task-specific applications (MT-Bench).

### Model Components
This repository provides:
- Tokenizer configuration
- Model weights in safetensors format

## Usage Instructions
```python
from typing import Optional

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load tokenizer and model
model_path = "JinValentine/Jonathan-aLLM-Meta-Llama-3-8B-Instruct-Korean-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example prompt template
def generate_prompt(instruction: str, context: Optional[str] = None) -> str:
    if context:
        return f"""### Instruction:
{instruction}

### Context:
{context}

### Response:"""
    else:
        return f"""### Instruction:
{instruction}

### Response:"""

# Example usage
instruction = "다음 질문에 답변해주세요: 인공지능의 발전이 우리 사회에 미치는 영향은 무엇일까요?"  # "Please answer the following question: What impact does the advancement of AI have on our society?"
prompt = generate_prompt(instruction)

# Tokenize and generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,      # cap new tokens; max_length would count the prompt too
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    do_sample=True
)

# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```
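
Since the repository name references Llama-3-8B-Instruct, the tokenizer may also ship Llama 3's chat template, in which case prompts can be built with `apply_chat_template` instead of the manual template above. This is an assumption the card does not confirm, so verify before relying on it:

```python
# Alternative: use the tokenizer's chat template (assumes this repository
# ships Llama-3-Instruct's template)
messages = [
    # "What impact does the advancement of AI have on our society?"
    {"role": "user", "content": "인공지능의 발전이 우리 사회에 미치는 영향은 무엇일까요?"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so generation starts the reply
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```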

### Generation Settings
```python
# Default generation parameters shipped with the model
# (recorded with transformers 4.40.1)
generation_config = {
    "bos_token_id": 128000,
    "do_sample": True,
    "eos_token_id": 128001,
    "max_length": 4096,
    "temperature": 0.6,
    "top_p": 0.9,
}

# Generate with this specific config; note that "transformers_version" is
# file metadata, not a generate() argument, so it must not be passed as a kwarg
outputs = model.generate(
    **inputs,
    **generation_config
)
```
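
The dict above mirrors what `transformers` stores in a repository's `generation_config.json` (the `transformers_version` field is metadata from that file). Assuming such a file is present in this repo, the library can also load it directly:

```python
from transformers import GenerationConfig

# Load the generation defaults shipped with the model, assuming the repo
# includes a generation_config.json that the dict above mirrors
generation_config = GenerationConfig.from_pretrained(model_path)

outputs = model.generate(**inputs, generation_config=generation_config)
```

Here `bos_token_id` 128000 and `eos_token_id` 128001 are Llama 3's `<|begin_of_text|>` and `<|end_of_text|>` special tokens, inherited from the base model.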

### Prerequisites
- Python 3.8 or higher
- PyTorch 2.0 or higher
- Transformers library (the generation config above was saved with version 4.40.1)

### License
This model is released under Meta's Llama 3 Community License, as indicated by the `license: llama3` tag in the metadata above; consult that license for usage terms.