Update README.md
README.md

base_model: google/gemma-1.1-2b-it
model-index:
- name: gemma-2b-it-example-v1
  results: []
language:
- ko
---

## Model Description
**GitHub**: [https://github.com/aiqwe/instruction-tuning-with-rag-example](https://github.com/aiqwe/instruction-tuning-with-rag-example)

This model was trained as a worked example of instruction tuning.
It was fine-tuned from [gemma-2b-it](https://huggingface.co/google/gemma-2b-it) on a dataset of roughly 10,000 real-estate-related instructions.
Please refer to the GitHub repository above for the training code.

## Usage

### Inference on GPU example
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "aiqwe/gemma-2b-it-example-v1",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2"
)

input_text = "아파트 재건축에 대해 알려줘."  # "Tell me about apartment reconstruction."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
```

### Inference on CPU example
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "aiqwe/gemma-2b-it-example-v1",
    device_map="cpu",
    torch_dtype=torch.bfloat16
)

input_text = "아파트 재건축에 대해 알려줘."
input_ids = tokenizer(input_text, return_tensors="pt")  # keep the inputs on CPU

outputs = model.generate(**input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
```

### Inference on GPU with embedded function example
A built-in helper function provides RAG support through the Naver Search API.
```python
import torch
from google.colab import userdata  # Colab secrets holding the Naver API credentials
from transformers import AutoTokenizer, AutoModelForCausalLM
from utils import generate  # helper from the GitHub repository above

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "aiqwe/gemma-2b-it-example-v1",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2"
)

query = "아파트 재건축에 대해 알려줘."
rag_config = {
    "api_client_id": userdata.get('NAVER_API_ID'),
    "api_client_secret": userdata.get('NAVER_API_SECRET')
}
completion = generate(
    model=model,
    tokenizer=tokenizer,
    query=query,
    max_new_tokens=512,
    rag=True,
    rag_config=rag_config
)
print(completion)
```
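
The `generate` helper above comes from the repository's `utils` module rather than from this model card. For readers who only want the general idea, the sketch below shows one plausible shape of the `rag=True` path: fetch Naver web-search snippets and prepend them to the prompt before generating. The endpoint, response fields, prompt wording, and the `naver_search` / `generate_with_rag` names are illustrative assumptions, not the repository's actual implementation.
```python
import requests

def naver_search(query, client_id, client_secret, display=5):
    """Minimal sketch of a Naver Open API web-search call (endpoint and fields assumed)."""
    resp = requests.get(
        "https://openapi.naver.com/v1/search/webkr.json",
        params={"query": query, "display": display},
        headers={
            "X-Naver-Client-Id": client_id,
            "X-Naver-Client-Secret": client_secret,
        },
    )
    resp.raise_for_status()
    # Each result item carries a short "description" snippet.
    return [item["description"] for item in resp.json().get("items", [])]

def generate_with_rag(model, tokenizer, query, rag_config, max_new_tokens=512):
    """Hypothetical stand-in for utils.generate(rag=True): retrieve, then prompt."""
    snippets = naver_search(query, rag_config["api_client_id"], rag_config["api_client_secret"])
    context = "\n".join(snippets)
    # Prepend the retrieved snippets to the user question.
    prompt = f"다음 검색 결과를 참고해서 질문에 답해줘.\n{context}\n\n질문: {query}"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```
The repository's own `utils.generate` remains the reference implementation.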

## Chat Template
The Gemma models use a chat template.
[gemma-2b-it Chat Template](https://huggingface.co/google/gemma-2b-it#chat-template)
```python
input_text = "아파트 재건축에 대해 알려줘."

input_ids = tokenizer.apply_chat_template(
    conversation=[
        {"role": "user", "content": input_text}
    ],
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, repetition_penalty=1.5)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
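
To see the prompt text the chat template actually produces, the same call can be made with `tokenize=False`; the control tokens in the comment follow Gemma's documented chat format.
```python
prompt = tokenizer.apply_chat_template(
    conversation=[{"role": "user", "content": input_text}],
    add_generation_prompt=True,
    tokenize=False
)
print(prompt)
# Expected shape of the rendered prompt:
# <bos><start_of_turn>user
# 아파트 재건축에 대해 알려줘.<end_of_turn>
# <start_of_turn>model
```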

## Training information
Training was done on a single L4 GPU on Google Colab; a configuration sketch matching these settings follows the table.

| Item | Value |
|-----------------------------|------------------|
| Environment | Google Colab |
| GPU | L4 (22.5GB) |
| VRAM used | approx. 13.8GB |
| dtype | bfloat16 |
| Attention | flash attention 2 |
| Tuning | LoRA (r=4, alpha=32) |
| Learning rate | 1e-4 |
| LR scheduler | Cosine |
| Optimizer | adamw_torch_fused |
| batch_size | 4 |
| gradient_accumulation_steps | 2 |
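
The table reports the hyperparameters only; the full training script is in the GitHub repository. As a rough orientation, a PEFT/Transformers configuration matching those values could look like the sketch below. The LoRA target modules and the output directory are assumptions and may differ from the repository's code.
```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="cuda",
)

# LoRA(r=4, alpha=32) as reported in the table; target modules are an assumption.
lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Optimizer, scheduler, and batch settings taken from the table above.
training_args = TrainingArguments(
    output_dir="gemma-2b-it-example-v1",
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    bf16=True,
)
```
With bfloat16 weights, flash attention 2, and a rank-4 LoRA adapter, this setup stays within the roughly 13.8GB of VRAM reported above on a 22.5GB L4.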