versae committed on
Commit
6fa2622
1 Parent(s): 3bbcfe1

Update README.md

  1. README.md +117 -1
README.md CHANGED
@@ -1,3 +1,119 @@
  ---
- license: cc-by-4.0
  ---
  ---
+ license: openrail
+ datasets:
+ - NbAiLab/norwegian-alpaca
+ language:
+ - 'no'
+ - nb
+ pipeline_tag: text-generation
  ---
+
+ # NB-Alpaca-LoRA 7B
+
+ This is a Norwegian LoRA adapter obtained by fine-tuning LLaMA-7B on the [Norwegian Alpaca](https://huggingface.co/datasets/NbAiLab/norwegian-alpaca) dataset.
+
+ ## Usage
+
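+ ```python
+ from peft import PeftModel
+ # Released versions of transformers name these classes LlamaTokenizer and LlamaForCausalLM.
+ from transformers import LlamaForCausalLM, LlamaTokenizer
+
+ base_model = "decapoda-research/llama-7b-hf"
+ tokenizer = LlamaTokenizer.from_pretrained(base_model)
+ # Load the base weights in 8-bit and let the library place layers on the available devices.
+ model = LlamaForCausalLM.from_pretrained(
+     base_model,
+     load_in_8bit=True,
+     device_map="auto",
+ )
+ # Attach the Norwegian LoRA adapter on top of the base model.
+ model = PeftModel.from_pretrained(model, "NbAiLab/nb-alpaca-lora-7b")
+ ```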
+
+ For generation, the prompt must still use the English template:
+
+ ```python
+ from transformers import pipeline
+
+ pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
+ instruction = "Skriv en e-post der du ønsker velkommen til en ny medarbeider ved navn Svein"
+ # Call the pipeline object directly to generate text.
+ pipe(f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+
+ ### Instruction:
+ {instruction}
+
+ ### Response:
+ """)
+ # Kjære Svein,
+ #
+ # Velkommen til vårt team! Vi er så glade for å ha deg med oss. Vi ser frem til å hjelpe deg med å nå dine mål og oppnå dine drømmer.
+ #
+ # Vi er alltid tilgjengelige hvis du har noen spørsmål eller ønsker å diskutere noen av våre prosjekter.
+ #
+ # Vi ser frem til å jobbe sammen med deg!
+ #
+ # Med vennlig
+ ```
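
The English template used for generation can also be produced by a small helper, which makes it easier to reuse across instructions. This is only a minimal sketch: the names `PROMPT_HEADER`, `build_prompt`, and `input_text` are hypothetical, and the optional `### Input:` section is an assumption based on the Stanford Alpaca prompt format rather than something this model card specifies.

```python
# Header text copied from the generation example in this README.
PROMPT_HEADER = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Return an English-template prompt wrapping a (Norwegian) instruction.

    The helper name and the optional "### Input:" section are illustrative
    assumptions, not part of the original model card.
    """
    parts = [PROMPT_HEADER, f"### Instruction:\n{instruction}"]
    if input_text:
        # Only add the input section when extra context is supplied.
        parts.append(f"### Input:\n{input_text}")
    parts.append("### Response:\n")
    return "\n\n".join(parts)


prompt = build_prompt(
    "Skriv en e-post der du ønsker velkommen til en ny medarbeider ved navn Svein"
)
print(prompt)
```

The resulting string can be passed to the pipeline in place of the inline f-string.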
+
+
+ ## Data
+
+ The dataset is a translation into Norwegian Bokmål of [alpaca_data_cleaned.json](https://github.com/tloen/alpaca-lora/blob/main/alpaca_data_cleaned.json) (a cleaned version of the [Alpaca dataset made at Stanford](https://huggingface.co/datasets/tatsu-lab/alpaca)) using OpenAI's `gpt-3.5-turbo` model. We translated each sample with a single full-sample prompt rather than string by string, which produced more coherent `(instruction, input, output)` tuples and cost around $60.
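
To illustrate the difference between full-sample and per-string translation, the sketch below packs a whole `(instruction, input, output)` tuple into one request so that the three fields are translated together. The prompt wording, the helper name `build_translation_prompt`, and the sample record are all hypothetical; the actual prompt sent to `gpt-3.5-turbo` is not given here.

```python
import json


def build_translation_prompt(sample: dict) -> str:
    """Build one prompt containing all three fields (hypothetical wording)."""
    payload = json.dumps(
        {key: sample.get(key, "") for key in ("instruction", "input", "output")},
        ensure_ascii=False,  # keep non-ASCII characters readable
        indent=2,
    )
    return (
        "Translate the values of the following JSON object into Norwegian "
        "Bokmål. Keep the keys unchanged and return valid JSON.\n\n" + payload
    )


# Illustrative record in the Alpaca (instruction, input, output) layout.
sample = {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Get enough sleep.",
}
print(build_translation_prompt(sample))
```

Sending the fields together lets the translator see the instruction while translating its output, which is one plausible reason the tuples come out more coherent than with per-string requests.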
+
+ **This dataset cannot be used to create models that compete in any way with OpenAI.**
+
+ ## Finetuning
+
+ To fine-tune the LLaMA model we used the code available in [Alpaca LoRA](https://github.com/tloen/alpaca-lora), which fine-tunes LLaMA using PEFT from Hugging Face. We ran fine-tuning on top of LLaMA-7B for 3 epochs with a sequence length of 512 and no gradient accumulation, on a single A6000 with 48GB VRAM, for 12 hours.
+
+ ## Example outputs
+
+ Stanford Alpaca:
+ ```
+ Instruction: Kom med en kreativ unnskyldning for å si at jeg ikke trenger å gå på festen.
+ Response: I'm sorry, but I don't need to go to the party.
+ ```
+
+ NB-Alpaca:
+ ```
+ Instruction: Kom med en kreativ unnskyldning for å si at jeg ikke trenger å gå på festen.
+ Response: Jeg beklager at jeg ikke kan gå på festen, men jeg tror at det er bedre at jeg ikke gjør det.
+ ```
+
+
+ Stanford Alpaca:
+ ```
+ Instruction: Hvordan kan jeg starte en karriere som dataforsker? Skriv i listeform.
+ Response:
+ Starting a career as a data scientist requires a combination of education, experience, and skills.
+
+ 1. Obtain a Bachelor's or Master's degree in computer science, statistics, mathematics, or a related field.
+ 2. Gain practical experience through internships or part-time jobs.
+ 3. Learn programming languages such as Python, R, and SQL.
+ 4. Develop strong analytical and problem-solving skills.
+ 5. Build a portfolio of projects to showcase your work.
+ 6. Network with other data scientists to learn
+ ```
+
+ NB-Alpaca:
+ ```
+ Instruction: Hvordan kan jeg starte en karriere som dataforsker? Skriv i listeform.
+ Response:
+ 1. Søk etter en universitetsgrad i datavitenskap, matematikk, informatikk eller en relatert fag.
+ 2. Søk etter en jobb som dataforsker eller dataanalytiker.
+ 3. Utvikle dine ferdigheter i datavitenskap, matematikk, informatikk eller en relatert fag.
+ 4. Utvikle dine ferdigheter innen dataanalyse.
+ ```
+
+ You can test it using the eval notebook [here](https://colab.research.google.com/github/22-hours/cabrita/blob/main/notebooks/cabrita-lora.ipynb).
+
+ ## References
+
+ - [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
+ - [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
+ - [Norwegian Alpaca](https://huggingface.co/datasets/NbAiLab/norwegian-alpaca)
+ - [Alpaca LoRA](https://github.com/tloen/alpaca-lora)
+ - [ChatGPT](https://openai.com/blog/chatgpt)
+ - [Hugging Face](https://huggingface.co/)
+
+ ## Hardware Requirements
+
+ For training we used a single NVIDIA A6000 GPU with 48GB of VRAM. For evaluation, you can use a T4.