legolasyiu committed
Commit
407bac8
1 Parent(s): 5ec2510

Update README.md

Files changed (1)
  1. README.md +64 -1
README.md CHANGED
@@ -24,7 +24,70 @@ This llama model was trained 2x faster with [Unsloth](https://github.com/unsloth
 Fireball-Llama-3.1-V1
 
 <img src="https://huggingface.co/EpistemeAI/Fireball-Llama-3.1-8B-v1dpo/resolve/main/fireball-llama.JPG" width="200"/>
- ## For transfomer:
+ # Fireball-Llama-3.11-V1
+ 
+ ## How to use
+ This repository contains Fireball-Llama-3.11-V1, for use with transformers and with the original llama codebase.
+ 
+ ### Use with transformers
+ 
+ Starting with transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.
+ 
+ Make sure to update your transformers installation via `pip install --upgrade transformers`.
+ Example:
+ ````py
+ !pip install -U transformers trl peft accelerate bitsandbytes
+ ````
+ 
+ ````py
+ import torch
+ from transformers import (
+     AutoModelForCausalLM,
+     AutoTokenizer,
+ )
+ 
+ base_model = "EpistemeAI/Fireball-Llama-3.1-8B-v1dpo"
+ model = AutoModelForCausalLM.from_pretrained(
+     base_model,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained(base_model)
+ 
+ system_prompt = "You are a helpful assistant " \
+     "(advanced natural-language-based interaction)."
+ 
+ messages = [
+     {"role": "system", "content": system_prompt},
+     {"role": "user", "content": "What is DPO and ORPO fine tune?"},
+ ]
+ 
+ # Method 1: apply the chat template manually and call generate()
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
+ inputs = {k: v.to(model.device) for k, v in inputs.items()}
+ outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
+ results = tokenizer.batch_decode(outputs)[0]
+ print(results)
+ 
+ # Method 2: run the chat messages through the text-generation pipeline
+ import transformers
+ pipe = transformers.pipeline(
+     model=model,
+     tokenizer=tokenizer,
+     return_full_text=False,  # return only the newly generated text, not the prompt
+     task='text-generation',
+     max_new_tokens=512,      # max number of tokens to generate in the output
+     temperature=0.6,         # lower temperature for less creative answers
+     do_sample=True,
+     top_p=0.9,
+ )
+ 
+ sequences = pipe(messages)
+ for seq in sequences:
+     print(f"{seq['generated_text']}")
+ ````
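
The install step above pulls in bitsandbytes, but the example loads the model in full bf16. Below is a minimal sketch (not part of the original card) of loading the same checkpoint in 4-bit with transformers' BitsAndBytesConfig, which can help on smaller GPUs; the quantization settings are illustrative assumptions, not the author's recommendation.

````py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = "EpistemeAI/Fireball-Llama-3.1-8B-v1dpo"

# Assumed 4-bit NF4 settings; adjust to the available hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the model quantized to 4-bit instead of full bf16.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
````

The quantized model can then be used with either generation method shown in the diff above.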