|
--- |
|
base_model: EpistemeAI/Fireball-Llama-3.1-8B-Instruct-v1-16bit |
|
language: |
|
- en |
|
license: apache-2.0 |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- llama |
|
- trl |
|
- dpo |
|
--- |
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** EpistemeAI |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** EpistemeAI/Fireball-Llama-3.1-8B-Instruct-v1-16bit
|
|
|
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
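The `unsloth`, `trl`, and `dpo` tags indicate a DPO (Direct Preference Optimization) fine-tune run through Unsloth and TRL. As a rough illustration, a typical Unsloth + TRL DPO setup looks like the sketch below; the dataset name, LoRA settings, and hyperparameters are placeholder assumptions, not the configuration actually used for this model.

````py
# Illustrative sketch only -- not the authors' exact training recipe.
# Assumes a preference dataset with "prompt", "chosen", and "rejected" columns.
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EpistemeAI/Fireball-Llama-3.1-8B-Instruct-v1-16bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("your_preference_dataset", split="train")  # placeholder name

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with PEFT adapters, TRL uses the frozen base model as the reference
    args=DPOConfig(output_dir="outputs", beta=0.1,
                   per_device_train_batch_size=2, learning_rate=5e-6),
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL releases call this processing_class
)
trainer.train()
````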
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |
|
|
|
<img src="https://huggingface.co/EpistemeAI/Fireball-Llama-3.1-8B-v1dpo/resolve/main/fireball-llama.JPG" width="200"/> |
|
|
|
# Fireball-Llama-3.1-V1-Instruct
|
|
|
## How to use |
|
This repository contains Fireball-Llama-3.1-V1-Instruct, for use with transformers and with the original llama codebase.
|
|
|
### Use with transformers |
|
|
|
Starting with transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.
|
|
|
Make sure to update your transformers installation via `pip install --upgrade transformers`.
|
Example: |
|
````bash

pip install -U transformers trl peft accelerate bitsandbytes

````
|
|
|
````py |
|
import torch |
|
from transformers import ( |
|
AutoModelForCausalLM, |
|
AutoTokenizer, |
|
) |
|
|
|
base_model = "EpistemeAI/Fireball-Llama-3.1-8B-Instruct-v1dpo" |
|
model = AutoModelForCausalLM.from_pretrained( |
|
base_model, |
|
torch_dtype=torch.bfloat16, |
|
device_map="auto", |
|
) |
|
tokenizer = AutoTokenizer.from_pretrained(base_model) |
|
|
|
sys = "You are help assistant " \ |
|
"(Advanced Natural-based interaction for the language)." |
|
|
|
messages = [ |
|
{"role": "system", "content": sys}, |
|
{"role": "user", "content": "What is DPO and ORPO fine tune?"}, |
|
] |
|
|
|
# Method 1: apply the chat template manually and call generate()
|
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
|
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False) |
|
# Move the input tensors to the same device as the model.
inputs = {k: v.to(model.device) for k, v in inputs.items()}
|
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6) |
|
results = tokenizer.batch_decode(outputs)[0] |
|
print(results) |
|
|
|
# Method 2: wrap the model and tokenizer in a text-generation pipeline
|
import transformers |
|
pipe = transformers.pipeline( |
|
model=model, |
|
tokenizer=tokenizer, |
|
return_full_text=False,  # return only the newly generated text, not the prompt
|
task='text-generation', |
|
max_new_tokens=512, # max number of tokens to generate in the output |
|
temperature=0.6,  # lower values give more focused, less creative answers
|
do_sample=True, |
|
top_p=0.9, |
|
) |
|
|
|
sequences = pipe(messages) |
|
for seq in sequences:
    print(seq["generated_text"])
|
```` |
|
|