|
---
license: apache-2.0
datasets:
- Open-Orca/OpenOrca
- OpenAssistant/oasst_top1_2023-08-25
language:
- bg
- ca
- cs
- da
- de
- en
- es
- fr
- hr
- hu
- it
- nl
- pl
- pt
- ro
- ru
- sl
- sr
- sv
- uk
library_name: transformers
---
|
|
|
```
reference-data-model:

  datasets:
    - OpenAssistant/oasst_top1_2023-08-25:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
        link: https://huggingface.co/datasets/OpenAssistant/oasst_top1_2023-08-25

  model:
    - Open-Orca/Mistral-7B-OpenOrca
      link: https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca

  100 generation examples:
    link: https://huggingface.co/NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v3/blob/main/output.xlsx

  trained with attention sinks:
    links:
      - https://huggingface.co/blog/tomaarsen/attention-sinks
      - https://github.com/tomaarsen/attention_sinks
      - https://arxiv.org/abs/2309.17453

  TRL:
    links:
      - https://huggingface.co/docs/trl/index
      - https://huggingface.co/docs/trl/sft_trainer

  flash-attention:
    links:
      - https://github.com/Dao-AILab/flash-attention
      - https://arxiv.org/abs/2205.14135

  versions:
    links:
      - https://huggingface.co/NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v1
      - https://huggingface.co/NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
      - https://huggingface.co/NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v3
```
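To inspect the referenced dataset locally, a minimal sketch with the `datasets` library (split names and column layout are best read from the printed summary rather than assumed):

```py
from datasets import load_dataset

# Pull the top-1 ranked OASST conversations referenced above.
ds = load_dataset("OpenAssistant/oasst_top1_2023-08-25")
print(ds)  # splits, row counts, and column names
```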
|
|
|
|
|
## Installation
|
|
|
|
|
```py
# attention-sinks
!pip install attention_sinks -qqq

# flash-attn (point CUDA_HOME at your CUDA toolkit; %env persists for later
# cells, whereas !export only affects its own subshell)
%env CUDA_HOME=/usr/local/cuda-11.8
!MAX_JOBS=4 pip install flash-attn --no-build-isolation -qqq
!pip install git+"https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary" -qqq
```
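As a quick sanity check that both installs succeeded (a minimal sketch; a failed flash-attn build usually surfaces as an `ImportError` here):

```py
# Verify that both packages import correctly.
import attention_sinks  # noqa: F401
import flash_attn

print("flash_attn version:", flash_attn.__version__)
```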
|
|
|
|
|
## Version
```py
import torch, transformers, torchvision

torch.__version__, transformers.__version__, torchvision.__version__
# OUTPUT: ('2.0.1+cu118', '4.34.0', '0.15.2+cu118')
```
|
|
|
## How to use |
|
```py
from transformers import (
    AutoTokenizer,
    GenerationConfig,
    TextIteratorStreamer,
)
# attention_sinks ships a drop-in replacement for transformers'
# AutoModelForCausalLM that adds attention-sink support.
from attention_sinks import AutoModelForCausalLM

import torch

# model_id = 'Open-Orca/Mistral-7B-OpenOrca'
model_id = 'NickyNicky/Mixtral-2x7b-OpenOrca-oasst_top1_2023-08-25-v1.0'

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True,
    low_cpu_mem_usage=True,
    # use_flash_attention_2=True,  # A100 or other supported GPU
    attention_sink_size=4,
    attention_sink_window_size=1024,  # 512; kept low for the sake of faster generation
)

max_length = 2048
print("max_length", max_length)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    # use_fast=False,
    max_length=max_length,
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

# EXAMPLE #1 (English)
txt = """<|im_start|>user
I'm looking for an efficient Python script to output prime numbers. Can you help me out? I'm interested in a script that can handle large numbers and output them quickly. Also, it would be great if the script could take a range of numbers as input and output all the prime numbers within that range. Can you generate a script that fits these requirements? Thanks!<|im_end|>
<|im_start|>assistant
"""

# EXAMPLE #2 (Spanish, overwrites example #1): "I'm developing a REST API with
# Node.js and I'm trying to add some security, e.g. tokens or similar. Can you help me?"
txt = """<|im_start|>user
Estoy desarrollando una REST API con Nodejs, y estoy tratando de aplicar algún sistema de seguridad, ya sea con tokens o algo similar, me puedes ayudar?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer.encode(txt, return_tensors="pt").to("cuda")

max_new_tokens = 1024  # generation budget; adjust to taste

generation_config = GenerationConfig(
    max_new_tokens=max_new_tokens,
    temperature=0.7,
    top_p=0.9,
    top_k=50,
    repetition_penalty=1.11,
    do_sample=True,
    # pad_token_id=tokenizer.eos_token_id,
    # eos_token_id=tokenizer.eos_token_id,
    # use_cache=True,
    # stopping_criteria=StoppingCriteriaList([stopping_criteria]),
)

outputs = model.generate(
    input_ids=inputs,
    generation_config=generation_config,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))  # or skip_special_tokens=True
```
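The `TextIteratorStreamer` imported above is not used in the snippet; as a minimal streaming sketch (reusing `model`, `tokenizer`, `inputs`, and `generation_config` from the block above), one option is:

```py
from threading import Thread

# Run generation in a background thread and print tokens as they arrive.
streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)

thread = Thread(
    target=model.generate,
    kwargs=dict(
        input_ids=inputs,
        generation_config=generation_config,
        streamer=streamer,
    ),
)
thread.start()

for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
```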
|
|
|
## MIX-MoE (mergekit)
|
```
experts:
  - source_model: NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
    positive_prompts:
      - ""

  - source_model: NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v3
    positive_prompts:
      - ""

base_model: NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v3
gate_mode: random  # one of "hidden", "cheap_embed", or "random"
dtype: bfloat16    # output dtype (float32, float16, or bfloat16)
```
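As a usage note (a sketch, not necessarily the exact command used for this model): saved as e.g. `config.yaml` (hypothetical filename), a config like this is consumed by mergekit's MoE entry point, shown notebook-style as in the install section:

```py
# Install mergekit and build the MoE from the config above.
!pip install mergekit -qqq
!mergekit-moe config.yaml ./merged-moe-model
```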
|
|