---
license: apache-2.0
datasets:
- Open-Orca/SlimOrca
pipeline_tag: text-generation
---
This model was obtained by SFT fine-tuning freecs/ThetaWave-7B on the Open-Orca/SlimOrca dataset.
The model does not currently support a system prompt because it uses Mistral's chat_template; the next release, currently in training, switches to the ChatML template to add system prompt support. A system prompt can be enabled by manually replacing the chat_template (see the sketch below), but in testing this appears to degrade model performance.
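If you want to experiment with a system prompt anyway, a minimal sketch of overriding the tokenizer's chat_template with a ChatML-style Jinja template is shown below. This is only an illustration, not the template the upcoming release will ship; the `<|im_start|>`/`<|im_end|>` markers are not special tokens in the current vocabulary, which is likely why quality degrades.
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Liangmingxin/ThetaWave-7B-sft")

# Illustrative ChatML-style template; not the official template for this model.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>\\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

# Render the prompt as text to inspect how the system prompt is inserted
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```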
More model details will be released...
vLLM deployment commands:
```
# Single GPU
python /path/to/vllm/vllm/entrypoints/openai/api_server.py \
--model '/path/to/ThetaWave-7B-sft' \
--tokenizer '/path/to/ThetaWave-7B-sft' \
--tokenizer-mode auto \
--dtype float16 \
--enforce-eager \
--host 0.0.0.0 \
--port 6000 \
--disable-log-stats \
--disable-log-requests
# Dual GPUs (tensor parallelism across two cards)
python /path/to/vllm/vllm/entrypoints/openai/api_server.py \
--model '/path/to/ThetaWave-7B-sft' \
--tokenizer '/path/to/ThetaWave-7B-sft' \
--tokenizer-mode auto \
--dtype float16 \
--enforce-eager \
--tensor-parallel-size 2 \
--worker-use-ray \
--engine-use-ray \
--host 0.0.0.0 \
--port 6000 \
--disable-log-stats \
--disable-log-requests
```
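Once the server is running, it exposes an OpenAI-compatible API. Below is a minimal client sketch using the `requests` library (an assumption on my part; any OpenAI-compatible client works). The host, port, and `model` value must match what was passed to the server above.
```
import requests

# Query the vLLM OpenAI-compatible chat endpoint started above
resp = requests.post(
    "http://localhost:6000/v1/chat/completions",
    json={
        "model": "/path/to/ThetaWave-7B-sft",  # must match the --model path given to the server
        "messages": [{"role": "user", "content": "Who are you?"}],
        "max_tokens": 256,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```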
Try it directly:
```
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("Liangmingxin/ThetaWave-7B-sft")
tokenizer = AutoTokenizer.from_pretrained("Liangmingxin/ThetaWave-7B-sft")

messages = [
    {"role": "user", "content": "Who are you?"},
]

# Apply the model's (Mistral-style) chat template and move everything to the GPU
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)

# Sample up to 1000 new tokens and decode the full sequence
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
``` |