|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- Open-Orca/SlimOrca |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
This model was obtained by supervised fine-tuning (SFT) of freecs/ThetaWave-7B.
|
|
|
The Open-Orca/SlimOrca dataset was used for fine-tuning.
|
|
|
The model does not currently support a system prompt because it uses Mistral's chat_template; the next release is in training and will switch to the ChatML template to add system prompt support. A system prompt can be enabled by manually changing the chat_template, but in testing this appears to degrade model performance.
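
If you still want to experiment with a system prompt, one option is to override the tokenizer's `chat_template` before building the prompt. The template below is an illustrative ChatML-style sketch, not the template shipped with this model, and (per the note above) may hurt output quality:

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Liangmingxin/ThetaWave-7B-sft")

# Illustrative ChatML-style template (an assumption, not the model's official template).
# Each turn is wrapped in <|im_start|>role ... <|im_end|>, and a generation prompt is appended.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```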
|
|
|
More model details will be released... |
|
|
|
vLLM deployment commands:
|
``` |
|
# Single GPU
|
python /path/to/vllm/vllm/entrypoints/openai/api_server.py \ |
|
--model '/path/to/ThetaWave-7B-sft' \ |
|
--tokenizer '/path/to/ThetaWave-7B-sft' \ |
|
--tokenizer-mode auto \ |
|
--dtype float16 \ |
|
--enforce-eager \ |
|
--host 0.0.0.0 \ |
|
--port 6000 \ |
|
--disable-log-stats \ |
|
--disable-log-requests |
|
|
|
# Two GPUs (tensor parallelism)
|
python /path/to/vllm/vllm/entrypoints/openai/api_server.py \ |
|
--model '/path/to/ThetaWave-7B-sft' \ |
|
--tokenizer '/path/to/ThetaWave-7B-sft' \ |
|
--tokenizer-mode auto \ |
|
--dtype float16 \ |
|
--enforce-eager \ |
|
--tensor-parallel-size 2 \ |
|
--worker-use-ray \ |
|
--engine-use-ray \ |
|
--host 0.0.0.0 \ |
|
--port 6000 \ |
|
--disable-log-stats \ |
|
--disable-log-requests |
|
``` |
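
Once the server is up, it exposes an OpenAI-compatible API. A minimal client sketch, assuming the server is reachable on localhost:6000 and that the `model` field matches the path passed to `--model`:

```
import requests

# Query the OpenAI-compatible chat completions endpoint served by vLLM.
# Assumes the server started above is running on localhost:6000;
# the model name below is a placeholder path.
response = requests.post(
    "http://localhost:6000/v1/chat/completions",
    json={
        "model": "/path/to/ThetaWave-7B-sft",
        "messages": [{"role": "user", "content": "Who are you?"}],
        "max_tokens": 256,
        "temperature": 0.7,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```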
|
|
|
Try it directly: |
|
``` |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
device = "cuda" # the device to load the model onto |
|
|
|
model = AutoModelForCausalLM.from_pretrained("Liangmingxin/ThetaWave-7B-sft") |
|
tokenizer = AutoTokenizer.from_pretrained("Liangmingxin/ThetaWave-7B-sft") |
|
|
|
messages = [ |
|
{"role": "user", "content": "Who are you?"}, |
|
] |
|
|
|
# Tokenize the conversation using the model's chat template
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
|
|
|
model_inputs = encodeds.to(device) |
|
model.to(device) |
|
|
|
# Generate a response with sampling enabled
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
|
decoded = tokenizer.batch_decode(generated_ids) |
|
print(decoded[0]) |
|
``` |