Note
This model has been withdrawn from the Thea Series and is no longer available.
Model Description
An uncensored reasoning Llama 3.2 3B model trained on reasoning data.
This is the second revision of Thea, built on a better base model and trained on twice as much reasoning data.
It was trained with improved training code and gives improved performance. Use the following inference code, which generates the reasoning first and then the answer:
from transformers import AutoModelForCausalLM, AutoTokenizer
MAX_REASONING_TOKENS = 1024
MAX_RESPONSE_TOKENS = 512
model_name = "lunahr/thea-v2-3b-50r"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "Which is greater 9.9 or 9.11 ??"
messages = [
    {"role": "user", "content": prompt}
]
# Generate reasoning
reasoning_template = tokenizer.apply_chat_template(messages, tokenize=False, add_reasoning_prompt=True)
reasoning_inputs = tokenizer(reasoning_template, return_tensors="pt").to(model.device)
reasoning_ids = model.generate(**reasoning_inputs, max_new_tokens=MAX_REASONING_TOKENS)
reasoning_output = tokenizer.decode(reasoning_ids[0, reasoning_inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("REASONING: " + reasoning_output)
# Generate answer
messages.append({"role": "reasoning", "content": reasoning_output})
response_template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response_inputs = tokenizer(response_template, return_tensors="pt").to(model.device)
response_ids = model.generate(**response_inputs, max_new_tokens=MAX_RESPONSE_TOKENS)
response_output = tokenizer.decode(response_ids[0, response_inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("ANSWER: " + response_output)
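The two-pass flow in the snippet above can be easier to follow in isolation. The sketch below is an illustration only and does not load the model: `two_pass_chat`, `fake_generate`, and the `generate(messages, reasoning)` callable are hypothetical names introduced here, not part of the model or of `transformers`. With the real model, `generate` would presumably wrap the `apply_chat_template` and `model.generate` calls shown above.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def two_pass_chat(
    prompt: str,
    generate: Callable[[List[Message], bool], str],
) -> Dict[str, str]:
    """Run a reasoning pass, then an answer pass that sees the reasoning.

    `generate(messages, reasoning)` is a hypothetical callable: it returns
    the model's text for `messages`, producing reasoning when `reasoning`
    is True and the final answer otherwise.
    """
    messages: List[Message] = [{"role": "user", "content": prompt}]

    # Pass 1: ask for the reasoning only.
    reasoning = generate(messages, True)

    # Feed the reasoning back under the custom "reasoning" role,
    # mirroring the inference code above.
    messages.append({"role": "reasoning", "content": reasoning})

    # Pass 2: ask for the final answer, conditioned on the reasoning.
    answer = generate(messages, False)
    return {"reasoning": reasoning, "answer": answer}


def fake_generate(messages: List[Message], reasoning: bool) -> str:
    """Stand-in for the real model so the flow can be checked offline."""
    if reasoning:
        return "Compare 9.90 with 9.11."
    # The answer pass must receive the reasoning message appended in pass 1.
    assert messages[-1]["role"] == "reasoning"
    return "9.9 is greater."


result = two_pass_chat("Which is greater, 9.9 or 9.11?", fake_generate)
print("REASONING: " + result["reasoning"])
print("ANSWER: " + result["answer"])
```

Keeping the generation step behind a callable makes it possible to check the message handling without a GPU, and the same function can then be reused once a real generation function is plugged in.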
- Trained by: Piotr Zalewski
- License: llama3.2
- Finetuned from model: lunahr/Hermes-3-Llama-3.2-3B-abliterated*
- Dataset used: KingNish/reasoning-base-20k
This Llama model was trained faster than Unsloth using custom training code.
Visit https://www.kaggle.com/code/piotr25691/distributed-llama-training-with-2xt4 to find out how you can finetune your models using both of the Kaggle-provided GPUs.
*Created from https://huggingface.co/NousResearch/Hermes-3-Llama-3.2-3B using a custom abliterator.