---
language:
  - en
license: llama3.2
tags:
  - text-generation-inference
  - transformers
  - llama
  - trl
  - sft
  - reasoning
  - llama-3
base_model: lunahr/Hermes-3-Llama-3.2-3B-abliterated
datasets:
  - KingNish/reasoning-base-20k
  - lunahr/thea-name-overrides
---

## Note

This model has been withdrawn from the Thea Series and is no longer available.

## Model Description

An uncensored reasoning Llama 3.2 3B model trained on reasoning data.

This is the second revision of Thea, built on a stronger base model and trained with twice as much reasoning data.

It was trained with improved training code and delivers improved performance. Inference runs in two passes: the model first generates a reasoning trace, which is appended back to the conversation as a `reasoning` message before the final answer is generated. Use the following inference code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MAX_REASONING_TOKENS = 1024
MAX_RESPONSE_TOKENS = 512

model_name = "lunahr/thea-v2-3b-50r"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Which is greater, 9.9 or 9.11?"
messages = [
    {"role": "user", "content": prompt}
]

# First pass: generate the reasoning trace
reasoning_template = tokenizer.apply_chat_template(messages, tokenize=False, add_reasoning_prompt=True)
reasoning_inputs = tokenizer(reasoning_template, return_tensors="pt").to(model.device)
reasoning_ids = model.generate(**reasoning_inputs, max_new_tokens=MAX_REASONING_TOKENS)
reasoning_output = tokenizer.decode(reasoning_ids[0, reasoning_inputs.input_ids.shape[1]:], skip_special_tokens=True)

print("REASONING: " + reasoning_output)

# Second pass: feed the reasoning back as a "reasoning" message and generate the answer
messages.append({"role": "reasoning", "content": reasoning_output})
response_template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response_inputs = tokenizer(response_template, return_tensors="pt").to(model.device)
response_ids = model.generate(**response_inputs, max_new_tokens=MAX_RESPONSE_TOKENS)
response_output = tokenizer.decode(response_ids[0, response_inputs.input_ids.shape[1]:], skip_special_tokens=True)

print("ANSWER: " + response_output)
```

This Llama model was trained faster than with Unsloth, using custom training code.

Visit https://www.kaggle.com/code/piotr25691/distributed-llama-training-with-2xt4 to learn how to finetune your models using both of the Kaggle-provided GPUs.
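
The linked notebook has the full details; as a rough illustration of the idea only (not the notebook's actual code), a data-parallel SFT run across both T4s might look like the sketch below. It assumes `trl`'s `SFTTrainer` and a dataset already preprocessed into a `text` column, both of which are assumptions here:

```python
# Hypothetical sketch, not the linked notebook's code: data-parallel SFT on
# both Kaggle T4 GPUs. Launch with: torchrun --nproc_per_node=2 train.py
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_model = "lunahr/Hermes-3-Llama-3.2-3B-abliterated"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Assumes rows have been pre-formatted into a "text" column; the real
# preprocessing for reasoning traces is model-specific and not shown.
dataset = load_dataset("KingNish/reasoning-base-20k", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="thea-v2-3b-50r",
        per_device_train_batch_size=1,   # a 3B model is tight on a 16 GB T4
        gradient_accumulation_steps=8,
        fp16=True,                       # T4s do not support bf16
        ddp_find_unused_parameters=False,
    ),
)
trainer.train()
```

Under `torchrun`, each of the two processes gets one GPU and the Trainer handles the DDP gradient synchronization automatically.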

*Created from https://huggingface.co/NousResearch/Hermes-3-Llama-3.2-3B using a custom abliterator.
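
The custom abliterator itself is not published. As a hypothetical sketch of the general abliteration technique (not this model's actual procedure), the core step removes an estimated "refusal direction" from the model's weight matrices by projecting it out:

```python
# Hypothetical sketch of abliteration in general (the custom abliterator used
# for this model is not published): project an estimated "refusal direction"
# out of a weight matrix so the layer can no longer write along it.
import torch

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Return (I - d d^T) @ weight, removing the output component along d."""
    d = direction / direction.norm()
    return weight - torch.outer(d, d @ weight)

# Toy example with stand-in values: in practice the direction is estimated
# from mean activation differences on refusing vs. complying prompts, and the
# projection is applied to attention/MLP output projections in every layer.
hidden = 3072                          # Llama 3.2 3B hidden size
W = torch.randn(hidden, hidden)        # stand-in for e.g. an o_proj weight
refusal_dir = torch.randn(hidden)      # stand-in for the estimated direction
W = orthogonalize(W, refusal_dir)
```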