|
--- |
|
base_model: |
|
- Qwen/QwQ-32B-Preview |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- trl |
|
- COT |
|
- Reasoning |
|
- Smart |
|
- Qwen |
|
- QwQ |
|
license: apache-2.0 |
|
language: |
|
- en |
|
datasets: |
|
- Daemontatox/LongCOT-Reason |
|
metrics: |
|
- accuracy |
|
- character |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
model-index: |
|
- name: PathfinderAI |
|
results: |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: IFEval (0-Shot) |
|
type: HuggingFaceH4/ifeval |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: inst_level_strict_acc and prompt_level_strict_acc |
|
value: 37.45 |
|
name: strict accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: BBH (3-Shot) |
|
type: BBH |
|
args: |
|
num_few_shot: 3 |
|
metrics: |
|
- type: acc_norm |
|
value: 52.65 |
|
name: normalized accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MATH Lvl 5 (4-Shot) |
|
type: hendrycks/competition_math |
|
args: |
|
num_few_shot: 4 |
|
metrics: |
|
- type: exact_match |
|
value: 47.58 |
|
name: exact match |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: GPQA (0-shot) |
|
type: Idavidrein/gpqa |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 19.24 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MuSR (0-shot) |
|
type: TAUR-Lab/MuSR |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 20.83 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MMLU-PRO (5-shot) |
|
type: TIGER-Lab/MMLU-Pro |
|
config: main |
|
split: test |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 51.04 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI |
|
name: Open LLM Leaderboard |
|
--- |
|
|
|
![image](./image.webp) |
|
|
|
# PathfinderAI |
|
|
|
- **Developed by:** Daemontatox |
|
- **License:** Apache 2.0 |
|
- **Finetuned Using:** [Unsloth](https://github.com/unslothai/unsloth), Hugging Face Transformers, and TRL Library |
|
|
|
## Model Overview |
|
|
|
The **PathfinderAI Model** is an advanced AI system optimized for logical reasoning, multi-step problem-solving, and decision-making tasks. Designed with efficiency and accuracy in mind, it employs a structured system prompt to ensure high-quality answers through a transparent and iterative thought process. |
|
|
|
### System Prompt and Workflow |
|
|
|
This model operates using an innovative reasoning framework structured around the following steps: |
|
|
|
1. **Initial Thought:** |
|
The model uses `<Thinking>` tags to reason step-by-step and craft its best possible response. |
|
Example: |
|
|
|
2. **Self-Critique:** |
|
It evaluates its initial response within `<Critique>` tags, focusing on: |
|
- **Accuracy:** Is it factually correct and verifiable? |
|
- **Clarity:** Is it clear and free of ambiguity? |
|
- **Completeness:** Does it fully address the request? |
|
- **Improvement:** What can be enhanced? |
|
Example: |
|
|
|
3. **Revision:** |
|
Based on the critique, the model refines its response within `<Revising>` tags. |
|
Example: |
|
|
|
4. **Final Response:** |
|
The revised response is presented clearly within `<Final>` tags. |
|
Example: |
|
|
|
5. **Tag Innovation:** |
|
When needed, the model creates and defines new tags for better structuring or clarity, ensuring consistent usage. |
|
Example: |
|
|
|
### Key Features |
|
- **Structured Reasoning:** Transparent, multi-step approach for generating and refining answers. |
|
- **Self-Improvement:** Built-in critique and revision ensure continuous response enhancement. |
|
- **Clarity and Adaptability:** Tagging system provides organized, adaptable responses tailored to user needs. |
|
- **Creative Flexibility:** Supports dynamic problem-solving with the ability to introduce new tags and concepts. |
|
|
|
--- |
|
|
|
## Use Cases |
|
|
|
The model is designed for various domains, including: |
|
1. **Research and Analysis:** Extracting insights and providing structured explanations. |
|
2. **Education:** Assisting with tutoring by breaking down complex problems step-by-step. |
|
3. **Problem-Solving:** Offering logical and actionable solutions for multi-step challenges. |
|
4. **Content Generation:** Producing clear, well-organized creative or professional content. |
|
|
|
--- |
|
|
|
## Training Details |
|
|
|
- **Frameworks:** |
|
- [Unsloth](https://github.com/unslothai/unsloth) for accelerated training. |
|
- Hugging Face Transformers and the TRL library for reinforcement learning with human feedback (RLHF). |
|
|
|
- **Dataset:** Finetuned on diverse reasoning-focused tasks, including logical puzzles, mathematical problems, and commonsense reasoning scenarios. |
|
|
|
- **Hardware Efficiency:** |
|
- Trained with bnb-4bit precision for reduced memory usage. |
|
- Optimized training pipeline achieving 2x faster development cycles. |
|
|
|
--- |
|
|
|
## Limitations |
|
|
|
- **Hallucinations** Model might hallucinate in very long context problems. |
|
- **Unclosed tags** As the model gets deep into thinking and reflecting ,it has a tendency to not close thinking or critique tags . |
|
- **Tags Compression** As the model gets confident in the answer , it will use less and less tags and might have everything in the <Thinking> Tag ,instead of reasoning and going step by step. |
|
- **High Resource** This Model is Resource intensive and needs a lot of uninterrupted computing , since it's continuously generating tokens to reason , so it might work the best with consumer hardware. |
|
--- |
|
|
|
## Ethical Considerations |
|
|
|
- **Transparency:** Responses are structured for verifiability through tagging. |
|
- **Bias Mitigation:** Includes self-critique to minimize biases and ensure fairness. |
|
- **Safe Deployment:** Users are encouraged to evaluate outputs to prevent harm or misinformation. |
|
|
|
--- |
|
|
|
## License |
|
|
|
This model is distributed under the Apache 2.0 license, allowing users to use, modify, and share it in compliance with the license terms. |
|
|
|
--- |
|
|
|
## Acknowledgments |
|
|
|
Special thanks to: |
|
- [Unsloth](https://github.com/unslothai/unsloth) for accelerated training workflows. |
|
- Hugging Face for their powerful tools and libraries. |
|
|
|
--- |
|
|
|
Experience the **PathfinderAI l**, leveraging its structured reasoning and self-improvement capabilities for any task requiring advanced AI reasoning. |
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Daemontatox__PathfinderAI-details)! |
|
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Daemontatox/PathfinderAI)! |
|
|
|
| Metric |% Value| |
|
|-------------------|------:| |
|
|Avg. | 38.13| |
|
|IFEval (0-Shot) | 37.45| |
|
|BBH (3-Shot) | 52.65| |
|
|MATH Lvl 5 (4-Shot)| 47.58| |
|
|GPQA (0-shot) | 19.24| |
|
|MuSR (0-shot) | 20.83| |
|
|MMLU-PRO (5-shot) | 51.04| |
|
|
|
|