README.md · Daemontatox/PathfinderAI at main

PathfinderAI / README.md

Daemontatox

Adding Evaluation Results (#1)

9b280a4 verified 9 days ago

preview code

raw

history blame contribute delete

7.96 kB

	---
	base_model:
	- Qwen/QwQ-32B-Preview
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- trl
	- COT
	- Reasoning
	- Smart
	- Qwen
	- QwQ
	license: apache-2.0
	language:
	- en
	datasets:
	- Daemontatox/LongCOT-Reason
	metrics:
	- accuracy
	- character
	library_name: transformers
	pipeline_tag: text-generation
	model-index:
	- name: PathfinderAI
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: IFEval (0-Shot)
	type: HuggingFaceH4/ifeval
	args:
	num_few_shot: 0
	metrics:
	- type: inst_level_strict_acc and prompt_level_strict_acc
	value: 37.45
	name: strict accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BBH (3-Shot)
	type: BBH
	args:
	num_few_shot: 3
	metrics:
	- type: acc_norm
	value: 52.65
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MATH Lvl 5 (4-Shot)
	type: hendrycks/competition_math
	args:
	num_few_shot: 4
	metrics:
	- type: exact_match
	value: 47.58
	name: exact match
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GPQA (0-shot)
	type: Idavidrein/gpqa
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 19.24
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MuSR (0-shot)
	type: TAUR-Lab/MuSR
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 20.83
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU-PRO (5-shot)
	type: TIGER-Lab/MMLU-Pro
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 51.04
	name: accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Daemontatox/PathfinderAI
	name: Open LLM Leaderboard
	---

	![image](./image.webp)

	# PathfinderAI

	- Developed by: Daemontatox
	- License: Apache 2.0
	- Finetuned Using: [Unsloth](https://github.com/unslothai/unsloth), Hugging Face Transformers, and TRL Library

	## Model Overview

	The PathfinderAI Model is an advanced AI system optimized for logical reasoning, multi-step problem-solving, and decision-making tasks. Designed with efficiency and accuracy in mind, it employs a structured system prompt to ensure high-quality answers through a transparent and iterative thought process.

	### System Prompt and Workflow

	This model operates using an innovative reasoning framework structured around the following steps:

	1. Initial Thought:
	The model uses `<Thinking>` tags to reason step-by-step and craft its best possible response.
	Example:

	2. Self-Critique:
	It evaluates its initial response within `<Critique>` tags, focusing on:
	- Accuracy: Is it factually correct and verifiable?
	- Clarity: Is it clear and free of ambiguity?
	- Completeness: Does it fully address the request?
	- Improvement: What can be enhanced?
	Example:

	3. Revision:
	Based on the critique, the model refines its response within `<Revising>` tags.
	Example:

	4. Final Response:
	The revised response is presented clearly within `<Final>` tags.
	Example:

	5. Tag Innovation:
	When needed, the model creates and defines new tags for better structuring or clarity, ensuring consistent usage.
	Example:

	### Key Features
	- Structured Reasoning: Transparent, multi-step approach for generating and refining answers.
	- Self-Improvement: Built-in critique and revision ensure continuous response enhancement.
	- Clarity and Adaptability: Tagging system provides organized, adaptable responses tailored to user needs.
	- Creative Flexibility: Supports dynamic problem-solving with the ability to introduce new tags and concepts.

	---

	## Use Cases

	The model is designed for various domains, including:
	1. Research and Analysis: Extracting insights and providing structured explanations.
	2. Education: Assisting with tutoring by breaking down complex problems step-by-step.
	3. Problem-Solving: Offering logical and actionable solutions for multi-step challenges.
	4. Content Generation: Producing clear, well-organized creative or professional content.

	---

	## Training Details

	- Frameworks:
	- [Unsloth](https://github.com/unslothai/unsloth) for accelerated training.
	- Hugging Face Transformers and the TRL library for reinforcement learning with human feedback (RLHF).

	- Dataset: Finetuned on diverse reasoning-focused tasks, including logical puzzles, mathematical problems, and commonsense reasoning scenarios.

	- Hardware Efficiency:
	- Trained with bnb-4bit precision for reduced memory usage.
	- Optimized training pipeline achieving 2x faster development cycles.

	---

	## Limitations

	- Hallucinations Model might hallucinate in very long context problems.
	- Unclosed tags As the model gets deep into thinking and reflecting ,it has a tendency to not close thinking or critique tags .
	- Tags Compression As the model gets confident in the answer , it will use less and less tags and might have everything in the <Thinking> Tag ,instead of reasoning and going step by step.
	- High Resource This Model is Resource intensive and needs a lot of uninterrupted computing , since it's continuously generating tokens to reason , so it might work the best with consumer hardware.
	---

	## Ethical Considerations

	- Transparency: Responses are structured for verifiability through tagging.
	- Bias Mitigation: Includes self-critique to minimize biases and ensure fairness.
	- Safe Deployment: Users are encouraged to evaluate outputs to prevent harm or misinformation.

	---

	## License

	This model is distributed under the Apache 2.0 license, allowing users to use, modify, and share it in compliance with the license terms.

	---

	## Acknowledgments

	Special thanks to:
	- [Unsloth](https://github.com/unslothai/unsloth) for accelerated training workflows.
	- Hugging Face for their powerful tools and libraries.

	---

	Experience the PathfinderAI l, leveraging its structured reasoning and self-improvement capabilities for any task requiring advanced AI reasoning.
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Daemontatox__PathfinderAI-details)!
	Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Daemontatox/PathfinderAI)!

	\| Metric \|% Value\|
	\|-------------------\|------:\|
	\|Avg. \| 38.13\|
	\|IFEval (0-Shot) \| 37.45\|
	\|BBH (3-Shot) \| 52.65\|
	\|MATH Lvl 5 (4-Shot)\| 47.58\|
	\|GPQA (0-shot) \| 19.24\|
	\|MuSR (0-shot) \| 20.83\|
	\|MMLU-PRO (5-shot) \| 51.04\|