---
library_name: transformers
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
language:
- en
---
|
|
|
# Model Card for Qwen2.5-0.5B-Instruct MCTS Value Network
|
|
|
|
This is a Qwen2.5-0.5B-Instruct model fine-tuned on a dataset generated by Monte Carlo Tree Search (MCTS) based sampling.
|
MCTS was run on a small subset of the GSM8K train split; the resulting traces and value estimates were then used to form the dataset.
|
Only the last two transformer blocks and the regression head were unfrozen during fine-tuning.
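
This freezing scheme can be sketched as follows. The model below is a toy stand-in (the real setup uses the Qwen2.5-0.5B-Instruct transformer blocks and a scalar regression head; all module names here are illustrative):

```python
import torch.nn as nn

# Toy stand-in for the fine-tuned model: a stack of "transformer blocks"
# plus a scalar regression (value) head. The real model is
# Qwen/Qwen2.5-0.5B-Instruct; module names here are illustrative only.
class ToyValueModel(nn.Module):
    def __init__(self, n_blocks=4, dim=8):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_blocks))
        self.value_head = nn.Linear(dim, 1)

def freeze_all_but_last_k(model, k=2):
    """Freeze every parameter, then unfreeze the last k blocks and the head."""
    for p in model.parameters():
        p.requires_grad = False
    for block in model.blocks[-k:]:
        for p in block.parameters():
            p.requires_grad = True
    for p in model.value_head.parameters():
        p.requires_grad = True

model = ToyValueModel()
freeze_all_but_last_k(model, k=2)
trainable = sorted(n for n, p in model.named_parameters() if p.requires_grad)
```

The optimizer then only needs to be handed the parameters that still have `requires_grad=True`.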
|
|
|
The idea is to let the value network alone guide MCTS sampling, removing the need for simulation/rollout steps.
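
A minimal sketch of that idea: standard UCT selection and backup, but the simulation step is replaced by a single call to the value network (AlphaZero-style). Everything below is a toy stand-in (integer states, a hand-written value function and transition model); in this model's setup the evaluator would be the fine-tuned value head scoring a partial GSM8K solution.

```python
import math

def value_net(state):
    # Stand-in for the learned value network: prefers states near 7.
    return -abs(state - 7) / 10.0

def children(state):
    # Toy transition model: each state branches into two successors.
    return [state * 2, state * 2 + 1]

def mcts(root, n_iters=50, c=1.4):
    N = {root: 0}    # visit counts
    W = {root: 0.0}  # summed values
    kids = {}        # expanded nodes -> children
    for _ in range(n_iters):
        # 1) Selection: descend via UCT until an unexpanded node.
        path, node = [root], root
        while node in kids:
            node = max(kids[node], key=lambda ch:
                       W[ch] / (N[ch] + 1e-9)
                       + c * math.sqrt(math.log(N[node] + 1) / (N[ch] + 1e-9)))
            path.append(node)
        # 2) Expansion (states >= 4 are treated as leaves in this toy tree).
        if node < 4:
            kids[node] = children(node)
            for ch in kids[node]:
                N[ch], W[ch] = 0, 0.0
        # 3) Evaluation: ONE value-network call replaces the usual rollout.
        v = value_net(node)
        # 4) Backup along the selection path.
        for n in path:
            N[n] += 1
            W[n] += v
    best = max(kids[root], key=lambda ch: N[ch])
    return best, N

best, visits = mcts(1)
```

Since the best leaf (state 7) lies under state 3, the search concentrates its visits on that subtree and returns 3 as the best root action.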
|
|
|
The value network is currently overfitting due to the very limited number of training samples. I will update this model once more data has been sampled.
|
|
|
|
|
|
|
### Scores on the first 65 samples of the GSM8K test split
|
|
|
- Beam search (3 beams): 40.0%
- MCTS search (3 beams): 50.77%
|
|
|
|
|
The final rollout of the MCTS search is also done via beam search. During evaluation on GSM8K, only the value network was used to guide the search.
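
A toy sketch of value-guided beam search: candidates are expanded exactly as in ordinary beam search, but ranked by a value function instead of token log-probabilities. The binary vocabulary and hand-written scorer below are illustrative stand-ins for the fine-tuned value head scoring partial GSM8K solutions.

```python
def toy_value(seq):
    # Stand-in value network: counts positions matching a fixed "solution".
    target = (1, 0, 1, 1)
    return sum(a == b for a, b in zip(seq, target))

def value_guided_beam_search(vocab, n_beams=3, depth=4):
    beams = [()]
    for _ in range(depth):
        # Expand every beam by every token, as in ordinary beam search...
        candidates = [beam + (tok,) for beam in beams for tok in vocab]
        # ...but keep the candidates the value network scores highest.
        beams = sorted(candidates, key=toy_value, reverse=True)[:n_beams]
    return beams[0]

best = value_guided_beam_search(vocab=(0, 1), n_beams=3)
```

With the toy scorer above, the search recovers the target sequence `(1, 0, 1, 1)`.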