---
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
language:
- en
library_name: transformers
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->
This is a Qwen2.5 0.5B Instruct model fine-tuned on a dataset generated by Monte Carlo Tree Search (MCTS) based sampling.
MCTS was rolled out on a small subset of the GSM8K train split, and the resulting traces and value estimates were then used to form the dataset.
Only the last two transformer blocks and the regression (value) head were unfrozen during fine-tuning.
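
As a rough illustration of the setup (not the exact training code from this repo), the selective unfreezing could look like the sketch below; the `value_head` layer and its shape are assumptions, since this card does not specify how the regression head is implemented.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Assumed: a scalar regression (value) head on top of the last hidden state.
value_head = nn.Linear(model.config.hidden_size, 1)

# Freeze the whole backbone ...
for param in model.parameters():
    param.requires_grad = False

# ... then unfreeze only the last two transformer blocks.
for block in model.model.layers[-2:]:
    for param in block.parameters():
        param.requires_grad = True

# The value head is trained from scratch, so all of its parameters stay trainable.
trainable_params = [p for p in list(model.parameters()) + list(value_head.parameters()) if p.requires_grad]
```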

The idea is to use only the value network to guide MCTS sampling, without the need for simulation/rollouts.
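
A minimal sketch of that idea, assuming the value head scores a partial reasoning trace directly (the names `score_node` and `backup` are illustrative, not taken from this repo): instead of simulating to a terminal answer, the value estimate of a newly expanded node is backed up the tree.

```python
import torch

@torch.no_grad()
def score_node(model, value_head, tokenizer, partial_trace: str) -> float:
    """Estimate the value of a partial reasoning trace without any rollout."""
    inputs = tokenizer(partial_trace, return_tensors="pt")
    hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]
    # Value of the last token's hidden state, squashed to [0, 1] here for illustration.
    return torch.sigmoid(value_head(hidden[:, -1])).item()

def backup(node, value: float):
    """Standard MCTS backup, with the value estimate in place of a rollout return."""
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent
```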

Currently the value network is overfitting due to the very limited number of samples. I'm going to update this soon, once I've sampled more data.



### Scores on the first 65 samples of the GSM8K test split

- Beam-search (3 beams): 40.0%
- MCTS-search (3 beams): 50.77%


The final rollout of the MCTS search is also done via beam search. During testing on GSM8K, only the value network was used to guide the search.
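
For reference, value-guided selection at the end of the search could look roughly like the sketch below, which simply reranks finished beams with the `score_node` helper from the sketch above (an assumption about the procedure, not the exact code used here).

```python
def pick_best_beam(model, value_head, tokenizer, question: str, beams: list[str]) -> str:
    """Rerank finished beams by the value head's estimate instead of by log-likelihood."""
    scores = [score_node(model, value_head, tokenizer, question + beam) for beam in beams]
    return beams[max(range(len(beams)), key=lambda i: scores[i])]
```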



All tests were done with Qwen2.5 0.5B Instruct.