---
library_name: transformers
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
language:
- en
---
|
|
|
# Model Card for Qwen2.5-0.5B-Instruct MCTS Value Network
|
|
|
|
This is a Qwen2.5-0.5B-Instruct model fine-tuned on a dataset generated by Monte Carlo Tree Search (MCTS) based sampling.
|
MCTS was run on a small subset of the GSM8K train split; the resulting traces and value estimates were then used to form the dataset.
|
Only the last two transformer blocks and the regression head were unfrozen during fine-tuning.
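
This freezing scheme can be sketched as follows. The model below is a toy stand-in (the real setup uses the Qwen2.5-0.5B-Instruct transformer blocks and a scalar regression head; all module names here are illustrative):

```python
import torch.nn as nn

# Toy stand-in for the fine-tuned model: a stack of "transformer blocks"
# plus a scalar regression (value) head. The real model is
# Qwen/Qwen2.5-0.5B-Instruct; module names here are illustrative only.
class ToyValueModel(nn.Module):
    def __init__(self, n_blocks=4, dim=8):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_blocks))
        self.value_head = nn.Linear(dim, 1)

def freeze_all_but_last_k(model, k=2):
    """Freeze every parameter, then unfreeze the last k blocks and the head."""
    for p in model.parameters():
        p.requires_grad = False
    for block in model.blocks[-k:]:
        for p in block.parameters():
            p.requires_grad = True
    for p in model.value_head.parameters():
        p.requires_grad = True

model = ToyValueModel()
freeze_all_but_last_k(model, k=2)
trainable = sorted(n for n, p in model.named_parameters() if p.requires_grad)
```

The optimizer then only needs to be handed the parameters that still have `requires_grad=True`.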
|
|
|
The idea is to let the value network alone guide MCTS sampling, removing the need for simulation/rollout steps.
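
A minimal sketch of that idea: standard UCT selection and backup, but the simulation step is replaced by a single call to the value network (AlphaZero-style). Everything below is a toy stand-in (integer states, a hand-written value function and transition model); in this model's setup the evaluator would be the fine-tuned value head scoring a partial GSM8K solution.

```python
import math

def value_net(state):
    # Stand-in for the learned value network: prefers states near 7.
    return -abs(state - 7) / 10.0

def children(state):
    # Toy transition model: each state branches into two successors.
    return [state * 2, state * 2 + 1]

def mcts(root, n_iters=50, c=1.4):
    N = {root: 0}    # visit counts
    W = {root: 0.0}  # summed values
    kids = {}        # expanded nodes -> children
    for _ in range(n_iters):
        # 1) Selection: descend via UCT until an unexpanded node.
        path, node = [root], root
        while node in kids:
            node = max(kids[node], key=lambda ch:
                       W[ch] / (N[ch] + 1e-9)
                       + c * math.sqrt(math.log(N[node] + 1) / (N[ch] + 1e-9)))
            path.append(node)
        # 2) Expansion (states >= 4 are treated as leaves in this toy tree).
        if node < 4:
            kids[node] = children(node)
            for ch in kids[node]:
                N[ch], W[ch] = 0, 0.0
        # 3) Evaluation: ONE value-network call replaces the usual rollout.
        v = value_net(node)
        # 4) Backup along the selection path.
        for n in path:
            N[n] += 1
            W[n] += v
    best = max(kids[root], key=lambda ch: N[ch])
    return best, N

best, visits = mcts(1)
```

Since the best leaf (state 7) lies under state 3, the search concentrates its visits on that subtree and returns 3 as the best root action.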
|
|
|
The value network is currently overfitting due to the very limited number of training samples. I will update this model once more data has been sampled.
|
|
|
|
|
|
|
### Scores on the first 65 samples of the GSM8K test split
|
|
|
- Beam search (3 beams): 40.0%
- MCTS search (3 beams): 50.77%
|
|
|
|
|
The final rollout of the MCTS search is also done via beam search. During evaluation on GSM8K, only the value network was used to guide the search.
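
A toy sketch of value-guided beam search: candidates are expanded exactly as in ordinary beam search, but ranked by a value function instead of token log-probabilities. The binary vocabulary and hand-written scorer below are illustrative stand-ins for the fine-tuned value head scoring partial GSM8K solutions.

```python
def toy_value(seq):
    # Stand-in value network: counts positions matching a fixed "solution".
    target = (1, 0, 1, 1)
    return sum(a == b for a, b in zip(seq, target))

def value_guided_beam_search(vocab, n_beams=3, depth=4):
    beams = [()]
    for _ in range(depth):
        # Expand every beam by every token, as in ordinary beam search...
        candidates = [beam + (tok,) for beam in beams for tok in vocab]
        # ...but keep the candidates the value network scores highest.
        beams = sorted(candidates, key=toy_value, reverse=True)[:n_beams]
    return beams[0]

best = value_guided_beam_search(vocab=(0, 1), n_beams=3)
```

With the toy scorer above, the search recovers the target sequence `(1, 0, 1, 1)`.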