---
license: apache-2.0
---

## Introduction

This repository contains the fastText classifier used for the finer filtering of CC-En in [MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code](https://arxiv.org/abs/2410.08196). Given a piece of text, it predicts whether the text is math-related (`__label__related`) or not.

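## Usage

```python
import fasttext

model = fasttext.load_model("fastText-cc-en-filter_round2.bin")
# Assumed here to be the minimum probability for accepting "__label__related".
thresh = 0.5

# fastText's predict() rejects newline characters, so flatten the text first.
text = "The text to be predicted.".replace("\n", " ")

# For a list input, predict() returns (labels, probabilities),
# each with one entry per input text.
labels, probs = model.predict([text])
label, prob = labels[0][0], probs[0][0]

if label == "__label__related" and prob >= thresh:
    print("math")
else:
    print("other")
```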
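
To filter many documents at once, the same call can be wrapped in a small helper. The following is a minimal sketch, not the filtering code used in the paper: the function name `keep_math_texts` is hypothetical, and it assumes that `thresh` is meant as a minimum probability for the `__label__related` label.

```python
from typing import Iterable, List


def keep_math_texts(model, texts: Iterable[str], thresh: float = 0.5) -> List[str]:
    """Return the texts that the classifier labels as math-related.

    `model` is any object exposing fastText's `predict(list_of_str)` interface,
    for example the result of `fasttext.load_model(...)`.
    """
    texts = list(texts)
    if not texts:
        return []
    # fastText's predict() raises on newline characters, so flatten each text.
    cleaned = [t.replace("\n", " ") for t in texts]
    # For a list input, predict() returns (labels, probabilities),
    # each with one entry per input text (top-1 label by default).
    labels, probs = model.predict(cleaned)
    kept = []
    for text, label, prob in zip(texts, labels, probs):
        # Assumption: `thresh` is a minimum probability for "__label__related".
        if label[0] == "__label__related" and prob[0] >= thresh:
            kept.append(text)
    return kept
```

With the model loaded as above, `keep_math_texts(model, documents)` returns the original, unflattened strings that pass the filter. Raising `thresh` trades recall for precision.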

## Citation

If you find this repository helpful, please consider citing our papers:

```
@misc{lu2024mathcoder2bettermathreasoning,
      title={MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code},
      author={Zimu Lu and Aojun Zhou and Ke Wang and Houxing Ren and Weikang Shi and Junting Pan and Mingjie Zhan and Hongsheng Li},
      year={2024},
      eprint={2410.08196},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.08196},
}
```

```
@inproceedings{
      wang2024mathcoder,
      title={MathCoder: Seamless Code Integration in {LLM}s for Enhanced Mathematical Reasoning},
      author={Ke Wang and Houxing Ren and Aojun Zhou and Zimu Lu and Sichun Luo and Weikang Shi and Renrui Zhang and Linqi Song and Mingjie Zhan and Hongsheng Li},
      booktitle={The Twelfth International Conference on Learning Representations},
      year={2024},
      url={https://openreview.net/forum?id=z8TW0ttBPp}
}
```