|
--- |
|
library_name: transformers |
|
license: mit |
|
base_model: intfloat/multilingual-e5-small |
|
tags: |
|
- generated_from_trainer |
|
metrics: |
|
- precision |
|
- recall |
|
- accuracy |
|
model-index: |
|
- name: owm-math-scorer-multilingual-e5-small |
|
results: [] |
|
--- |
|
|
|
|
|
|
# FineMath classifier |
|
|
|
## Model summary |
|
This is a classifier for evaluating mathematical reasoning and deduction in web pages, fine-tuned from [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small). It was developed to filter and curate mathematical content from web datasets and was trained on 1M annotations generated by [Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) for Common Crawl web samples extracted with the [OpenWebMath](https://github.com/keirp/OpenWebMath) text extraction pipeline. To ensure a balanced training set, we ran a preliminary math classifier over 5M samples and upsampled pages containing mathematical content before annotation.
|
|
|
We used this classifier to build the [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath) dataset.
|
|
|
### How to use in transformers |
|
To load the FineMath classifier, use the following code: |
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/finemath-classifier")
model = AutoModelForSequenceClassification.from_pretrained("HuggingFaceTB/finemath-classifier")

text = "This is a test sentence."
inputs = tokenizer(text, return_tensors="pt", padding="longest", truncation=True)

# The model has a single regression output, so the raw logit is the math score.
with torch.no_grad():
    outputs = model(**inputs)
score = outputs.logits.squeeze(-1).float().item()

result = {
    "text": text,
    "score": score,
    # Clamp to [0, 5] and round to get an integer score bucket.
    "int_score": int(round(max(0, min(score, 5)))),
}

print(result)
# {'text': 'This is a test sentence.', 'score': 0.07964489609003067, 'int_score': 0}
```
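
For scoring documents in bulk, the snippet above extends naturally to batched inference. The sketch below reuses the `tokenizer` and `model` from the previous block; the `texts` list, batch size, and the final threshold-3 filter are illustrative assumptions, not part of the released pipeline.

```python
import torch

# Hypothetical documents to score; replace with your own data.
texts = ["Proof that sqrt(2) is irrational ...", "Latest celebrity news ..."]

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

scores = []
batch_size = 32  # assumption: tune to your hardware
for i in range(0, len(texts), batch_size):
    batch = tokenizer(
        texts[i : i + batch_size],
        return_tensors="pt",
        padding="longest",
        truncation=True,
    ).to(device)
    with torch.no_grad():
        logits = model(**batch).logits.squeeze(-1)
    scores.extend(logits.float().cpu().tolist())

# Keep pages that score at least 3, mirroring the binary threshold used
# for evaluation below (choose your own cutoff as needed).
kept = [t for t, s in zip(texts, scores) if s >= 3]
```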
|
|
|
|
|
## Training |
|
The classifier was trained on 1M pairs of web samples and their scores from 0 to 5, generated by Llama3. The samples were annotated based on their usefulness for studying mathematics, with 0 meaning the page is not educational or contains no mathematical content and 5 meaning it is outstanding for mathematics education.
|
|
|
Below is the prompt used for the Llama3 annotations:
|
<div style="text-align: center; margin: 20px 0;"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/PXtxtC-h7XPFJhx4DJjCF.png" alt="Prompt for LLM annotation" style="width: 90%; max-width: 800px; height: auto;"> |
|
</div> |
|
|
|
|
|
We added a classification head with a single regression output to [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) and trained the model for 20 epochs with a learning rate of 3e-4. During training, the embedding and encoder layers were frozen to focus on the classification head. The model achieved an F1 score of 87% when converted to a binary classifier using a score threshold of 3. |
|
|
|
**Training Details:** |
|
|
|
- Model: intfloat/multilingual-e5-small with a classification head
|
- Dataset: 1M samples from Llama3 annotations |
|
- Epochs: 20 |
|
- Learning Rate: 3e-4 |
|
- Evaluation Metric: F1 score |
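
As a rough illustration of this setup, the sketch below freezes the backbone and trains only the head. Only the base model, the single regression output, the 20 epochs, and the 3e-4 learning rate come from this card; the loss function, optimizer, and toy data are assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-small")
# num_labels=1 puts a single regression output on top of the encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "intfloat/multilingual-e5-small", num_labels=1, problem_type="regression"
)

# Freeze the embedding and encoder layers; only the head receives gradients.
for param in model.base_model.parameters():
    param.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=3e-4
)

# Toy stand-in for the 1M annotated samples (illustrative only).
texts = ["An introduction to group theory.", "Buy cheap shoes online."]
targets = torch.tensor([4.0, 0.0])
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

for epoch in range(20):
    preds = model(**batch).logits.squeeze(-1)
    loss = torch.nn.functional.mse_loss(preds, targets)  # assumed loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```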
|
|
|
**Evaluation:** |
|
The model achieves the following results on the evaluation set: |
|
- Loss: 0.4478 |
|
- Precision: 0.8771 |
|
- Recall: 0.8769 |
|
- F1 Macro: 0.8770 |
|
- Accuracy: 0.8770 |
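
For reference, the binary view behind these metrics comes from thresholding the 0-5 scores at 3. A minimal sketch, assuming hypothetical `pred_scores` and `true_scores` arrays in place of real held-out data:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical predicted and annotated scores on a held-out set.
pred_scores = np.array([0.4, 3.2, 4.8, 1.1])
true_scores = np.array([0, 3, 5, 2])

# Scores >= 3 count as positive (useful for mathematics education).
pred_labels = (pred_scores >= 3).astype(int)
true_labels = (true_scores >= 3).astype(int)

print("precision:", precision_score(true_labels, pred_labels))
print("recall:   ", recall_score(true_labels, pred_labels))
print("f1 macro: ", f1_score(true_labels, pred_labels, average="macro"))
print("accuracy: ", accuracy_score(true_labels, pred_labels))
```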
|
|
|
|