license: apache-2.0
datasets:
  - fblgit/simple-math
  - jondurbin/bagel-v0.3
library_name: transformers
tags:
  - math
  - UNA
  - juanako

UNA-34BeagleSimpleMath-32K-v1

This is a fine-tuned version of fblgit/UNA-34Beagles-32K-v1, trained on the fblgit/simple-math dataset. Powered by The Bagel v0.3 and Yi-34B.

Trained with AXOLOTL!
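Since the card lists `transformers` as the library, loading the model could look like the minimal sketch below. The repo id `fblgit/UNA-34BeagleSimpleMath-32K-v1`, the bfloat16 dtype, `device_map="auto"` (which needs `accelerate` installed), and the helper names `load_model` and `generate` are all assumptions for illustration, not settings taken from this card. The card does not state a prompt template, so the sketch passes a plain-text prompt.

```python
# Assumed Hub repo id, inferred from the card title and author; verify before use.
MODEL_ID = "fblgit/UNA-34BeagleSimpleMath-32K-v1"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model. Imports are deferred because the weights are large."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption: half precision to fit a 34B model
        device_map="auto",           # assumption: shard across available devices
    )
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a continuation and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call would be `tokenizer, model = load_model()` followed by `generate(tokenizer, model, "What is 2+2?")`. Expect a download of tens of gigabytes on first use.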

34BEAGLES MATH EVALS

|Tasks|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|-----|-------|----------|-----:|-----------|-----:|---|-----:|
|gsm8k|Yaml   |get-answer|     5|exact_match|0.6505|±  |0.0131|

|    Tasks     |Version|Filter|n-shot| Metric |Value |   |Stderr|
|--------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge |Yaml   |none  |    25|acc     |0.7090|±  |0.0133|
|              |       |none  |    25|acc_norm|0.7329|±  |0.0129|
|truthfulqa_mc2|Yaml   |none  |     0|acc     |0.7378|±  |0.0141|

|                 Tasks                 |Version|Filter|n-shot|Metric|Value |   |Stderr|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu                                   |N/A    |none  |     0|acc   |0.7524|±  |0.1045|
| - humanities                          |N/A    |none  |     0|acc   |0.7307|±  |0.0846|
|  - formal_logic                       |Yaml   |none  |     0|acc   |0.5873|±  |0.0440|
|  - high_school_european_history       |Yaml   |none  |     0|acc   |0.8667|±  |0.0265|
|  - high_school_us_history             |Yaml   |none  |     0|acc   |0.9167|±  |0.0194|
|  - high_school_world_history          |Yaml   |none  |     0|acc   |0.9114|±  |0.0185|
|  - international_law                  |Yaml   |none  |     0|acc   |0.8430|±  |0.0332|
|  - jurisprudence                      |Yaml   |none  |     0|acc   |0.8704|±  |0.0325|
|  - logical_fallacies                  |Yaml   |none  |     0|acc   |0.8589|±  |0.0274|
|  - moral_disputes                     |Yaml   |none  |     0|acc   |0.7717|±  |0.0226|
|  - moral_scenarios                    |Yaml   |none  |     0|acc   |0.7374|±  |0.0147|
|  - philosophy                         |Yaml   |none  |     0|acc   |0.8006|±  |0.0227|
|  - prehistory                         |Yaml   |none  |     0|acc   |0.8549|±  |0.0196|
|  - professional_law                   |Yaml   |none  |     0|acc   |0.5724|±  |0.0126|
|  - world_religions                    |Yaml   |none  |     0|acc   |0.8830|±  |0.0246|
| - other                               |N/A    |none  |     0|acc   |0.7937|±  |0.1029|
|  - business_ethics                    |Yaml   |none  |     0|acc   |0.7800|±  |0.0416|
|  - clinical_knowledge                 |Yaml   |none  |     0|acc   |0.8000|±  |0.0246|
|  - college_medicine                   |Yaml   |none  |     0|acc   |0.6936|±  |0.0351|
|  - global_facts                       |Yaml   |none  |     0|acc   |0.5500|±  |0.0500|
|  - human_aging                        |Yaml   |none  |     0|acc   |0.7534|±  |0.0289|
|  - management                         |Yaml   |none  |     0|acc   |0.8447|±  |0.0359|
|  - marketing                          |Yaml   |none  |     0|acc   |0.9316|±  |0.0165|
|  - medical_genetics                   |Yaml   |none  |     0|acc   |0.8700|±  |0.0338|
|  - miscellaneous                      |Yaml   |none  |     0|acc   |0.8953|±  |0.0109|
|  - nutrition                          |Yaml   |none  |     0|acc   |0.8170|±  |0.0221|
|  - professional_accounting            |Yaml   |none  |     0|acc   |0.6277|±  |0.0288|
|  - professional_medicine              |Yaml   |none  |     0|acc   |0.8015|±  |0.0242|
|  - virology                           |Yaml   |none  |     0|acc   |0.5723|±  |0.0385|
| - social_sciences                     |N/A    |none  |     0|acc   |0.8274|±  |0.0667|
|  - econometrics                       |Yaml   |none  |     0|acc   |0.6140|±  |0.0458|
|  - high_school_geography              |Yaml   |none  |     0|acc   |0.8889|±  |0.0224|
|  - high_school_government_and_politics|Yaml   |none  |     0|acc   |0.9482|±  |0.0160|
|  - high_school_macroeconomics         |Yaml   |none  |     0|acc   |0.7897|±  |0.0207|
|  - high_school_microeconomics         |Yaml   |none  |     0|acc   |0.8697|±  |0.0219|
|  - high_school_psychology             |Yaml   |none  |     0|acc   |0.8899|±  |0.0134|
|  - human_sexuality                    |Yaml   |none  |     0|acc   |0.8550|±  |0.0309|
|  - professional_psychology            |Yaml   |none  |     0|acc   |0.7745|±  |0.0169|
|  - public_relations                   |Yaml   |none  |     0|acc   |0.7000|±  |0.0439|
|  - security_studies                   |Yaml   |none  |     0|acc   |0.7796|±  |0.0265|
|  - sociology                          |Yaml   |none  |     0|acc   |0.8657|±  |0.0241|
|  - us_foreign_policy                  |Yaml   |none  |     0|acc   |0.8900|±  |0.0314|
| - stem                                |N/A    |none  |     0|acc   |0.6708|±  |0.1236|
|  - abstract_algebra                   |Yaml   |none  |     0|acc   |0.4900|±  |0.0502|
|  - anatomy                            |Yaml   |none  |     0|acc   |0.7259|±  |0.0385|
|  - astronomy                          |Yaml   |none  |     0|acc   |0.8487|±  |0.0292|
|  - college_biology                    |Yaml   |none  |     0|acc   |0.8750|±  |0.0277|
|  - college_chemistry                  |Yaml   |none  |     0|acc   |0.5200|±  |0.0502|
|  - college_computer_science           |Yaml   |none  |     0|acc   |0.6200|±  |0.0488|
|  - college_mathematics                |Yaml   |none  |     0|acc   |0.4300|±  |0.0498|
|  - college_physics                    |Yaml   |none  |     0|acc   |0.5686|±  |0.0493|
|  - computer_security                  |Yaml   |none  |     0|acc   |0.7800|±  |0.0416|
|  - conceptual_physics                 |Yaml   |none  |     0|acc   |0.7404|±  |0.0287|
|  - electrical_engineering             |Yaml   |none  |     0|acc   |0.7172|±  |0.0375|
|  - elementary_mathematics             |Yaml   |none  |     0|acc   |0.6720|±  |0.0242|
|  - high_school_biology                |Yaml   |none  |     0|acc   |0.9032|±  |0.0168|
|  - high_school_chemistry              |Yaml   |none  |     0|acc   |0.6256|±  |0.0341|
|  - high_school_computer_science       |Yaml   |none  |     0|acc   |0.7800|±  |0.0416|
|  - high_school_mathematics            |Yaml   |none  |     0|acc   |0.4667|±  |0.0304|
|  - high_school_physics                |Yaml   |none  |     0|acc   |0.5033|±  |0.0408|
|  - high_school_statistics             |Yaml   |none  |     0|acc   |0.6435|±  |0.0327|
|  - machine_learning                   |Yaml   |none  |     0|acc   |0.5536|±  |0.0472|

|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     0|acc   |0.7524|±  |0.1045|
| - humanities     |N/A    |none  |     0|acc   |0.7307|±  |0.0846|
| - other          |N/A    |none  |     0|acc   |0.7937|±  |0.1029|
| - social_sciences|N/A    |none  |     0|acc   |0.8274|±  |0.0667|
| - stem           |N/A    |none  |     0|acc   |0.6708|±  |0.1236|
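The tables above follow the output layout of EleutherAI's lm-evaluation-harness, although the card does not name the tool or its version. Assuming that harness (v0.4 or later) is installed, a rough sketch of re-running one task through its Python API might look like this. The repo id, the dtype, the batch size, and the helper name `run_task` are assumptions; only the few-shot counts come from the n-shot column of the tables.

```python
# Assumed Hub repo id, inferred from the card title and author; verify before use.
MODEL_ID = "fblgit/UNA-34BeagleSimpleMath-32K-v1"

# Few-shot counts copied from the n-shot column of the tables in this card.
FEWSHOT = {"gsm8k": 5, "arc_challenge": 25, "truthfulqa_mc2": 0, "mmlu": 0}


def run_task(task: str, model_id: str = MODEL_ID, batch_size: int = 1) -> dict:
    """Evaluate one task and return the harness's per-task results dictionary."""
    import lm_eval  # deferred: only needed when an evaluation is actually run

    out = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id},dtype=bfloat16",  # dtype is an assumption
        tasks=[task],
        num_fewshot=FEWSHOT[task],
        batch_size=batch_size,
    )
    return out["results"]
```

Calling `run_task("gsm8k")` would run the 5-shot GSM8K evaluation. Scores may differ slightly from the tables depending on harness version, dtype, and batch size.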

34BEAGLES (The Base Model)

|    Tasks     |Version|Filter|n-shot| Metric |Value |   |Stderr|
|--------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge |Yaml   |none  |    25|acc     |0.7039|±  |0.0133|
|              |       |none  |    25|acc_norm|0.7321|±  |0.0129|
|truthfulqa_mc2|Yaml   |none  |     0|acc     |0.7387|±  |0.0141|

|Tasks|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|-----|-------|----------|-----:|-----------|-----:|---|-----:|
|gsm8k|Yaml   |get-answer|     5|exact_match|0.6399|±  |0.0132|

|                 Tasks                 |Version|Filter|n-shot|Metric|Value |   |Stderr|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu                                   |N/A    |none  |     0|acc   |0.7477|±  |0.1079|
| - humanities                          |N/A    |none  |     0|acc   |0.7188|±  |0.0855|
|  - formal_logic                       |Yaml   |none  |     0|acc   |0.5794|±  |0.0442|
|  - high_school_european_history       |Yaml   |none  |     0|acc   |0.8667|±  |0.0265|
|  - high_school_us_history             |Yaml   |none  |     0|acc   |0.9069|±  |0.0204|
|  - high_school_world_history          |Yaml   |none  |     0|acc   |0.9072|±  |0.0189|
|  - international_law                  |Yaml   |none  |     0|acc   |0.8264|±  |0.0346|
|  - jurisprudence                      |Yaml   |none  |     0|acc   |0.8796|±  |0.0315|
|  - logical_fallacies                  |Yaml   |none  |     0|acc   |0.8405|±  |0.0288|
|  - moral_disputes                     |Yaml   |none  |     0|acc   |0.7746|±  |0.0225|
|  - moral_scenarios                    |Yaml   |none  |     0|acc   |0.6972|±  |0.0154|
|  - philosophy                         |Yaml   |none  |     0|acc   |0.8006|±  |0.0227|
|  - prehistory                         |Yaml   |none  |     0|acc   |0.8580|±  |0.0194|
|  - professional_law                   |Yaml   |none  |     0|acc   |0.5645|±  |0.0127|
|  - world_religions                    |Yaml   |none  |     0|acc   |0.8713|±  |0.0257|
| - other                               |N/A    |none  |     0|acc   |0.7950|±  |0.1057|
|  - business_ethics                    |Yaml   |none  |     0|acc   |0.7700|±  |0.0423|
|  - clinical_knowledge                 |Yaml   |none  |     0|acc   |0.8038|±  |0.0244|
|  - college_medicine                   |Yaml   |none  |     0|acc   |0.7110|±  |0.0346|
|  - global_facts                       |Yaml   |none  |     0|acc   |0.5500|±  |0.0500|
|  - human_aging                        |Yaml   |none  |     0|acc   |0.7265|±  |0.0299|
|  - management                         |Yaml   |none  |     0|acc   |0.8544|±  |0.0349|
|  - marketing                          |Yaml   |none  |     0|acc   |0.9444|±  |0.0150|
|  - medical_genetics                   |Yaml   |none  |     0|acc   |0.8800|±  |0.0327|
|  - miscellaneous                      |Yaml   |none  |     0|acc   |0.8978|±  |0.0108|
|  - nutrition                          |Yaml   |none  |     0|acc   |0.8170|±  |0.0221|
|  - professional_accounting            |Yaml   |none  |     0|acc   |0.6312|±  |0.0288|
|  - professional_medicine              |Yaml   |none  |     0|acc   |0.8051|±  |0.0241|
|  - virology                           |Yaml   |none  |     0|acc   |0.5602|±  |0.0386|
| - social_sciences                     |N/A    |none  |     0|acc   |0.8297|±  |0.0664|
|  - econometrics                       |Yaml   |none  |     0|acc   |0.6140|±  |0.0458|
|  - high_school_geography              |Yaml   |none  |     0|acc   |0.8939|±  |0.0219|
|  - high_school_government_and_politics|Yaml   |none  |     0|acc   |0.9482|±  |0.0160|
|  - high_school_macroeconomics         |Yaml   |none  |     0|acc   |0.7974|±  |0.0204|
|  - high_school_microeconomics         |Yaml   |none  |     0|acc   |0.8655|±  |0.0222|
|  - high_school_psychology             |Yaml   |none  |     0|acc   |0.8936|±  |0.0132|
|  - human_sexuality                    |Yaml   |none  |     0|acc   |0.8473|±  |0.0315|
|  - professional_psychology            |Yaml   |none  |     0|acc   |0.7778|±  |0.0168|
|  - public_relations                   |Yaml   |none  |     0|acc   |0.7000|±  |0.0439|
|  - security_studies                   |Yaml   |none  |     0|acc   |0.7837|±  |0.0264|
|  - sociology                          |Yaml   |none  |     0|acc   |0.8657|±  |0.0241|
|  - us_foreign_policy                  |Yaml   |none  |     0|acc   |0.8900|±  |0.0314|
| - stem                                |N/A    |none  |     0|acc   |0.6641|±  |0.1291|
|  - abstract_algebra                   |Yaml   |none  |     0|acc   |0.4800|±  |0.0502|
|  - anatomy                            |Yaml   |none  |     0|acc   |0.7407|±  |0.0379|
|  - astronomy                          |Yaml   |none  |     0|acc   |0.8618|±  |0.0281|
|  - college_biology                    |Yaml   |none  |     0|acc   |0.8611|±  |0.0289|
|  - college_chemistry                  |Yaml   |none  |     0|acc   |0.5300|±  |0.0502|
|  - college_computer_science           |Yaml   |none  |     0|acc   |0.6100|±  |0.0490|
|  - college_mathematics                |Yaml   |none  |     0|acc   |0.3800|±  |0.0488|
|  - college_physics                    |Yaml   |none  |     0|acc   |0.5588|±  |0.0494|
|  - computer_security                  |Yaml   |none  |     0|acc   |0.8000|±  |0.0402|
|  - conceptual_physics                 |Yaml   |none  |     0|acc   |0.7319|±  |0.0290|
|  - electrical_engineering             |Yaml   |none  |     0|acc   |0.7034|±  |0.0381|
|  - elementary_mathematics             |Yaml   |none  |     0|acc   |0.6587|±  |0.0244|
|  - high_school_biology                |Yaml   |none  |     0|acc   |0.8935|±  |0.0175|
|  - high_school_chemistry              |Yaml   |none  |     0|acc   |0.6305|±  |0.0340|
|  - high_school_computer_science       |Yaml   |none  |     0|acc   |0.7700|±  |0.0423|
|  - high_school_mathematics            |Yaml   |none  |     0|acc   |0.4296|±  |0.0302|
|  - high_school_physics                |Yaml   |none  |     0|acc   |0.5166|±  |0.0408|
|  - high_school_statistics             |Yaml   |none  |     0|acc   |0.6528|±  |0.0325|
|  - machine_learning                   |Yaml   |none  |     0|acc   |0.5536|±  |0.0472|

|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     0|acc   |0.7477|±  |0.1079|
| - humanities     |N/A    |none  |     0|acc   |0.7188|±  |0.0855|
| - other          |N/A    |none  |     0|acc   |0.7950|±  |0.1057|
| - social_sciences|N/A    |none  |     0|acc   |0.8297|±  |0.0664|
| - stem           |N/A    |none  |     0|acc   |0.6641|±  |0.1291|

So I guess SimpleMath (2+2=4, 4-1=3) .. works! :) Compared with the base model, GSM8K moves from 0.6399 to 0.6505 and MMLU from 0.7477 to 0.7524.
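To put the comparison in one place, the sketch below subtracts the base-model scores from the fine-tuned scores, using values and standard errors copied from the tables in this card. Dividing each difference by the combined standard error is an assumption of this sketch: it treats the two runs as independent, which is only a rough guide because both models are scored on the same test items. The dictionary keys and the function name `compare` are illustrative.

```python
from math import sqrt

# (value, stderr) pairs copied from the tables in this card.
FINETUNED = {
    "gsm8k": (0.6505, 0.0131),                   # 5-shot, exact_match
    "arc_challenge_acc_norm": (0.7329, 0.0129),  # 25-shot
    "truthfulqa_mc2": (0.7378, 0.0141),          # 0-shot, acc
    "mmlu": (0.7524, 0.1045),                    # 0-shot, acc (group-level stderr)
}
BASE = {
    "gsm8k": (0.6399, 0.0132),
    "arc_challenge_acc_norm": (0.7321, 0.0129),
    "truthfulqa_mc2": (0.7387, 0.0141),
    "mmlu": (0.7477, 0.1079),
}


def compare(finetuned: dict, base: dict) -> dict:
    """Return {task: (delta, delta / combined_stderr)} for every task in finetuned."""
    rows = {}
    for task, (ft_value, ft_se) in finetuned.items():
        base_value, base_se = base[task]
        delta = ft_value - base_value
        # Assumption: independent runs, so the variances add.
        combined_se = sqrt(ft_se ** 2 + base_se ** 2)
        rows[task] = (delta, delta / combined_se)
    return rows


for task, (delta, ratio) in compare(FINETUNED, BASE).items():
    print(f"{task:24s} delta={delta:+.4f}  delta/stderr={ratio:+.2f}")
```

On these numbers GSM8K shows the largest change, at roughly 0.57 combined standard errors, while ARC and TruthfulQA are essentially flat.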