euclaise committed on
Commit c7cf442
1 Parent(s): 896b291

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -100,7 +100,7 @@ Here are some benchmark results, computed using the the LM Evaluation Harness wi
  | Model | GSM8K (strict, 5-shot) | ARC-c (acc_norm, 25-shot) |
  |:--------------:|-----------------------:|--------------------------:|
  | SFT | 24.34% | 42.92% |
- | Masked Thought | 24.18% | **43.60%** |
+ | Masked Thought | 24.18% | *43.60%* |
  | **ReMask** | **27.90%** | 43.26% |
 
  As I expected, it improves GSM8K doesn't do much to ARC.