acrastt committed on
Commit 1006a94
1 Parent(s): fe47943

Update README.md


Thanks for adding the README! This is a great model! However, I do see some misinformation and misspellings in the README, so I (hopefully helpfully) changed it to correct them. This model is neither the best 3B model overall nor the best 3B model on MMLU (5-shot). Looking forward to your reply.

Files changed (1)
  1. README.md +11 -9
README.md CHANGED
@@ -13,9 +13,9 @@ license: apache-2.0
  ---
  # Model Card

- **The Best 3B Model! Surpassing dolly-v2-12b**
+ **One of the Best 3B Models! Surpassing dolly-v2-12b on the Open LLM Leaderboard!**

- The best 3B model on MMLU (5-shot) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), with performance surpassing dolly-v2-12b
+ One of the best 3B models on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), with performance surpassing dolly-v2-12b!

  | Metric | Value |
  |-----------------------|-------|
@@ -25,15 +25,15 @@ The best 3B model on MMLU (5-shot) on the [Open LLM Leaderboard](https://hugging
  | TruthfulQA (0-shot) | 37.3 |
  | Avg. | 45.2 |

- We use state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above.
+ We used the state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above.


- The training code and data will be open sourced later on Github(https://github.com/chi2liu/mamba-gpt-3b)
+ The training code and data will be open-sourced later on GitHub (https://github.com/chi2liu/mamba-gpt-3b).


  ## Training Dataset

- ` mamba-gpt-3b-v4 ` is trained on multiply dataset:
+ `mamba-gpt-3b-v4` is trained on multiple datasets:
  - [Stanford Alpaca (en)](https://github.com/tatsu-lab/stanford_alpaca)
  - [Open Assistant (multilingual)](https://huggingface.co/datasets/OpenAssistant/oasst1)
  - [LIMA (en)](https://huggingface.co/datasets/GAIR/lima)
@@ -44,13 +44,13 @@ The training code and data will be open sourced later on Github(https://github.c

  ## Summary

- We have fine-tuned the open-lama model and surpassed the original model in multiple evaluation subtasks, making it currently the best performing 3B model with comparable performance to llama-7b
+ We have fine-tuned the OpenLLaMA model and surpassed the original model in multiple evaluation subtasks, making it currently one of the best-performing 3B models, with performance comparable to llama-7b.
  - Base model: [openlm-research/open_llama_3b_v2](https://huggingface.co/openlm-research/open_llama_3b_v2)


  ## Usage

- To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers`, `accelerate` and `torch` libraries installed.
+ To use the model with the `transformers` library on a machine with GPU(s), first make sure you have the `transformers`, `accelerate`, and `torch` libraries installed.

  ```bash
  pip install transformers==4.29.2
@@ -58,6 +58,8 @@ pip install accelerate==0.19.0
  pip install torch==2.0.0
  ```

+ Then, run the following Python snippet:
+
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM

@@ -65,8 +67,8 @@ tokenizer = AutoTokenizer.from_pretrained("CobraMamba/mamba-gpt-3b-v4")
  model = AutoModelForCausalLM.from_pretrained("CobraMamba/mamba-gpt-3b-v4", trust_remote_code=True, torch_dtype=torch.float16)

  # we use alpaca prompt
- input_context = "Your text here"
- input_ids = tokenizer.encode(input_context, return_tensors="pt")
+ input_content = "Your text here"
+ input_ids = tokenizer.encode(input_content, return_tensors="pt")
  output = model.generate(input_ids, max_length=128, temperature=0.7)
  output_text = tokenizer.decode(output[0], skip_special_tokens=True)
  print(output_text)
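A note on the benchmark table in the hunks above: the scores come from the EleutherAI harness linked in the README, but the diff does not state the exact command used. A minimal sketch of how such a run was typically launched with the 2023-era `main.py` CLI; flags and task names differ between harness releases, so treat this as an assumption, not the command behind the table:

```bash
# Hedged sketch: score the model on MMLU, 5-shot, with the EleutherAI
# lm-evaluation-harness (v0.3.x-era CLI assumed; newer releases use the
# `lm_eval` entry point and different task names).
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
# The leaderboard-style MMLU average covers all 57 hendrycksTest-* subtasks;
# a single subtask is shown here to keep the example short.
python main.py \
    --model hf-causal \
    --model_args pretrained=CobraMamba/mamba-gpt-3b-v4 \
    --tasks hendrycksTest-abstract_algebra \
    --num_fewshot 5
```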
 
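For convenience, here is a self-contained version of the post-commit usage snippet. The diff omits `import torch` (needed for `torch_dtype=torch.float16`), and the `tokenizer = ...` line survives only in a hunk header, so both are restored below. Two details are assumptions rather than the README's own code: the Alpaca template (the README says only "we use alpaca prompt", so the standard Stanford Alpaca instruction format is used as a stand-in), and `do_sample=True` (added because `temperature` has no effect under greedy decoding).

```python
# Self-contained sketch of the README's usage example (assumes a CUDA GPU).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CobraMamba/mamba-gpt-3b-v4")
model = AutoModelForCausalLM.from_pretrained(
    "CobraMamba/mamba-gpt-3b-v4",
    trust_remote_code=True,
    torch_dtype=torch.float16,
).to("cuda")

# "we use alpaca prompt": the standard Stanford Alpaca instruction template
# is assumed here, since the README does not spell the template out.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nYour text here\n\n"
    "### Response:\n"
)
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("cuda")

# do_sample=True so that temperature=0.7 actually takes effect.
output = model.generate(input_ids, max_length=128, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```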