mav23 committed on
Commit
1575e34
1 Parent(s): 417ff60

Upload folder using huggingface_hub

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +83 -0
  3. athene-70b.Q4_0.gguf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+athene-70b.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,83 @@
---
license: other
language:
- en
library_name: transformers
tags:
- RLHF
- Nexusflow
- Athene
- Chat Model
---
# Llama3-Athene-70B

We introduce Llama3-Athene-70B, an open-weights LLM trained through RLHF on top of Llama-3-70B-Instruct. Athene-70B achieves a high score on Arena-Hard-Auto, a proxy benchmark for Chatbot Arena.

- **Developed by:** The Nexusflow Team (Evan Frick\*, Peter Jin\*, Tianle Li\*, Karthik Ganesan, Jian Zhang, Jiantao Jiao and Banghua Zhu).
- **Model type:** Chat model
- **Finetuned from model:** [Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)
- **License:** [Nexusflow Research License](https://huggingface.co/Nexusflow/Athene-70B/blob/main/Nexusflow_Research_License.pdf)
- **Blog:** https://nexusflow.ai/blogs/athene

| Model                           | Arena-Hard |
|---------------------------------|------------|
| Claude-3.5-Sonnet (Proprietary) | 79.3%      |
| GPT-4o (Proprietary)            | 79.2%      |
| **Athene-70B (Open)**           | 77.8%      |
| Gemini-Pro-1.5 (Proprietary)    | 72.0%      |
| Gemma-2-27B (Open)              | 57.0%      |
| Llama-3-70B (Open)              | 46.6%      |

## Usage

Athene-70B uses the same chat template as Llama-3-70B-Instruct. Below is a simple usage example with the Transformers library.

```python
import transformers
import torch

model_id = "Nexusflow/Athene-70B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an Athene Noctura, you can only speak with owl sounds. Whoooo whooo."},
    {"role": "user", "content": "Whooo are you?"},
]

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|end_of_text|>"),
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][-1])
```
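Since Athene-70B reuses the Llama-3-70B-Instruct chat template, the prompt string the pipeline builds from `messages` can be sketched by hand. The function below is a minimal illustration assuming the standard Llama-3 special tokens; in real code, prefer `pipeline.tokenizer.apply_chat_template`, which is the authoritative implementation:

```python
def llama3_prompt(messages):
    """Sketch of the Llama-3 chat format (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # each turn is wrapped in role headers and closed with <|eot_id|>
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # leave the prompt open for the assistant's reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You can only speak with owl sounds."},
    {"role": "user", "content": "Whooo are you?"},
]
print(llama3_prompt(messages))
```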

## Acknowledgment

We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support in testing the model. We also thank Meta AI and the open-source community for their efforts in providing the datasets and base models.

## Citation

```bibtex
@misc{Athene2024,
  title  = {Athene-70B: Redefining the Boundaries of Post-Training for Open Models},
  url    = {https://nexusflow.ai/blogs/athene},
  author = {Frick, Evan and Jin, Peter and Li, Tianle and Ganesan, Karthik and Zhang, Jian and Jiao, Jiantao and Zhu, Banghua},
  month  = {July},
  year   = {2024}
}
```
athene-70b.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1856df36307f2b9e4391f3b30472947bf8f7208006a6b4c1b3fe6358ca9fadba
+size 39969732480
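The three lines above are a Git LFS pointer, not the model itself: `oid` records the SHA-256 digest of the real ~40 GB `athene-70b.Q4_0.gguf`, and `size` its byte count. A small stdlib-only helper (a hypothetical sketch, not part of this repo) can parse such a pointer and check a downloaded file against it:

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_download(path, pointer, chunk=1 << 20):
    """Hash a local file and compare it to the digest recorded in the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest() == expected

pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1856df36307f2b9e4391f3b30472947bf8f7208006a6b4c1b3fe6358ca9fadba\n"
    "size 39969732480\n"
)
print(pointer["size"])  # → 39969732480
```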