mav23 committed on
Commit
1575e34
1 Parent(s): 417ff60

Upload folder using huggingface_hub

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +83 -0
  3. athene-70b.Q4_0.gguf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+athene-70b.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,83 @@
---
license: other
language:
- en
library_name: transformers
tags:
- RLHF
- Nexusflow
- Athene
- Chat Model
---
# Llama3-Athene-70B

We introduce Llama3-Athene-70B, an open-weights LLM trained through RLHF on top of Llama-3-70B-Instruct. Athene-70B achieves a high score on Arena-Hard-Auto, a proxy benchmark for Chatbot Arena.

- **Developed by:** The Nexusflow Team (Evan Frick\*, Peter Jin\*, Tianle Li\*, Karthik Ganesan, Jian Zhang, Jiantao Jiao and Banghua Zhu).
- **Model type:** Chat model
- **Finetuned from model:** [Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)
- **License:** [Nexusflow Research License](https://huggingface.co/Nexusflow/Athene-70B/blob/main/Nexusflow_Research_License.pdf)
- **Blog:** https://nexusflow.ai/blogs/athene

| Model                           | Arena-Hard |
|---------------------------------|------------|
| Claude-3.5-Sonnet (Proprietary) | 79.3%      |
| GPT-4o (Proprietary)            | 79.2%      |
| **Athene-70B (Open)**           | 77.8%      |
| Gemini-Pro-1.5 (Proprietary)    | 72.0%      |
| Gemma-2-27B (Open)              | 57.0%      |
| Llama-3-70B (Open)              | 46.6%      |

## Usage

Athene-70B uses the same chat template as Llama-3-70B-Instruct. Below is a simple usage example with the Transformers library.

```python
import transformers
import torch

model_id = "Nexusflow/Athene-70B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an Athene Noctura, you can only speak with owl sounds. Whoooo whooo."},
    {"role": "user", "content": "Whooo are you?"},
]

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|end_of_text|>"),
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][-1])
```
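Since Athene-70B reuses the Llama-3-70B-Instruct chat template, the prompt string the pipeline builds from `messages` can be sketched by hand. The function below is a minimal illustration assuming the standard Llama-3 special tokens; in real code, prefer `pipeline.tokenizer.apply_chat_template`, which is the authoritative implementation:

```python
def llama3_prompt(messages):
    """Sketch of the Llama-3 chat format (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # each turn is wrapped in role headers and closed with <|eot_id|>
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # leave the prompt open for the assistant's reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You can only speak with owl sounds."},
    {"role": "user", "content": "Whooo are you?"},
]
print(llama3_prompt(messages))
```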

## Acknowledgment

We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support in testing the model. We also thank Meta AI and the open-source community for their efforts in providing the datasets and base models.

## Citation

```bibtex
@misc{Athene2024,
  title  = {Athene-70B: Redefining the Boundaries of Post-Training for Open Models},
  url    = {https://nexusflow.ai/blogs/athene},
  author = {Frick, Evan and Jin, Peter and Li, Tianle and Ganesan, Karthik and Zhang, Jian and Jiao, Jiantao and Zhu, Banghua},
  month  = {July},
  year   = {2024}
}
```
athene-70b.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1856df36307f2b9e4391f3b30472947bf8f7208006a6b4c1b3fe6358ca9fadba
+size 39969732480
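The three lines above are a Git LFS pointer, not the model itself: `oid` records the SHA-256 digest of the real ~40 GB `athene-70b.Q4_0.gguf`, and `size` its byte count. A small stdlib-only helper (a hypothetical sketch, not part of this repo) can parse such a pointer and check a downloaded file against it:

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_download(path, pointer, chunk=1 << 20):
    """Hash a local file and compare it to the digest recorded in the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest() == expected

pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1856df36307f2b9e4391f3b30472947bf8f7208006a6b4c1b3fe6358ca9fadba\n"
    "size 39969732480\n"
)
print(pointer["size"])  # → 39969732480
```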