Mixtral 7b 8 Expert
This is a preliminary HuggingFace implementation of the newly released MoE model by Mistral AI. Make sure to load it with trust_remote_code=True.
Thanks to @dzhulgakov for his early implementation (https://github.com/dzhulgakov/llama-mistral) that helped me find a working setup.
Also many thanks to our friends at LAION and HessianAI for the compute used for these projects!
Benchmark scores:
hellaswag: 0.8661
winogrande: 0.824
truthfulqa_mc2: 0.4855
arc_challenge: 0.6638
gsm8k: 0.5709
MMLU: 0.7173
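For a single summary figure, the six scores above can be averaged. The sketch below takes an unweighted mean, which is an assumption about how to aggregate them and not a number reported by the authors; the `scores` dictionary simply copies the values listed above.

```python
# Scores copied from the benchmark list above.
scores = {
    "hellaswag": 0.8661,
    "winogrande": 0.824,
    "truthfulqa_mc2": 0.4855,
    "arc_challenge": 0.6638,
    "gsm8k": 0.5709,
    "MMLU": 0.7173,
}

# Unweighted mean: every benchmark counts equally (an assumption, not the authors' metric).
mean_score = sum(scores.values()) / len(scores)
print(f"unweighted mean over {len(scores)} benchmarks: {mean_score:.4f}")
```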
Basic Inference setup
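import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers run the custom modeling code from the model repository.
model = AutoModelForCausalLM.from_pretrained("DiscoResearch/mixtral-7b-8expert", low_cpu_mem_usage=True, device_map="auto", trust_remote_code=True)
tok = AutoTokenizer.from_pretrained("DiscoResearch/mixtral-7b-8expert")

# Tokenize the prompt and move it to the device that holds the model's first parameters.
x = tok.encode("The mistral wind in is a phenomenon ", return_tensors="pt").to(model.device)
# Generate up to 128 new tokens, then move the result back to the CPU for decoding.
x = model.generate(x, max_new_tokens=128).cpu()
print(tok.batch_decode(x))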
Conversion
Use convert_mistral_moe_weights_to_hf.py --input_dir ./input_dir --model_size 7B --output_dir ./output
to convert the original consolidated weights to this HF setup.
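A minimal sketch of scripting the conversion step from Python, assuming the script sits in the current working directory and can be run with the same interpreter. The helper names `build_convert_command` and `run_conversion` are hypothetical; only the script name and the three flags come from the command above.

```python
import subprocess
import sys


def build_convert_command(input_dir, output_dir, model_size="7B",
                          script="convert_mistral_moe_weights_to_hf.py"):
    """Return the argument list for the conversion script (hypothetical helper)."""
    return [
        sys.executable, str(script),
        "--input_dir", str(input_dir),
        "--model_size", model_size,
        "--output_dir", str(output_dir),
    ]


def run_conversion(input_dir, output_dir, model_size="7B"):
    """Run the conversion; raises CalledProcessError if the script exits non-zero."""
    cmd = build_convert_command(input_dir, output_dir, model_size)
    subprocess.run(cmd, check=True)
    return output_dir
```

Calling `run_conversion("./input_dir", "./output")` would then issue the same command as above through the current Python interpreter.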
Come chat about this in our Disco(rd)! :)