someone13574/mixtral-8x7b-32kseqlen

How to Run

by mrfakename - opened Dec 8, 2023

Discussion

mrfakename

Dec 8, 2023

Hi,
Do you know if there are any inference scripts for this?

spawn99

Dec 8, 2023

Good luck!

someone13574

Owner Dec 8, 2023

You might be able to get it running using this and tools/run_text_generation_server.py (this repo is linked from Mistral's GitHub), but whether it actually works is unknown. Proper ways to run it will pop up eventually.

someone13574

Owner Dec 8, 2023

https://github.com/dzhulgakov/llama-mistral

mrfakename

Dec 8, 2023

Thanks! So it’s not instruct tuned?

someone13574

Owner Dec 8, 2023

No, it's a base model.

mrfakename

Dec 8, 2023

Sad. I heard somewhere that MoE models are hard to finetune, is that true?

apicodex

Dec 9, 2023

They released a fine-tuned model last time, so I'm sure they'll drop an instruct model soon. It's a hype drop: a battle between a new, young, hungry team that is up to date and in touch with the younger generation and the biggest, oldest players in the industry. As for whether it's hard to fine-tune, I think it just depends on your area of focus and who you're asking.

zsytony

Dec 10, 2023

Inference code: https://github.com/open-compass/MixtralKit
Evaluation results will be updated soon.

Cxxs

Dec 11, 2023

LLaMA2-Accessory now supports inference and instruction finetuning (both full-parameter and PEFT methods such as LoRA) of mixtral-8x7b-32kseqlen. It supports the load balancing loss and will add more MoE support soon. The documentation is here.
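For anyone unfamiliar with the load balancing loss mentioned above: Mixtral routes each token to 2 of its 8 experts, and MoE training commonly adds an auxiliary loss that penalizes the router for sending most tokens to the same few experts. Here is a minimal pure-Python sketch of the Switch Transformer formulation, num_experts × Σ f_i·P_i, assuming top-2 routing. It is not LLaMA2-Accessory's implementation, and the function names are hypothetical.

```python
import math

# Hypothetical helper names. Sketch of a Switch-Transformer-style load
# balancing loss, assuming top-2 routing over 8 experts as in Mixtral.
# Not LLaMA2-Accessory's actual code.


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Return (chosen expert indices, full expert probabilities) per token."""
    routes, probs = [], []
    for logits in router_logits:
        p = softmax(logits)
        top = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
        routes.append(top)
        probs.append(p)
    return routes, probs


def load_balancing_loss(router_logits, num_experts=8, k=2):
    """num_experts * sum_i f_i * P_i, where f_i is the fraction of
    token-to-expert assignments that go to expert i and P_i is the mean
    router probability for expert i. Equals 1.0 when assignments are even."""
    routes, probs = route_top_k(router_logits, k)
    n_tokens = len(router_logits)
    counts = [0] * num_experts
    for top in routes:
        for expert in top:
            counts[expert] += 1
    f = [c / (n_tokens * k) for c in counts]
    P = [sum(p[i] for p in probs) / n_tokens for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))


if __name__ == "__main__":
    # Balanced: token j strongly prefers experts 2j and 2j+1.
    balanced = [[5.0 if i in (2 * j, 2 * j + 1) else 0.0 for i in range(8)]
                for j in range(4)]
    # Collapsed: every token prefers experts 0 and 1.
    collapsed = [[10.0, 9.0] + [0.0] * 6 for _ in range(4)]
    print(round(load_balancing_loss(balanced), 3))   # 1.0
    print(round(load_balancing_loss(collapsed), 3))  # 3.999
```

With evenly spread assignments the loss sits at its minimum of 1.0; when every token picks the same two experts it climbs toward num_experts / 2 (4.0 for 8 experts), which is the signal the router is trained against.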
