LlaMoE-Medium

This is a 4x8B Llama Mixture of Experts (MoE) model. It was trained on OpenHermes Resort from the Dolphin-2.9 dataset.

The model combines four Llama fine-tunes using the DeepSpeed-MoE architecture. All experts are active for every token.
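To illustrate what "all experts are active for every token" means, here is a minimal sketch of a dense mixture: every expert processes the token, and the outputs are blended by gate weights. The softmax gate, the toy experts, and the names `dense_moe` and `gate_weights` are assumptions for illustration only; this card does not describe the model's actual gating, and this is not the DeepSpeed-MoE implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_moe(token, gate_weights, experts):
    """Blend the outputs of ALL experts for one token (no top-k selection).

    token:        list of floats (one hidden vector)
    gate_weights: one row of floats per expert (hypothetical linear gate)
    experts:      list of callables mapping a vector to a vector
    """
    # One gate logit per expert: dot product of the gate row with the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    probs = softmax(logits)
    # Dense routing: every expert runs on every token.
    outputs = [expert(token) for expert in experts]
    dim = len(outputs[0])
    return [sum(p * out[i] for p, out in zip(probs, outputs)) for i in range(dim)]


# Toy setup: four "experts" that simply scale the input by 1, 2, 3 and 4.
experts = [lambda x, k=k: [k * v for v in x] for k in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.0, 0.0] for _ in experts]  # zero gate gives uniform weights
token = [1.0, -2.0]
mixed = dense_moe(token, gate_weights, experts)
```

With a uniform gate, each of the four toy experts contributes a quarter of the result. A sparse MoE would instead evaluate only the top-scoring experts per token, which costs less compute but uses only part of the model for each token.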

This is a VERY good model, somewhere between Llama 8B and Llama 70B in capability. Enjoy!

Thank you to:

CrusoeEnergy for sponsoring the compute for this project
My collaborators Eric Hartford and Fernando (has too many names) Neto

Model size: 30.6B params (Safetensors)
Tensor type: BF16
