
dawg this is not a pretrained model.

#3
by Dampish - opened

How do you derive a model from another model and still call it pretrained?

Basically, MiniMA-3B is distilled from LLaMA2-7B on subsampled Pile, GitHub, and WuDao data, totaling 100B+ tokens. In terms of data scale, MiniMA-3B can reasonably be regarded as a (continuously) pretrained model.
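
For context, "distilled" here means the 3B student is trained to match the 7B teacher's output distribution over a large pretraining corpus, rather than on instruction pairs. A minimal sketch of a generic logit-distillation loss in PyTorch (the actual MiniMA training objective may differ; the temperature and reduction choices below are illustrative assumptions, not the paper's recipe):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KL loss between teacher and student next-token distributions.

    A generic logit-distillation objective, not necessarily the one used
    to train MiniMA-3B.
    """
    # Soften both distributions with a temperature, then match them with KL.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable to a hard-label loss.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature**2
```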

And you are perhaps referring to MiniChat-3B (https://huggingface.co/GeneZC/MiniChat-3B), which is MiniMA-3B finetuned on instruction data and is indeed a finetuned model.
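
To make the base/chat distinction concrete, here is a quick sketch of loading each checkpoint with transformers (repo ids taken from this page and the link above; pick MiniChat-3B if you want instruction-following behavior):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "GeneZC/MiniMA-3B"    # distilled (continued-pretraining) base model
chat_id = "GeneZC/MiniChat-3B"  # MiniMA-3B finetuned on instruction data

tokenizer = AutoTokenizer.from_pretrained(chat_id)
model = AutoModelForCausalLM.from_pretrained(chat_id, torch_dtype="auto")
```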

GeneZC changed discussion status to closed
