Model Card for TrelisSmolLM-base
This model is a pruned and distilled version of SmolLM-360M, created for scientific curiosity.
To purchase the training scripts used for this model, visit: https://trelis.com/advanced-fine-tuning-scripts/
Model Details
Model Description
- Developed by: Trelis Team
- Model type: Language Model
- Language(s) (NLP): English
- License: [More Information Needed]
- Finetuned from model: HuggingFaceTB/SmolLM-360M
TrelisLM-80M is an 80 million parameter language model derived from SmolLM-360M. It was created by layer and width pruning, followed by distillation from SmolLM-360M-Instruct using a forward KL loss.
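To make the forward KL objective concrete, here is a minimal sketch for a single token position, written in plain Python. It computes KL(teacher || student) over the vocabulary from raw logits. The function names are illustrative, and details such as temperature, per-token averaging, or mixing with a cross-entropy term are not stated in this card and are not modeled here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def forward_kl(teacher_logits, student_logits):
    """Forward KL divergence KL(teacher || student) for one token position.

    The expectation is taken under the teacher distribution, so the student
    is penalized wherever it assigns low probability to tokens the teacher
    considers likely.
    """
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))


# Hypothetical logits over a 3-token vocabulary (illustration only).
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = forward_kl(teacher, student)
```

In a real training loop this quantity would typically be averaged over all token positions in a batch, with the teacher's logits computed without gradient tracking.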
Uses
Direct Use
This model is intended primarily for research and experimentation. It can be used to explore how pruning and distillation affect language model performance.
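For experimentation, the model can probably be loaded with the Hugging Face `transformers` library. The sketch below assumes that the repository id is `Trelis/TrelisSmolLM-base`, that the checkpoint loads through `AutoModelForCausalLM`, and that `transformers` and PyTorch are installed. None of this is confirmed by the card. `count_parameters` and `load_and_generate` are illustrative helper names.

```python
MODEL_ID = "Trelis/TrelisSmolLM-base"  # assumed repository id


def count_parameters(model):
    """Total number of parameters, useful for comparing against the base model."""
    return sum(p.numel() for p in model.parameters())


def load_and_generate(prompt, max_new_tokens=30):
    """Download the model, generate a continuation, and report its size."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return text, count_parameters(model)
```

Calling `load_and_generate("The capital of France is")` downloads the weights on first use. Because the model is not fully trained, expect low-quality completions.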
Out-of-Scope Use
Because this model is not yet fully trained, it should not be used in production or any other real-world application at this stage.
Bias, Risks, and Limitations
The model is still being trained, so its behavior and biases may be unpredictable. It should be used with caution and only for research purposes.
Recommendations
Users should be aware that this model is a work in progress and its outputs should not be relied upon for any critical or sensitive tasks.
Training Details
Training Data
The model was distilled using the Trelis/smollm-corpus-2percent dataset.
Training Procedure
The training procedure involved the following steps:
- Layer pruning of SmolLM-360M
- Width pruning of SmolLM-360M
- Distillation from SmolLM-360M-Instruct using Forward KL loss
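The two pruning steps above can be illustrated with a toy sketch in plain Python, where each "layer" is a small weight matrix stored as nested lists. This only shows what each operation does structurally. The card does not say which layers or hidden units were kept or how they were selected, so the indices below are arbitrary assumptions, and `prune_layers` and `prune_width` are hypothetical helper names.

```python
def prune_layers(layers, keep_indices):
    """Layer pruning: keep only the layers at the given indices, in order."""
    return [layers[i] for i in keep_indices]


def prune_width(weight, keep_rows, keep_cols):
    """Width pruning: keep selected output rows and input columns of a matrix."""
    return [[weight[r][c] for c in keep_cols] for r in keep_rows]


# Toy "model": 4 layers, each a 3x3 weight matrix with distinct entries.
layers = [
    [[float(l * 9 + r * 3 + c) for c in range(3)] for r in range(3)]
    for l in range(4)
]

# Depth reduction: 4 layers down to 2 (arbitrary choice of layers 0 and 3).
kept = prune_layers(layers, keep_indices=[0, 3])

# Width reduction: hidden size 3 down to 2 (arbitrary choice of units 0 and 2).
narrow = [prune_width(w, keep_rows=[0, 2], keep_cols=[0, 2]) for w in kept]
```

After pruning, the smaller network generally loses quality. The distillation step is what trains it to recover by matching the teacher's output distribution.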
Evaluation
Evaluation results are not yet available for this model.
Model Examination
Further examination and interpretation of the model's behavior are needed.
Environmental Impact
[More Information Needed]
Technical Specifications
Model Architecture and Objective
TrelisLM-80M is an 80 million parameter language model derived from SmolLM-360M through pruning and distillation from SmolLM-360M-Instruct.
Compute Infrastructure
[More Information Needed]
Model Card Contact
[More Information Needed]