srvm committed
Commit cc94637
1 parent: 4ea0f55

Update link to tech report

Files changed (1):
  README.md +3 -2
README.md CHANGED
@@ -10,7 +10,7 @@ library_name: transformers
 
 ## Model Overview
 
-Mistral-NeMo-Minitron-8B-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks. It is a large language model (LLM) obtained by pruning and distilling the Mistral-NeMo 12B; specifically, we prune the embedding dimension and MLP intermediate dimension in the model. Following pruning, we perform continued training with distillation using 380 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose.
+Mistral-NeMo-Minitron-8B-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks. It is a large language model (LLM) obtained by pruning and distilling the Mistral-NeMo 12B; specifically, we prune the embedding dimension and MLP intermediate dimension in the model. Following pruning, we perform continued training with distillation using 380 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose. Please refer to our [technical report](https://arxiv.org/abs/2408.11796) for more details.
 
 **Model Developer:** NVIDIA
 
@@ -140,5 +140,6 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.
 
 
 ## References
+
 * [Minitron: Compact Language Models via Pruning and Knowledge Distillation](https://arxiv.org/abs/2407.14679)
-* [LLM Pruning and Distillation in Practice: The Minitron Approach](https://research.nvidia.com/publication/_llm-pruning-and-distillation-practice-minitron-approach)
+* [LLM Pruning and Distillation in Practice: The Minitron Approach](https://arxiv.org/abs/2408.11796)
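
For readers unfamiliar with the width-pruning step the overview paragraph describes, below is a minimal, illustrative PyTorch sketch of pruning an MLP's intermediate dimension. This is not NVIDIA's implementation: the function name, tensor shapes, and the weight-magnitude importance score are assumptions for illustration only; the Minitron papers referenced above estimate channel importance from activations rather than weight norms.

```python
# Illustrative width-pruning sketch (not NVIDIA's actual code): given an MLP
# weight pair, keep the k intermediate channels with the highest importance
# scores. Importance here is a simple weight-magnitude proxy; the Minitron
# papers estimate importance from activations instead.
import torch

def prune_mlp_intermediate(w_up: torch.Tensor, w_down: torch.Tensor, k: int):
    """Shrink an MLP from intermediate size d_ff down to k channels.

    w_up:   (d_ff, d_model)  up-projection weight
    w_down: (d_model, d_ff)  down-projection weight
    """
    # Score each intermediate channel by the L2 norm of its up-projection row.
    scores = w_up.norm(dim=1)
    # Keep the top-k channels, preserving their original order.
    keep = torch.topk(scores, k).indices.sort().values
    return w_up[keep, :], w_down[:, keep]

# Example: shrink a toy MLP with 12 intermediate channels to 8.
w_up, w_down = torch.randn(12, 4), torch.randn(4, 12)
w_up_p, w_down_p = prune_mlp_intermediate(w_up, w_down, k=8)
print(w_up_p.shape, w_down_p.shape)  # torch.Size([8, 4]) torch.Size([4, 8])
```

After pruning, the smaller model is retrained with distillation against the original 12B teacher (380 billion tokens, per the overview) to recover quality lost in the pruning step.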