Fix typo
README.md CHANGED
@@ -24,7 +24,7 @@ This model is released under the [NVIDIA Open Model License Agreement](https://d
 
 ## Model Architecture
 
-Llama-3.1-Minitron-4B-Width-Base uses a model embedding size of
+Llama-3.1-Minitron-4B-Width-Base uses a model embedding size of 3072, 32 attention heads, MLP intermediate dimension of 9216, with 32 layers in total. Additionally, it uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
 
 **Architecture Type:** Transformer Decoder (Auto-Regressive Language Model)
 
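For reference, the hyperparameters named in the added line map onto a Hugging Face `transformers` `LlamaConfig` roughly as sketched below. This is an illustrative sketch, not the model's published config: the number of key-value heads used for GQA is not stated in the diff, so the value shown is an assumption.

```python
# Illustrative sketch only: maps the architecture details from the edited
# README line onto a Hugging Face transformers LlamaConfig.
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=3072,          # "model embedding size of 3072"
    num_attention_heads=32,    # "32 attention heads"
    intermediate_size=9216,    # "MLP intermediate dimension of 9216"
    num_hidden_layers=32,      # "32 layers in total"
    num_key_value_heads=8,     # Grouped-Query Attention; head count assumed, not given in the diff
    # Rotary Position Embeddings (RoPE) are the default positional scheme
    # for LlamaConfig, so no extra argument is needed here.
)
print(config)
```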