---
library_name: transformers
license: llama3
datasets:
- VTSNLP/vietnamese_curated_dataset
language:
- vi
- en
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
---

# Model Information


## Model Details

### Model Description

Llama3-ViettelSolutions-8B is a variant of Meta Llama-3-8B that was further pre-trained on the [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset) and then supervised fine-tuned on 5 million Vietnamese instruction samples.

- **Developed by:** Viettel Solutions
- **Funded by:** NVIDIA
- **Model type:** Autoregressive transformer model
- **Language(s) (NLP):** Vietnamese, English
- **License:** Llama 3 Community License
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B

## Uses

Example snippet for usage with Transformers:

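```python
import torch
import transformers

model_id = "VTSNLP/Llama3-ViettelSolutions-8B"

# Load the weights in bfloat16; device_map="auto" relies on Accelerate for device placement.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# The pipeline returns a list of dicts; the text is under "generated_text".
outputs = pipeline("Xin chào!")
print(outputs[0]["generated_text"])
```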
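When only the model's continuation is needed, with a bounded length and deterministic decoding, the call can be wrapped as in the sketch below. The helper names `load_pipeline` and `complete` and the default of 64 new tokens are illustrative assumptions, not part of this model's release; the keyword arguments themselves (`max_new_tokens`, `do_sample`, `return_full_text`) are standard options of the Transformers text-generation pipeline.

```python
def load_pipeline(model_id="VTSNLP/Llama3-ViettelSolutions-8B"):
    """Build a text-generation pipeline in bfloat16 (hypothetical helper)."""
    # Imported lazily so the module can be imported without loading heavy libraries.
    import torch
    import transformers

    return transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )


def complete(pipe, prompt, max_new_tokens=64):
    """Return only the newly generated text for `prompt` (hypothetical helper)."""
    outputs = pipe(
        prompt,
        max_new_tokens=max_new_tokens,  # cap on generated tokens (assumed default)
        do_sample=False,                # greedy decoding for repeatable output
        return_full_text=False,         # drop the prompt from the returned text
    )
    return outputs[0]["generated_text"]
```

A typical call would be `complete(load_pipeline(), "Xin chào!")`; the first call downloads the model weights.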


## Training Details

### Training Data

- Dataset for continued pretraining: [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset)

- Dataset for supervised fine-tuning: [Instruct general dataset](https://huggingface.co/datasets/VTSNLP/instruct_general_dataset)


### Training Procedure

#### Preprocessing

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** bf16 mixed precision
- **Data sequence length:** 8192
- **Tensor model parallel size:** 4
- **Pipeline model parallel size:** 1
- **Context parallel size:** 1
- **Micro batch size:** 1
- **Global batch size:** 512
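
The settings above determine the data-parallel size and the number of gradient-accumulation steps through the usual Megatron-style relation: global batch size = micro batch size × data-parallel size × accumulation steps. The sketch below works through that arithmetic. The `ParallelConfig` class is hypothetical, and the world size of 4 GPUs is an assumption taken from the "4 x A100 80GB" entry under Technical Specifications; the card does not state the number of nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """Hypothetical container for the hyperparameters listed in this card."""

    world_size: int                 # total GPUs; NOT stated explicitly in the card
    tensor_parallel: int = 4
    pipeline_parallel: int = 1
    context_parallel: int = 1
    micro_batch_size: int = 1
    global_batch_size: int = 512
    seq_length: int = 8192

    @property
    def data_parallel(self) -> int:
        # GPUs left over after tensor, pipeline, and context parallelism.
        model_parallel = (
            self.tensor_parallel * self.pipeline_parallel * self.context_parallel
        )
        if self.world_size % model_parallel:
            raise ValueError("world_size must be divisible by TP * PP * CP")
        return self.world_size // model_parallel

    @property
    def grad_accum_steps(self) -> int:
        # Micro-batches accumulated per optimizer step on each data-parallel rank.
        per_pass = self.micro_batch_size * self.data_parallel
        if self.global_batch_size % per_pass:
            raise ValueError("global batch must be divisible by micro batch * DP")
        return self.global_batch_size // per_pass

    @property
    def tokens_per_step(self) -> int:
        # Upper bound, assuming every sequence is packed to the full length.
        return self.global_batch_size * self.seq_length


cfg = ParallelConfig(world_size=4)  # assumption: a single group of 4 GPUs
print(cfg.data_parallel, cfg.grad_accum_steps, cfg.tokens_per_step)
# → 1 512 4194304
```

Under that assumption the four GPUs form a single tensor-parallel group, so each optimizer step accumulates 512 micro-batches and covers at most about 4.2 million tokens.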

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

[More Information Needed]

## Technical Specifications

- Compute Infrastructure: NVIDIA DGX 

- Hardware: 4 x A100 80GB

- Software: [NeMo Framework](https://github.com/NVIDIA/NeMo)

## Citation 

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## More Information

[More Information Needed]

## Model Card Authors

[More Information Needed]

## Model Card Contact

[More Information Needed]