# ✏️ Nano-Mistral
Modeling code for Mistral to use with Nanotron
Also contains converted pretrained weights for Mistral-7B-v0.1: https://huggingface.co/mistralai/Mistral-7B-v0.1
## 🚀 Quickstart
```bash
# Generate a config file
python config_tiny_mistral.py

# Run training
export CUDA_DEVICE_MAX_CONNECTIONS=1  # important for some distributed operations
torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml
```
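For reference, here is a minimal sketch of what the config-generation step might produce. It is illustrative only: the real `config_tiny_mistral.py` builds a full Nanotron training config (optimizer, parallelism, data, ...), and the hyperparameter names below follow the Hugging Face Mistral convention rather than this repo's `MistralConfig` definition.

```python
# Illustrative stand-in for the repo's MistralConfig; field names are
# borrowed from the Hugging Face Mistral config and may differ here.
import dataclasses

import yaml


@dataclasses.dataclass
class TinyMistralConfig:
    vocab_size: int = 32000
    hidden_size: int = 256
    intermediate_size: int = 1024
    num_hidden_layers: int = 4
    num_attention_heads: int = 8
    num_key_value_heads: int = 4
    max_position_embeddings: int = 2048


if __name__ == "__main__":
    config = TinyMistralConfig()
    with open("config_tiny_mistral.yaml", "w") as f:
        # The real script nests the model block inside a complete
        # training config rather than writing it alone.
        yaml.safe_dump({"model": dataclasses.asdict(config)}, f)
    print("Wrote config_tiny_mistral.yaml")
```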
## 🚀 Run generation with pretrained Mistral-7B-v0.1
```bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
torchrun --nproc_per_node=1 run_generate.py --ckpt-path ./pretrained/Mistral-7B-v0.1
```
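The `--ckpt-path` above expects the converted Nanotron-format weights on local disk. If the converted checkpoint is hosted on the Hugging Face Hub rather than checked into this repo, something like the following sketch would fetch it into the expected path; the `repo_id` below is a placeholder, not a real repository.

```python
# Placeholder repo_id: point this at wherever the converted
# (Nanotron-format) Mistral-7B-v0.1 weights are actually hosted.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-org/nanotron-mistral-7b-v0.1",  # hypothetical repo id
    local_dir="./pretrained/Mistral-7B-v0.1",
)
```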
## 🚀 Use your custom model
- Update the `MistralConfig` class in `config_tiny_mistral.py` to match your model's configuration
- Update the `MistralForTraining` class in `modeling_mistral.py` to match your model's architecture
- Pass both classes to the `DistributedTrainer` in `run_train.py` (see the sketch after this list):

```python
trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
```

- Run training as usual
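Putting the steps together, here is a minimal sketch of what the relevant part of `run_train.py` could look like. The argument parsing is illustrative, and the Nanotron import path and `get_dataloader` helper are assumed from upstream Nanotron rather than taken from this repo.

```python
# Sketch of wiring custom model classes into the trainer. Local import
# paths match the file names above; the nanotron import path is assumed
# from upstream Nanotron.
import argparse

from nanotron.trainer import DistributedTrainer

from config_tiny_mistral import MistralConfig
from modeling_mistral import MistralForTraining


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file", type=str, required=True)
    args = parser.parse_args()

    # Same call as shown above: the trainer is told which config and
    # model classes to instantiate from the YAML config file.
    trainer = DistributedTrainer(
        args.config_file,
        model_class=MistralForTraining,
        model_config_class=MistralConfig,
    )

    # Upstream Nanotron's run_train.py defines a get_dataloader(trainer)
    # helper and then starts training with:
    # trainer.train(dataloader)


if __name__ == "__main__":
    main()
```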