Commit 5d8e8eb
Parent(s): d673f11
Upload folder using huggingface_hub
README.md CHANGED
@@ -12,8 +12,17 @@ Modeling code for Mistral to use with [Nanotron](https://github.com/huggingface/
 # Generate a config file
 python config_tiny_mistral.py
 
-
 # Run training
 export CUDA_DEVICE_MAX_CONNECTIONS=1 # important for some distributed operations
 torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml
 ```
+
+## 🚀 Use your custom model
+
+- Update the `MistralConfig` class in `config_tiny_mistral.py` to match your model's configuration
+- Update the `MistralForTraining` class in `modeling_mistral.py` to match your model's architecture
+- Pass these two classes to the `DistributedTrainer` class in `run_train.py`:
+```python
+trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
+```
+- Run training as usual
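For readers wiring this up, the sketch below shows how the pieces named in the new README section could fit together in a minimal `run_train.py`. It is only an illustration, not the repository's actual script: the `--config-file` flag and the `DistributedTrainer(config_file, model_class=..., model_config_class=...)` call come from the README above, while the import path, the argument parsing, and the `trainer.train()` entry point are assumptions about Nanotron's API.

```python
# Hypothetical minimal run_train.py. Only the --config-file flag and the
# DistributedTrainer(...) call mirror the README; the import path for
# DistributedTrainer and the trainer.train() entry point are assumptions.
import argparse

from nanotron.trainer import DistributedTrainer  # assumed import path

from config_tiny_mistral import MistralConfig
from modeling_mistral import MistralForTraining


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--config-file",
        type=str,
        required=True,
        help="YAML config, e.g. the one generated by config_tiny_mistral.py",
    )
    args = parser.parse_args()

    # Hand the custom config and model classes to the trainer, as in the README.
    trainer = DistributedTrainer(
        args.config_file,
        model_class=MistralForTraining,
        model_config_class=MistralConfig,
    )
    trainer.train()  # assumed entry point; the actual call may differ


if __name__ == "__main__":
    main()
```

Launched the same way as before, e.g. `CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml`.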