Commit 5d8e8eb
Parent(s): d673f11
Upload folder using huggingface_hub
README.md CHANGED
@@ -12,8 +12,17 @@ Modeling code for Mistral to use with [Nanotron](https://github.com/huggingface/
 # Generate a config file
 python config_tiny_mistral.py
 
-
 # Run training
 export CUDA_DEVICE_MAX_CONNECTIONS=1 # important for some distributed operations
 torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml
 ```
+
+## 🚀 Use your custom model
+
+- Update the `MistralConfig` class in `config_tiny_mistral.py` to match your model's configuration
+- Update the `MistralForTraining` class in `modeling_mistral.py` to match your model's architecture
+- Pass these two classes to the `DistributedTrainer` class in `run_train.py`:
+```python
+trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
+```
+- Run training as usual
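For readers wiring this up, the sketch below shows how the pieces named in the new README section could fit together in a minimal `run_train.py`. It is only an illustration, not the repository's actual script: the `--config-file` flag and the `DistributedTrainer(config_file, model_class=..., model_config_class=...)` call come from the README above, while the import path, the argument parsing, and the `trainer.train()` entry point are assumptions about Nanotron's API.

```python
# Hypothetical minimal run_train.py. Only the --config-file flag and the
# DistributedTrainer(...) call mirror the README; the import path for
# DistributedTrainer and the trainer.train() entry point are assumptions.
import argparse

from nanotron.trainer import DistributedTrainer  # assumed import path

from config_tiny_mistral import MistralConfig
from modeling_mistral import MistralForTraining


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--config-file",
        type=str,
        required=True,
        help="YAML config, e.g. the one generated by config_tiny_mistral.py",
    )
    args = parser.parse_args()

    # Hand the custom config and model classes to the trainer, as in the README.
    trainer = DistributedTrainer(
        args.config_file,
        model_class=MistralForTraining,
        model_config_class=MistralConfig,
    )
    trainer.train()  # assumed entry point; the actual call may differ


if __name__ == "__main__":
    main()
```

Launched the same way as before, e.g. `CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml`.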