## Checkpoints and conversion scripts for Nemo `.ckpt` files to Huggingface

This repo contains two checkpoints (`.ckpt` files) for UL2 models we have started pretraining with Nemo. The checkpoints are found in `nemo_checkpoints/`. The Nemo config files used to train these models can be found in `nemo_config/ul2-base-nl36`.

`megatron_ul2--val_loss=2.54-step=7000-consumed_samples=14557920.0.ckpt` was trained with `megatron_legacy: False` in the config, whereas the other checkpoint was trained with `megatron_legacy: True`.

Nvidia have created a conversion script that converts T5, T5v1.1 and UL2 models on Huggingface Hub to Nemo format. The script can be found [here](https://github.com/NVIDIA/NeMo/blob/main/scripts/nlp_language_modeling/hf_t5-v1_1_to_nemo.py). It is also included in this repo.

We thought that adapting a T5/UL2 model trained with Nemo to a Huggingface format would simply be a matter of reversing the conversion performed by the script above. Our conversion script does work when operating directly on the `.pt` state dict weight files produced by running the above Nvidia script, i.e. when going `Huggingface -> Nemo -> Huggingface`. However, it does not work when attempting to go `Nemo -> Huggingface` only: a UL2 model that was initialized with Nemo Megatron and pretrained with Nemo does not produce the same output after being converted to Huggingface format.
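
For reference, these `.ckpt` files are ordinary PyTorch Lightning checkpoints, so the raw tensors can be inspected directly. A minimal sketch (run it inside the Nemo container so that all pickled classes resolve; the exact key names depend on the Nemo version):

```python
import torch

# Lightning checkpoints are plain pickled dicts; the model weights live under "state_dict".
ckpt_path = "nemo_checkpoints/megatron_ul2--val_loss=2.54-step=7000-consumed_samples=14557920.0.ckpt"
ckpt = torch.load(ckpt_path, map_location="cpu")

state_dict = ckpt["state_dict"]
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)}")
```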

### Dependencies

We use Nemo docker containers (tag `23.02`) via Singularity when running the code in this repo. We have included a definition file to build the container. 

To build the container:

```bash
sudo singularity build nemo2302.sif nemo_singularity.def
```

We provide bash scripts to execute with Singularity. However, for easier debugging you can also run Singularity in interactive mode via:

```bash
singularity shell --nv nemo2302.sif
```

### Converting Nemo checkpoints to Huggingface

We have included our conversion script in this repo. It can be found in `convert_nemo_ul2_checkpoint.py`.

We manually created a Huggingface config file for UL2 that, to the best of our knowledge, matches the settings used when we trained with Nemo (see `config_ul2_base_nl36.json`).
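
To illustrate which settings have to line up between the two frameworks, a hypothetical way of building an equivalent config programmatically is sketched below. The hyperparameter values here are illustrative placeholders; the authoritative values are the ones in `config_ul2_base_nl36.json` and `nemo_config/ul2-base-nl36`.

```python
from transformers import T5Config

# Illustrative sketch only: the real values live in config_ul2_base_nl36.json
# and must mirror the Nemo training config in nemo_config/ul2-base-nl36.
config = T5Config(
    vocab_size=32128,                # shared SentencePiece vocabulary size (placeholder)
    d_model=768,                     # hidden size of a "base" model (placeholder)
    d_kv=64,                         # dimension per attention head (placeholder)
    d_ff=3072,                       # feed-forward inner size (placeholder)
    num_layers=36,                   # encoder layers, the "nl36" in the model name
    num_decoder_layers=36,           # decoder layers
    num_heads=12,                    # attention heads (placeholder)
    feed_forward_proj="gated-gelu",  # T5 v1.1 / UL2 use a gated activation
    tie_word_embeddings=False,       # T5 v1.1 / UL2 do not tie input and output embeddings
)
print(config)
```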

To replicate our weights conversion, simply run: 

```bash
singularity exec --nv nemo2302.sif bash convert_nemo_to_hf.sh
``` 

The resulting Huggingface model will be saved to `ul2-base-nl36-swedish/`.

We are aware that [Megatron-LM uses a different ordering of QKV](https://github.com/NVIDIA/Megatron-LM/blob/42c1cf4279acea5a554500dcb552211f44cbec45/megatron/checkpointing.py#L209-L237) in the attention layers depending on the version of Megatron-LM used. We are also aware of the existing conversion script Huggingface have created for converting Megatron-BERT, in which the QKV ordering of the Megatron checkpoint is adapted to [match the ordering used in Huggingface](https://github.com/NVIDIA/Megatron-LM/blob/42c1cf4279acea5a554500dcb552211f44cbec45/megatron/checkpointing.py#L209-L237). We therefore added an optional `--fix_qkv` parameter to our conversion script that applies the same QKV reordering as Huggingface does. See the commented-out lines in `convert_nemo_to_hf.sh` for an example of how to use this parameter and set the `checkpoint_version`.
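
For context, the reordering applied by `--fix_qkv` mirrors the `fix_query_key_value_ordering` helper found in Huggingface's Megatron conversion scripts; a sketch of it is shown below (`num_splits` is 3 for a fused self-attention QKV weight and 2 for a fused cross-attention KV weight):

```python
import torch

def fix_query_key_value_ordering(param, checkpoint_version, num_splits, num_heads, hidden_size_per_head):
    """Permute a fused QKV weight/bias from Megatron's layout to the
    [num_splits * num_heads * hidden_size_per_head, ...] layout Huggingface expects."""
    input_shape = param.size()
    if checkpoint_version == 1.0:
        # Megatron v1.0 stores [num_heads * hidden_size_per_head * num_splits, ...]
        saved_shape = (num_heads, hidden_size_per_head, num_splits) + input_shape[1:]
        param = param.view(*saved_shape)
        param = param.transpose(0, 2)
        param = param.transpose(1, 2).contiguous()
    elif checkpoint_version >= 2.0:
        # Megatron v2.0+ stores [num_heads * num_splits * hidden_size_per_head, ...]
        saved_shape = (num_heads, num_splits, hidden_size_per_head) + input_shape[1:]
        param = param.view(*saved_shape)
        param = param.transpose(0, 1).contiguous()
    return param.view(*input_shape)

# Example: fused QKV weight of one layer in a base-sized model (12 heads, 64 dims per head).
qkv_weight = torch.randn(3 * 12 * 64, 768)
reordered = fix_query_key_value_ordering(qkv_weight, checkpoint_version=2.0,
                                         num_splits=3, num_heads=12, hidden_size_per_head=64)
```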

Unfortunately, none of the above solves the issue we have with the conversion script.

We have a test script that predicts with both the original Nemo model and the converted Huggingface model. Unfortunately, the outputs are not the same, even though we used the identical tokenizer for both models. To run it:

```bash
singularity exec --nv nemo2302.sif python test_ul2_hf.py
```

Or explore in interactive mode with `singularity shell --nv nemo2302.sif`.
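
For reference, the Huggingface half of such a comparison could look roughly like the sketch below. It assumes the converted weights and the tokenizer files both ended up in `ul2-base-nl36-swedish/`, and the prompt is a made-up span-corruption example; `test_ul2_hf.py` remains the authoritative version, and the Nemo half has to be run through NeMo's own inference utilities inside the container.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Load the converted model together with the tokenizer used for Nemo pretraining
# (assumed to have been saved into the same output directory).
model_dir = "ul2-base-nl36-swedish"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = T5ForConditionalGeneration.from_pretrained(model_dir)
model.eval()

# Hypothetical Swedish span-corruption prompt; the exact sentinel/mask token
# depends on the tokenizer that was used during pretraining.
text = "Stockholm är <extra_id_0> i Sverige."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```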

### Confirming the conversion script can reverse Nvidia's conversion script

To confirm that our conversion script is at least valid in the sense that it can reverse Nvidia's conversion script, we include instructions for converting a UL2 model from Huggingface to Nemo via Nvidia's script, and then back to Huggingface via our conversion script.

Instructions:

1. Run `singularity exec --nv nemo2302.sif bash convert_hf_to_nemo.sh` to convert the existing [Finnish-NLP/ul2-base-nl36-finnish](https://huggingface.co/Finnish-NLP/ul2-base-nl36-finnish) model from Huggingface to Nemo format via Nvidia's conversion script. The resulting model weights will be saved to the folder `ul2-base-nl36-finnish/`.
2. To perform the reverse conversion, and to check whether the re-converted weights are identical, run `python convert_finnish_ul2_model.py`. Or via Singularity: `singularity exec --nv nemo2302.sif python convert_finnish_ul2_model.py`.

The resulting model, re-converted to Huggingface, will be found in `ul2-base-nl36-finnish/hf_t5_ul2`.

This conversion produces a model that is identical to the original model.
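
The identity check performed in step 2 can also be reproduced by hand. A minimal sketch, assuming both the original Hub model and the re-converted model load as `T5ForConditionalGeneration`:

```python
import torch
from transformers import T5ForConditionalGeneration

original = T5ForConditionalGeneration.from_pretrained("Finnish-NLP/ul2-base-nl36-finnish")
reconverted = T5ForConditionalGeneration.from_pretrained("ul2-base-nl36-finnish/hf_t5_ul2")

# Compare every parameter tensor by name and report any mismatches.
reconverted_params = dict(reconverted.named_parameters())
mismatches = [
    name for name, param in original.named_parameters()
    if not torch.allclose(param, reconverted_params[name])
]

print("identical" if not mismatches else f"mismatching parameters: {mismatches}")
```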