Steps to the creation of this model

#1
by nfplay - opened

Hi! Thank you for your excellent YouTube video on Whisper fine-tuning. In the video, after fine-tuning, you choose a checkpoint, merge it, and upload it. How did you gather all the files in this repo? For instance, I don't think tokenizer.json is created in the checkpoint. Did you copy it from the original model and put it here?

Thanks!
Nuno

Trelis org

Howdy, you can just save the tokenizer and then push it; that should push the tokenizer files.
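A minimal sketch of that save-then-push workflow, assuming a Hugging Face `transformers` setup where the merged model and its processor are already loaded. Both objects expose `save_pretrained(save_directory)` and `push_to_hub(repo_id)`; the `save_and_push` helper and its argument names are hypothetical, for illustration only. Saving the processor (rather than only the tokenizer) also writes `preprocessor_config.json`, which matches what is reported later in this thread.

```python
from typing import Optional, Protocol


class SavesAndPushes(Protocol):
    """Anything with the Hugging Face save/push interface (model, processor, tokenizer)."""

    def save_pretrained(self, save_directory: str) -> None: ...
    def push_to_hub(self, repo_id: str) -> None: ...


def save_and_push(model: SavesAndPushes, processor: SavesAndPushes,
                  output_dir: str, repo_id: Optional[str] = None) -> None:
    """Hypothetical helper: save weights and processor files locally, optionally push them."""
    # Weights and config.json of the merged model.
    model.save_pretrained(output_dir)
    # Processor = feature extractor (preprocessor_config.json) + tokenizer files.
    processor.save_pretrained(output_dir)
    # Pushing is optional; skip it for a local-only copy.
    if repo_id is not None:
        model.push_to_hub(repo_id)
        processor.push_to_hub(repo_id)
```

With real objects this would be called as, e.g., `save_and_push(model, processor, "./whisper-ft")`, adding `repo_id=...` only if the files should go to the Hub.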

Hi! I ran exactly that experiment to see what is actually being saved, and when:

Saving the processor:

[Screenshot: files written when saving the processor]

Saving the tokenizer:

[Screenshot: files written when saving the tokenizer]

Saving the tokenizer just overwrites the files written by the processor, except preprocessor_config, which only the processor saves.
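The comparison above can also be reproduced without screenshots by saving the processor and the tokenizer into two separate directories and diffing the file listings. A small stdlib sketch; it assumes the two saves have already been done (e.g. `processor.save_pretrained("saved_processor")` and `tokenizer.save_pretrained("saved_tokenizer")`, where the directory names are hypothetical). It compares file names only, not contents.

```python
from pathlib import Path
from typing import Dict, List


def diff_saved_files(dir_a: str, dir_b: str) -> Dict[str, List[str]]:
    """Compare the file names (not contents) written into two save directories."""
    a = {p.name for p in Path(dir_a).iterdir() if p.is_file()}
    b = {p.name for p in Path(dir_b).iterdir() if p.is_file()}
    return {
        "only_in_a": sorted(a - b),  # e.g. files only the processor writes
        "only_in_b": sorted(b - a),  # e.g. files only the tokenizer writes
        "in_both": sorted(a & b),    # files the second save would overwrite
    }
```

Per the observation above, with `dir_a` holding the processor save and `dir_b` the tokenizer save, `only_in_a` would be expected to list just the preprocessor config, and `tokenizer.json` would appear in neither if it is not being written.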

I had to manually copy tokenizer.json from the original model for the model to be usable (I'm converting it with CTranslate2 and running it with faster-whisper). I can see that you have tokenizer.json in your repo, so I was wondering what workflow you used. At least in my case, tokenizer.json is not being saved. This is a large-v3 fine-tune, and it's a local copy only; I'm not interested in pushing it to HF.
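The manual fix described here (copying tokenizer.json from the original model into the fine-tuned checkpoint before the CTranslate2 conversion) can be scripted. A minimal stdlib sketch, assuming both models exist as local directories (for the base model, e.g. a downloaded snapshot of the original large-v3 weights); the function name, default file list, and paths are hypothetical. It only copies files that are missing, so anything the fine-tuning run saved is left untouched.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# Assumed set of files to carry over from the base model; adjust as needed.
DEFAULT_FILES = ("tokenizer.json", "preprocessor_config.json")


def copy_missing_files(base_dir: str, finetuned_dir: str,
                       names: Iterable[str] = DEFAULT_FILES) -> List[str]:
    """Copy each named file from base_dir into finetuned_dir if it is absent there.

    Returns the names that were actually copied.
    """
    copied = []
    for name in names:
        src = Path(base_dir) / name
        dst = Path(finetuned_dir) / name
        if dst.exists():
            continue  # keep whatever the fine-tuning run saved
        if not src.exists():
            raise FileNotFoundError(f"{name} not found in {base_dir}")
        shutil.copy2(src, dst)  # copy2 also preserves file metadata
        copied.append(name)
    return copied
```

After copying, the checkpoint can be converted as usual, e.g. `ct2-transformers-converter --model <finetuned_dir> --output_dir <ct2_dir> --copy_files tokenizer.json preprocessor_config.json`, which carries those files into the CTranslate2 output directory. As a possible alternative (an assumption, not the workflow used here): if the slow `WhisperTokenizer` class was the one saved, saving a `WhisperTokenizerFast` instead may produce tokenizer.json directly, since fast tokenizers serialize to that file.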

Thanks for the help,

Nuno

Trelis org

Interesting, thanks for sharing that. Yeah, it's possible I just copy-pasted the JSON as you did.
