Trelis/whisper-small-llm-lingo · Steps to the creation of this model

Jul 30

Hi! Thank you for your excellent YouTube video on Whisper fine-tuning. In the video, after fine-tuning, you choose a checkpoint, merge, and upload. How did you gather all the files in this repo? For instance, I don't think tokenizer.json is created in the checkpoint. Did you copy it from the original model and put it here?

Thanks!
Nuno

RonanMcGovern

Trelis org Jul 30

Howdy, you can just save the tokenizer and then push it, and that should push the tokenizer files.

nfplay

Jul 30

edited Jul 30

Hi! I ran exactly that experiment to see what is actually saved, and when:

Saving the processor:

Saving the tokenizer:

Saving the tokenizer just overwrites the files written by the processor, except preprocessor_config.json.
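A minimal sketch of how this kind of experiment could be scripted, using only the standard library: take a snapshot of the output directory after each save step and compare the two. The helper names `snapshot` and `diff_snapshots` are hypothetical, and the save steps themselves are left out on purpose.

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each file name in `directory` to a SHA-256 digest of its contents."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }


def diff_snapshots(before, after):
    """Classify files as added, changed, or unchanged between two snapshots."""
    common = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "changed": sorted(n for n in common if before[n] != after[n]),
        "unchanged": sorted(n for n in common if before[n] == after[n]),
    }
```

Taking one snapshot after saving the processor and another after saving the tokenizer would show which files each step adds. Note that a file rewritten with identical bytes is reported as unchanged, since the comparison is by content hash.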

I had to manually copy the tokenizer.json of the original model for the model to be usable (I'm converting it with ctranslate2 and using it with faster-whisper). I can see that you have tokenizer.json in your repo, so I was wondering what workflow you used. At least in my case, tokenizer.json is not being saved. This is a large-v3 fine-tune, and it's a local copy only; I'm not interested in pushing it to HF.
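The manual copy described above could be sketched roughly as follows. This is only an illustration: `copy_missing_files` is a hypothetical helper, both directory arguments are assumed to be local folders, and it assumes fine-tuning did not change the vocabulary, so the original tokenizer.json is still valid for the fine-tuned model.

```python
import shutil
from pathlib import Path


def copy_missing_files(base_dir, finetuned_dir, names=("tokenizer.json",)):
    """Copy each file in `names` from base_dir into finetuned_dir if absent there.

    Existing files in finetuned_dir are never overwritten.
    Returns the list of file names that were copied.
    """
    base, target = Path(base_dir), Path(finetuned_dir)
    copied = []
    for name in names:
        src, dst = base / name, target / name
        if dst.exists():
            continue  # keep whatever the fine-tuning run already saved
        if not src.is_file():
            raise FileNotFoundError(f"{name} not found in {base}")
        shutil.copy2(src, dst)
        copied.append(name)
    return copied
```

Running it before the conversion step would fill in only the files that the save steps did not produce, leaving everything else untouched.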

Thanks for the help,

Nuno

RonanMcGovern

Trelis org Jul 31

Interesting, thanks for sharing that. Yeah, it is possible I just copy-pasted the JSON as you did.