Comparison with the distilled model
Great work! 🤗
I’m curious about how the fine-tuned model compares to the distilled one. Would it be possible to add it to your evaluation results table?
Yes, of course I can add the distilled version too.
Since the turbo version trained with the new recipe is even better than the original large model, and the distilled version is not as good as the large one, I haven't compared it directly yet.
Yep, it definitely makes sense from an accuracy point of view, but since the distilled version is still faster (2 vs. 4 decoder layers), one might be interested in the speed/accuracy tradeoff you get with the fine-tuned large-v3-turbo vs. the distilled large-v3. You could even add a generation speed metric (see this gist).
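For reference, a generation speed metric is often reported as a real-time factor (RTF): wall-clock seconds of compute per second of audio, lower being faster. This is just a minimal sketch of that idea, not the gist linked above; the pipeline call in the comment is a hypothetical example of how you might plug in a real model:

```python
import time

def benchmark_generation(generate_fn, audio_duration_s):
    """Time a transcription call and report the real-time factor (RTF):
    wall-clock seconds of compute per second of audio. Lower is faster."""
    start = time.perf_counter()
    text = generate_fn()
    elapsed = time.perf_counter() - start
    return {"text": text, "seconds": elapsed, "rtf": elapsed / audio_duration_s}

# Hypothetical usage with a transformers ASR pipeline (names are illustrative):
# pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")
# result = benchmark_generation(lambda: pipe("sample.wav")["text"], audio_duration_s=30.0)
# print(f'RTF: {result["rtf"]:.3f}')
```

Averaging over several files (and a warm-up run to exclude model loading) would make the numbers more stable.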
Idea: before publishing, I will use the new recipe to train another distilled version with 2 decoder layers.
Does that sound good to you? :)