Fix error in config.json
#9 by pere - opened
The decoder_start_token_id should refer to the <|startoftranscript|> token in the vocabulary.
Thanks for the fix. I agree that this needs to be corrected, as it should match v2 in its generation config: https://huggingface.co/openai/whisper-large-v2/blob/696465c62215e36a9ab3f9b7672fe7749f1a1df5/config.json#L19
patrickvonplaten changed pull request status to merged
Thanks a lot @pere
Good catch @pere! We converted the generation_config standalone but missed the generation attributes in the config. The bos_token_id and eos_token_id also need updating: https://huggingface.co/openai/whisper-large-v3/discussions/25#6555f5d2ef6e96329fd5db2f
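For anyone who wants to check their own copy of config.json, here is a minimal sketch of the kind of consistency check discussed in this thread: each token-id attribute in the config is compared against the id that the vocabulary assigns to the token it is supposed to point at. The helper name `find_mismatched_token_ids` is hypothetical, and the vocabulary and config values below are toy placeholders rather than values read from the actual model repository. Only the mapping of decoder_start_token_id to <|startoftranscript|> comes from this thread. The same check could cover bos_token_id and eos_token_id once you know which tokens they should refer to.

```python
def find_mismatched_token_ids(config, vocab, expected_tokens):
    """Return {attribute: (configured_id, expected_id)} for every mismatch.

    config          -- dict loaded from config.json
    vocab           -- dict mapping token string -> token id
    expected_tokens -- dict mapping config attribute -> token string it should refer to
    """
    mismatches = {}
    for attribute, token in expected_tokens.items():
        expected_id = vocab[token]
        configured_id = config.get(attribute)
        if configured_id != expected_id:
            mismatches[attribute] = (configured_id, expected_id)
    return mismatches


# Toy stand-ins (assumed values for illustration only); in practice, load the
# vocabulary from the tokenizer and the config from config.json.
vocab = {"<|endoftext|>": 50257, "<|startoftranscript|>": 50258}
expected = {"decoder_start_token_id": "<|startoftranscript|>"}

broken_config = {"decoder_start_token_id": 50257}  # placeholder for a wrong id
fixed_config = {"decoder_start_token_id": 50258}   # points at <|startoftranscript|>

print(find_mismatched_token_ids(broken_config, vocab, expected))
print(find_mismatched_token_ids(fixed_config, vocab, expected))
```

An empty result means every listed attribute points at the intended token. Anything else names the attribute to fix, along with the configured and expected ids.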